7 Value of Privacy and Data Accuracy

One key challenge for implementing formal privacy systems lies in choosing the amount, or type, of privacy to provide. Answering this question requires some way to understand the individual and social value of privacy. Ghosh & Roth (2015) and Li, Li, Miklau, & Suciu (2014) both model mechanisms for pricing private data under the assumption that individuals are willing to disclose such information only if they are paid.

Part of the social value of privacy arises from its relationship to scientific integrity. While the law of information recovery suggests that improved privacy must come at the cost of increased error in published statistics, these effects might be mitigated through two distinct channels. First, people are more truthful in surveys if they believe their data is not at risk, as Couper, Singer, Conrad, & Groves (2008) illustrate. Second, work in computer science (Dwork et al., 2015) and statistics (Cummings, Ligett, Nissim, Roth, & Wu, 2016) suggests another, somewhat surprising, benefit of differential privacy: protection against overfitting. A complete accounting of the costs and benefits of formal privacy systems should take these channels into account.
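The cost side of this tradeoff can be made concrete with a minimal sketch. It assumes the standard Laplace mechanism applied to a single counting query with sensitivity 1; the function names, the true count of 1000, and the values of epsilon are hypothetical choices for illustration and are not drawn from the works cited above. Under these assumptions the expected absolute error equals the noise scale, sensitivity / epsilon, so stronger privacy (smaller epsilon) yields a proportionally less accurate published count.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    # Laplace mechanism: add noise with scale sensitivity / epsilon.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials=20000, seed=0):
    # Monte Carlo estimate of the expected absolute error of the release.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(private_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    # Hypothetical privacy-loss budgets, from weaker to stronger privacy.
    for eps in (1.0, 0.5, 0.1):
        print(eps, round(mean_abs_error(1000, eps), 2))
```

The sketch captures only the accuracy cost. The two mitigating channels discussed above, more truthful survey responses and protection against overfitting, are not modeled here.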

It is equally necessary to develop a more robust understanding of why data is valuable in the first place, and hence of the overall social cost of increasing error in public statistics. Little comprehensive theoretical or empirical research appears to have been done in this area. We nevertheless recommend what seem to be good starting points.

On the theoretical side, economists studying privacy have also developed models of the value of data to firms. In these models, firms benefit from being able to tailor prices based on individual demand (Taylor, 2004), or from being able to market more effectively (Varian, 1998). More recently, a theoretical literature on information design has begun to consider more effective ways to manage markets for consumer information; see Bergemann, Bonatti, & Smolin (2018) and Pomatto, Strack, & Tamuz (2018). This recent literature is related to Spencer (1985), who developed a decision-theoretic framework for modeling optimal data quality.

On the empirical side, a handful of interesting use cases suggest techniques for uncovering the value of data. For example, Card, Mas, Moretti, & Saez (2012) and Perez-Truglia (2016) show how workers respond to pay transparency policies, which give them new information about co-worker salaries. Spencer & Seeskin (2015) use a calibration exercise to study the costs, measured in misallocated congressional seats, of reduced accuracy in population census data.
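To illustrate how reduced accuracy in population counts can translate into misallocated seats, the following is a minimal sketch and not the calibration exercise of Spencer & Seeskin (2015). It assumes seats are assigned by the method of equal proportions (Huntington-Hill), which is the method used for the U.S. House of Representatives; the state labels, populations, and seat total are hypothetical, and the helper names are introduced here for illustration only.

```python
import heapq
import math


def huntington_hill(populations, seats):
    """Apportion `seats` among states by the method of equal proportions."""
    if seats < len(populations):
        raise ValueError("need at least one seat per state")
    # Every state starts with one seat.
    alloc = {state: 1 for state in populations}
    # Max-heap (via negated values) of each state's priority for its next seat:
    # population / sqrt(n * (n + 1)), where n is the current number of seats.
    heap = [(-pop / math.sqrt(2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)
        alloc[state] += 1
        n = alloc[state]
        priority = populations[state] / math.sqrt(n * (n + 1))
        heapq.heappush(heap, (-priority, state))
    return alloc


def misallocated_seats(true_pops, measured_pops, seats):
    # Count seats that end up in a different state than they would
    # under the true populations (each moved seat is counted once).
    true_alloc = huntington_hill(true_pops, seats)
    measured_alloc = huntington_hill(measured_pops, seats)
    return sum(abs(true_alloc[s] - measured_alloc[s]) for s in true_pops) // 2


# Hypothetical true and mismeasured populations for three states.
true_pops = {"A": 1000, "B": 2000, "C": 3000}
measured_pops = {"A": 1800, "B": 2000, "C": 2200}
print(misallocated_seats(true_pops, measured_pops, 6))
```

A fuller exercise would draw the measured populations repeatedly from an error model and average the number of misallocated seats; the single perturbed vector above only shows the mechanics.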

References

Bergemann, D., Bonatti, A., & Smolin, A. (2018). The design and price of information. American Economic Review, 108(1), 1–48. https://doi.org/10.1257/aer.20161079

Card, D., Mas, A., Moretti, E., & Saez, E. (2012). Inequality at work: The effect of peer salaries on job satisfaction. American Economic Review, 102(6), 2981–3003. https://doi.org/10.1257/aer.102.6.2981

Couper, M. P., Singer, E., Conrad, F. G., & Groves, R. M. (2008). Risk of disclosure, perceptions of risk, and concerns about privacy and confidentiality as factors in survey participation. Journal of Official Statistics, 24(2), 255. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3096944/

Cummings, R., Ligett, K., Nissim, K., Roth, A., & Wu, Z. S. (2016). Adaptive learning with robust generalization guarantees. CoRR, abs/1602.07726. Retrieved from http://arxiv.org/abs/1602.07726

Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., & Roth, A. (2015). Generalization in adaptive data analysis and holdout reuse. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in neural information processing systems 28 (pp. 2341–2349). Retrieved from http://papers.nips.cc/paper/5993-generalization-in-adaptive-data-analysis-and-holdout-reuse.pdf

Ghosh, A., & Roth, A. (2015). Selling privacy at auction. Games and Economic Behavior, 91, 334–346. https://doi.org/10.1016/j.geb.2013.06.013

Li, C., Li, D. Y., Miklau, G., & Suciu, D. (2014). A theory of pricing private data. ACM Transactions on Database Systems, 39(4), 34:1–34:27. https://doi.org/10.1145/2448496.2448502

Perez-Truglia, R. (2016). The effects of income transparency on well-being: Evidence from a natural experiment. SSRN. https://doi.org/10.2139/ssrn.2657808

Pomatto, L., Strack, P., & Tamuz, O. (2018). The cost of information. arXiv.

Spencer, B. D. (1985). Optimal data quality. Journal of the American Statistical Association, 80(391), 564–573. https://doi.org/10.1080/01621459.1985.10478155

Spencer, B. D., & Seeskin, Z. H. (2015). Effects of Census accuracy on apportionment of Congress and allocations of federal funds. JSM Proceedings, Government Statistics Section, 3061–3075. Retrieved from https://www.ipr.northwestern.edu/publications/papers/2015/ipr-wp-15-05.html

Taylor, C. R. (2004). Consumer privacy and the market for customer information. The RAND Journal of Economics, 35(4), 631–650. https://doi.org/10.2307/1593765

Varian, H. R. (1998). Markets for Information Goods (pp. 1–19) [Mimeo]. Retrieved from UC Berkeley School of Information website: http://people.ischool.berkeley.edu/~hal/Papers/japan/index.html