DOI: 10.1145/2213836.2213876

GUPT: privacy preserving data analysis made easy

Published: 20 May 2012

Abstract

It is often highly valuable for organizations to have their data analyzed by external agents. However, any program that computes on potentially sensitive data risks leaking information through its output. Differential privacy provides a theoretical framework for processing data while protecting the privacy of individual records in a dataset. Unfortunately, it has seen limited adoption because of the loss in output accuracy, the difficulty in making programs differentially private, the lack of mechanisms to describe the privacy budget in a programmer's utilitarian terms, and the challenging requirement that data owners and data analysts manually distribute the limited privacy budget between queries.
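
To make the trade-off above concrete, the following minimal sketch (not from GUPT; it uses the standard Laplace mechanism with illustrative data and parameters) shows how the privacy budget epsilon is spent: noise is calibrated to sensitivity/epsilon, so a smaller budget means stronger privacy but a noisier, less accurate answer.

```python
import numpy as np

def laplace_release(true_value, sensitivity, epsilon):
    """Release a scalar statistic with epsilon-differential privacy.

    Standard Laplace mechanism: noise is drawn with scale
    sensitivity / epsilon, so a smaller privacy budget (epsilon)
    produces noisier, less accurate output.
    """
    return true_value + np.random.laplace(scale=sensitivity / epsilon)

# Illustrative data: 1,000 ages clipped to [0, 100]; the mean of such a
# dataset has sensitivity (100 - 0) / 1000 = 0.1.
ages = np.random.uniform(0, 100, size=1000)
for eps in (1.0, 0.1, 0.01):
    print(f"epsilon={eps}: private mean = {laplace_release(ages.mean(), 0.1, eps):.2f}")
```
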
This paper presents the design and evaluation of a new system, GUPT, that overcomes these challenges. Unlike existing differentially private systems such as PINQ and Airavat, it guarantees differential privacy to programs not developed with privacy in mind, makes no trust assumptions about the analysis program, and is secure to all known classes of side-channel attacks.
GUPT uses a new model of data sensitivity that degrades privacy of data over time. This enables efficient allocation of different levels of privacy for different user applications while guaranteeing an overall constant level of privacy and maximizing the utility of each application. GUPT also introduces techniques that improve the accuracy of output while achieving the same level of privacy. These approaches enable GUPT to easily execute a wide variety of data analysis programs while providing both utility and privacy.
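
The budget-allocation problem the abstract alludes to can be illustrated with a short sketch. This is not GUPT's actual allocation strategy; it only shows the standard sequential-composition property that per-query budgets must sum to the overall privacy level, using a naive even split and hypothetical queries:

```python
import numpy as np

def run_private_queries(data, queries, total_epsilon):
    """Split a fixed overall privacy budget across several analyses.

    By sequential composition, the privacy losses of the individual
    queries add up, so choosing per-query budgets eps_i with
    sum(eps_i) = total_epsilon keeps the overall guarantee at
    total_epsilon. The even split below is a naive placeholder;
    distributing the budget well, without manual effort from the
    analyst, is the problem the paper automates.
    """
    eps_each = total_epsilon / len(queries)  # naive even split
    results = []
    for query in queries:
        true_value, sensitivity = query(data)
        results.append(true_value + np.random.laplace(scale=sensitivity / eps_each))
    return results

# Hypothetical queries, each returning (value, sensitivity) for data in [0, 100].
data = np.random.uniform(0, 100, size=1000)
queries = [
    lambda d: (d.mean(), 100.0 / len(d)),    # private mean
    lambda d: (float((d > 50).sum()), 1.0),  # private count of values above 50
]
print(run_private_queries(data, queries, total_epsilon=1.0))
```
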



Published In

SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
May 2012
886 pages
ISBN: 9781450312479
DOI: 10.1145/2213836


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. algorithms
  2. data mining
  3. differential privacy
  4. security

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '12

Acceptance Rates

SIGMOD '12 paper acceptance rate: 48 of 289 submissions (17%)
Overall acceptance rate: 785 of 4,003 submissions (20%)

Article Metrics

  • Downloads (last 12 months): 73
  • Downloads (last 6 weeks): 9

Reflects downloads up to 10 Nov 2024.

