DOI: 10.1145/3199478.3199486
research-article

Challenges in Classifying Privacy Policies by Machine Learning with Word-based Features

Published: 16 March 2018

Abstract

In this paper, we discuss the challenges of automatically classifying privacy policies using machine learning with words as features. Because privacy policies are difficult for the general public to understand, users need support in interpreting them. We believe machine learning is a promising approach to this problem, because users can grasp the meaning of a policy through the output of a machine learning algorithm. Our final goal is to develop a system that automatically translates privacy policies into privacy labels [1]. Toward this goal, we classify the sentences in privacy policies into category labels using popular machine learning algorithms, such as a naive Bayes classifier. We chose these algorithms because the trained classifiers can be used to evaluate which keywords are appropriate for privacy labels; for this reason, we adopt words as the features of those algorithms. Our experiments achieve an accuracy of about 85%, and we think much higher accuracy is necessary to achieve our final goal. By varying the learning settings, we identified one cause of the low accuracy: privacy policies contain many sentences that do not directly describe information about the categories. Such sentences may seem redundant, but they may be essential in legal documents to prevent misinterpretation. Thus, it is important for machine learning algorithms to handle these redundant sentences appropriately.
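The abstract describes classifying policy sentences into categories with a naive Bayes classifier over word features, and then using the trained classifier to evaluate keywords for privacy labels. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a multinomial naive Bayes model with add-one smoothing and a simple lowercase tokenizer. The categories (`collection`, `sharing`, `user_choice`) and the training sentences are hypothetical and were invented for illustration. The keyword score is also a hypothetical choice: it measures how much more likely a word is under one category than under any other.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Bag-of-words tokenizer: lowercase alphabetic runs (assumed preprocessing).
    return re.findall(r"[a-z]+", sentence.lower())


class MultinomialNaiveBayes:
    """Multinomial naive Bayes over word counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        # Vocabulary = every word seen in any training sentence.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_likelihood(self, word, c):
        # Smoothed log P(word | c).
        numerator = self.word_counts[c][word] + self.alpha
        denominator = self.total_words[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, sentence):
        n = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            # log prior + sum of log likelihoods of the sentence's words
            score = math.log(self.class_counts[c] / n)
            for word in tokenize(sentence):
                if word in self.vocab:  # words unseen in training are ignored
                    score += self.log_likelihood(word, c)
            scores[c] = score
        return max(scores, key=scores.get)

    def top_keywords(self, c, k=5):
        # Hypothetical keyword score: log P(w|c) minus the largest log P(w|c')
        # over the other categories (needs at least two categories).
        others = [o for o in self.classes if o != c]

        def score(word):
            return self.log_likelihood(word, c) - max(
                self.log_likelihood(word, o) for o in others)

        # Highest score first; ties broken alphabetically for determinism.
        return sorted(self.vocab, key=lambda w: (-score(w), w))[:k]


# Toy training data with hypothetical category labels (not the paper's data).
train = [
    ("We collect your email address and phone number.", "collection"),
    ("We collect location data from your device.", "collection"),
    ("We share your information with advertising partners.", "sharing"),
    ("We may share data with third parties.", "sharing"),
    ("You may delete your account at any time.", "user_choice"),
    ("You can opt out of marketing emails.", "user_choice"),
]
sentences, labels = zip(*train)
model = MultinomialNaiveBayes().fit(sentences, labels)

print(model.predict("We share data with third parties"))  # sharing
print(model.top_keywords("sharing", k=2))  # ['share', 'with']
```

In this toy run, the function word `with` ranks among the top keywords for `sharing` because the sketch applies no stop-word filtering. This shows why keywords extracted from word features need review before they are used in privacy labels.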

References

[1]
Diego Roberto Gonçalves de Pontes and Sergio Donizetti Zorzo. PPMark: An Architecture to Generate Privacy Labels Using TF-IDF Techniques and the Rabin Karp Algorithm. In Information Technology: New Generations, pages 1029--1040. Springer, 2016.
[2]
Aleecia M. McDonald and Lorrie Faith Cranor. The Cost of Reading Privacy Policies. Journal of Law and Policy for the Information Society, 2008.
[3]
Patrick Gage Kelley, Joanna Bresee, Lorrie Faith Cranor, and Robert W Reeder. A nutrition label for privacy. In Proceedings of the 5th Symposium on Usable Privacy and Security, page 4. ACM, 2009.
[4]
Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level Convolutional Networks for Text Classification. In Proceedings of the 28th International Conference on Neural Information Processing Systems, pages 649--657, 2015.
[5]
Yoon Kim. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1746--1751, 2014.
[6]
Elisa Costante, Yuanhao Sun, Milan Petković, and Jerry den Hartog. A machine learning solution to assess privacy policy completeness (short paper). In Proceedings of the 2012 ACM Workshop on Privacy in the Electronic Society, pages 91--96. ACM, 2012.
[7]
Niharika Guntamukkala, Rozita Dara, and Gary Grewal. A Machine-Learning Based Approach for Measuring the Completeness of Online Privacy Policies. In 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), pages 289--294. IEEE, 2015.
[8]
Terms of Service; Didn't Read (ToS;DR). http://tosdr.org/index.html.
[9]
Sebastian Zimmeck and Steven M Bellovin. Privee: An Architecture for Automatically Analyzing Web Privacy Policies. In 23rd USENIX Security Symposium (USENIX Security 14), pages 1--16, 2014.
[10]
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schuetze. Introduction to Information Retrieval. Cambridge University Press, 2008.


Information

Published In

ACM Other conferences
ICCSP 2018: Proceedings of the 2nd International Conference on Cryptography, Security and Privacy
March 2018
187 pages
ISBN:9781450363617
DOI:10.1145/3199478
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

  • Wuhan University, China
  • University of Electronic Science and Technology of China

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Bag-of-Words Model
  2. Naive Bayes Classifiers
  3. Privacy Labels
  4. Random Forests
  5. Support Vector Machines
  6. TF-IDF

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICCSP 2018


Cited By

  • (2024) A Systematic Review of Privacy Policy Literature. ACM Computing Surveys 57:2 (1-43). DOI: 10.1145/3698393. Online publication date: 1-Oct-2024.
  • (2021) Unifying Privacy Policy Detection. Proceedings on Privacy Enhancing Technologies 2021:4 (480-499). DOI: 10.2478/popets-2021-0081. Online publication date: 23-Jul-2021.
  • (2021) TILT. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (636-646). DOI: 10.1145/3442188.3445925. Online publication date: 3-Mar-2021.
  • (2021) A machine learning-based approach to identify unlawful practices in online terms of service: analysis, implementation and evaluation. Neural Computing and Applications. DOI: 10.1007/s00521-021-06343-6. Online publication date: 28-Jul-2021.
