Abstract
As mobile app usage continues to rise, so does the generation of extensive user interaction data, which includes actions such as swiping, zooming, or the time spent on a screen. Apps often collect large amounts of this data and claim to anonymize it, yet concerns remain about the adequacy of these measures. In many cases, the supposedly anonymized data can still be used to profile and, in some instances, re-identify individual users. This is compounded by a lack of transparency, leading to potential breaches of user trust.
Our work investigates the gap between privacy policies and actual app behavior, focusing on the collection and handling of user interaction data. We analyzed the top 100 apps across diverse categories using static analysis methods to evaluate the alignment between policy claims and implemented data collection techniques. Our findings highlight the lack of transparency in data collection and the associated risk of re-identification, raising concerns about user privacy and trust. This study emphasizes the importance of clear communication and enhanced transparency in privacy practices for mobile app development.
Notes
- 1.
- 2. The German Google Play Store was selected for its adherence to the GDPR, ensuring that the apps included in the study would have well-constructed privacy policies. https://play.google.com/store/apps?hl=en_US&gl=DE
References
Avdiienko, V., et al.: Mining apps for abnormal usage of sensitive data. In: The 37th IEEE International Conference on Software Engineering, vol. 1, pp. 426–436. IEEE (2015)
Creţu, A.M., Monti, F., Marrone, S., Dong, X., Bronstein, M., de Montjoye, Y.A.: Interaction data are identifiable even across long periods of time. Nat. Commun. 13(1), 313 (2022)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding (2019)
Enck, W., et al.: TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. ACM Trans. Comput. Syst. (TOCS) 32(2), 1–29 (2014)
Grünewald, E., Pallas, F.: TILT: a GDPR-aligned transparency information language and toolkit for practical privacy engineering. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 636–646 (2021)
Leiva, L.A., Arapakis, I., Iordanou, C.: My mouse, my rules: privacy issues of behavioral user profiling via mouse tracking. In: Proceedings of the 2021 Conference on Human Information Interaction and Retrieval, pp. 51–61 (2021)
Marda, V.: Non-personal data: the case of the Indian Data Protection Bill, definitions and assumptions (2020). https://www.adalovelaceinstitute.org/blog/non-personal-data-indian-data-protection-bill/. Accessed 28 Nov 2023
Qu, Z., Rastogi, V., Zhang, X., Chen, Y., Zhu, T., Chen, Z.: AutoCog: measuring the description-to-permission fidelity in android applications. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pp. 1354–1365 (2014)
Ravichander, A., Black, A.W., Norton, T., Wilson, S., Sadeh, N.: Breaking down walls of text: how can NLP benefit consumer privacy? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, vol. 1 (2021)
Rzhevkina, A.: Several EU countries banned Google analytics - here are some alternatives (2022). https://www.contentgrip.com/eu-countries-ban-google-analytics/. Accessed 03 Nov 2023
Singh, A., Raghavan, M., Chugh, B., Prasad, S.: The contours of public policy for non-personal data flows in India (2019). https://www.dvara.com/research/blog/2019/09/24/the-contours-of-public-policy-for-non-personal-data-flows-in-india/. Accessed 28 Nov 2023
Tang, F., Østvold, B.M.: Transparency in app analytics: analyzing the collection of user interaction data. In: 2023 20th Annual International Conference on Privacy, Security and Trust (PST), pp. 1–10 (2023). https://doi.org/10.1109/PST58708.2023.10320181
Tesfay, W.B., Hofmann, P., Nakamura, T., Kiyomoto, S., Serna, J.: PrivacyGuide: towards an implementation of the EU GDPR on internet privacy policy evaluation. In: Proceedings of the Fourth ACM International Workshop on Security and Privacy Analytics, IWSPA 2018, pp. 15–21 (2018)
Zhang, X., Wang, X., Slavin, R., Breaux, T., Niu, J.: How does misconfiguration of analytic services compromise mobile privacy? In: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, pp. 1572–1583 (2020)
Zimmeck, S., Goldstein, R., Baraka, D.: PrivacyFlash pro: automating privacy policy generation for mobile apps. In: NDSS (2021)
Zimmeck, S., et al.: MAPS: scaling privacy compliance analysis to a million apps. Proc. Priv. Enhanc. Tech. 2019, 66 (2019)
Acknowledgement
This paper is an extended version of work published in [12]. This work is part of the Privacy Matters (PriMa) project. The PriMa project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 860315.
Appendices
A Implementation of Claim Analysis Using BERT
We opted for BERT over GPT-3 for its bidirectional architecture, enabling a thorough contextual understanding of privacy policies, essential for our analysis. BERT’s capacity to analyze both left and right sentence contexts is particularly effective for interpreting complex privacy policy language [3]. In our implementation, BERT was tailored to privacy policy language, involving pre-processing steps like tokenization and normalization, and trained on a specialized dataset to identify binary claims and data collection methods. We also employed bigram analysis to recognize common word pairs, augmenting the model’s proficiency in interpreting policy language and thereby enhancing its precision and recall.
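The bigram analysis mentioned above can be sketched in isolation. This is an illustrative reconstruction, not the authors' code: the normalization rule, tokenizer, and policy snippet are all assumptions.

```python
import re
from collections import Counter

def normalize(text: str) -> list[str]:
    # Lowercase and keep only word tokens, mirroring the
    # tokenization/normalization pre-processing described above.
    return re.findall(r"[a-z']+", text.lower())

def bigrams(tokens: list[str]) -> Counter:
    # Count adjacent word pairs; frequent pairs such as
    # ("personal", "data") signal policy-specific phrasing.
    return Counter(zip(tokens, tokens[1:]))

policy = (
    "We collect personal data such as gestures and screen time. "
    "Personal data may be shared with analytics providers."
)
tokens = normalize(policy)
print(bigrams(tokens).most_common(1))  # → [(('personal', 'data'), 2)]
```

In the full pipeline, such frequent pairs would be fed to the fine-tuned BERT model as additional signal for recognizing data-collection claims.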
The model achieved a precision of 95% and a recall of 98% for claim extraction. For data types and collection methods, we observed precision and recall rates of 82% and 74%, respectively, and for collection techniques, precision and recall stood at 92% and 78%, showcasing the model’s robust performance.
B Code Analysis and Performance Metrics
We selected 20 popular apps from the German Google Play Store, meticulously identifying each instance of user interaction data collection to establish a ground truth. Our static analysis method was then evaluated against this dataset.
Our method demonstrated high accuracy (91%), precision (92%), and recall (79%), with an overall F1-score of 85.5%, indicating effectiveness in accurately identifying and classifying user interaction data collection in mobile apps.
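As a sanity check on how such figures are derived, the metrics can be computed from a ground-truth labeling of collection points. The app names and UI elements below are hypothetical, not from the study's dataset.

```python
# Hypothetical ground truth: (app, UI element) pairs where interaction
# data is actually collected, vs. pairs flagged by the static analysis.
truth = {("app_a", "swipe"), ("app_a", "tap"),
         ("app_b", "scroll"), ("app_b", "text_input")}
flagged = {("app_a", "swipe"), ("app_a", "tap"),
           ("app_b", "scroll"), ("app_c", "zoom")}

tp = len(truth & flagged)   # correctly flagged collection points
fp = len(flagged - truth)   # flagged but not actually collected
fn = len(truth - flagged)   # collected but missed by the analysis

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)  # → 0.75 0.75 0.75
```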
C Types of User Interaction Data and Collection Techniques
We identified six types of user interaction data based on our analysis of Android UI widgets: App Presentation Data, Binary Data, Categorical Data, User Input Data, Gesture Data, and Composite Gestures Data. For a detailed explanation of these types, refer to [12]. Similarly, our study categorizes collection techniques as Frequency, Duration, and Motion Details. Each technique’s specifics are also elaborated upon in [12].
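For labeling purposes, the taxonomy above can be encoded directly. The enum member names follow the text; the class structure itself is an assumption, not part of the study's tooling.

```python
from enum import Enum

class InteractionDataType(Enum):
    # The six user interaction data types identified in [12].
    APP_PRESENTATION = "App Presentation Data"
    BINARY = "Binary Data"
    CATEGORICAL = "Categorical Data"
    USER_INPUT = "User Input Data"
    GESTURE = "Gesture Data"
    COMPOSITE_GESTURES = "Composite Gestures Data"

class CollectionTechnique(Enum):
    # The three collection techniques categorized in [12].
    FREQUENCY = "Frequency"
    DURATION = "Duration"
    MOTION_DETAILS = "Motion Details"

# Example: tagging one observed collection point.
observation = (InteractionDataType.GESTURE, CollectionTechnique.FREQUENCY)
print(len(InteractionDataType), len(CollectionTechnique))  # → 6 3
```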
Copyright information
© 2024 IFIP International Federation for Information Processing
About this paper
Cite this paper
Tang, F., Østvold, B.M. (2024). User Interaction Data in Apps: Comparing Policy Claims to Implementations. In: Bieker, F., de Conca, S., Gruschka, N., Jensen, M., Schiering, I. (eds) Privacy and Identity Management. Sharing in a Digital World. Privacy and Identity 2023. IFIP Advances in Information and Communication Technology, vol 695. Springer, Cham. https://doi.org/10.1007/978-3-031-57978-3_5
Print ISBN: 978-3-031-57977-6
Online ISBN: 978-3-031-57978-3