Revisiting the impact of common libraries for Android-related investigations

L Li, T Riom, TF Bissyandé, H Wang, J Klein - Journal of Systems and …, 2019 - Elsevier
Journal of Systems and Software, 2019, Elsevier
Abstract
The packaging model of Android apps requires all of an app's code to be shipped in a single APK file in order to be installed and executed on a device. This model introduces noise into Android app analyses (e.g., detection of repackaged applications and malware classification), because the analysis visits not only the core developer code but also other, assistant code. Such assistant code is often contributed by common libraries that are used pervasively across apps.
Although our community has put much effort into investigating Android libraries, Android research has not yet produced a complete and reliable set of common libraries to support thorough analyses of Android apps. In this work, we therefore leverage a dataset of about 1.5 million apps from Google Play to identify potential common libraries, including advertisement libraries, and their abstract representations. After several refinement steps, we collect 1113 libraries supporting common functions and 240 advertisement libraries. For each library, we also collect its various abstract representations, which can be leveraged to find new usages, including obfuscated cases.
Based on these datasets, we further empirically revisit three popular Android app analyses, namely (1) repackaged app detection, (2) machine learning-based malware detection, and (3) static code analysis, aiming to measure the impact of common libraries on their performance. Our experimental results demonstrate that common libraries can indeed impact the performance of Android app analysis approaches. In particular, common libraries can introduce both false positives and false negatives into repackaged app detection. The presence of common libraries in Android apps may also affect the performance of machine learning-based classifiers as well as that of static code analysers. Overall, these results suggest that it is essential to harvest a reliable list of common libraries and to pay special attention to them when conducting Android-related investigations.
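As a rough illustration of why library code can mislead repackaged app detection, the sketch below compares two unrelated apps by the Jaccard similarity of their class names, first with and then without classes from common libraries. It is not the authors' method: the prefix list, the app contents, and the helper names (`is_library_class`, `strip_libraries`, `jaccard`) are all hypothetical, and plain prefix matching would not handle the obfuscated cases mentioned above.

```python
# Hypothetical whitelist of common-library package prefixes (illustrative only,
# not taken from the paper's dataset).
COMMON_LIBRARY_PREFIXES = ("com.google.ads", "com.facebook", "org.apache.http")


def is_library_class(class_name, prefixes=COMMON_LIBRARY_PREFIXES):
    """Return True if the class belongs to a whitelisted library package."""
    return any(
        class_name == prefix or class_name.startswith(prefix + ".")
        for prefix in prefixes
    )


def strip_libraries(classes, prefixes=COMMON_LIBRARY_PREFIXES):
    """Keep only classes that are not part of a whitelisted library."""
    return {c for c in classes if not is_library_class(c, prefixes)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Two made-up apps with unrelated developer code but identical libraries.
app_a = {
    "com.example.notes.Main",
    "com.example.notes.Editor",
    "com.google.ads.AdView",
    "com.google.ads.AdRequest",
    "com.facebook.Session",
}
app_b = {
    "org.sample.game.Main",
    "org.sample.game.Level",
    "com.google.ads.AdView",
    "com.google.ads.AdRequest",
    "com.facebook.Session",
}

raw = jaccard(app_a, app_b)
filtered = jaccard(strip_libraries(app_a), strip_libraries(app_b))

print(f"similarity with libraries:    {raw:.2f}")
print(f"similarity without libraries: {filtered:.2f}")
```

In this toy setting the shared library classes alone account for all of the overlap, which is the kind of false positive a similarity-based detector could report; the converse case, where large libraries dilute a genuine match in developer code, would produce a false negative.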