Methodology
This study analyzes privacy issues in the Android mobile app ecosystem, focusing on apps directed at teenagers.9 Our aim was to survey teen mobile apps for privacy risks and to assess how the products teens use harvest their data.
TAPP’s data for this paper consists of two datasets: a general dataset and a teen dataset.
General Dataset
To create this dataset, we scraped data for the top 200 apps in each genre10 of the Google Play Store, yielding an initial set of 11,338 apps with accompanying data.11 We then expanded this set to include every app that Google Play marked as similar12 to one of the initial 11,338 apps, yielding a total of 53,686 apps, and scraped Google Play Store data for the entire expanded dataset.13 The scraped data included the permissions each app requests, indicators of monetization through advertising and in-app purchases, installation counts, links to privacy policies, and, in some cases, the location of the app publisher.
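The two-stage sampling described above (top-ranked apps per genre, then one round of expansion through Google Play's similar-app links) can be sketched in Python as below. This is a minimal illustration, not TAPP's scraper: the per-genre chart data and the `get_similar` lookup are hypothetical inputs standing in for whatever scraping tool was used, and the sketch assumes that duplicates across genres are removed and that the expansion is a single hop (apps similar to the initial set, not apps similar to those).

```python
from typing import Callable, Iterable


def top_n_per_genre(charts: dict[str, list[str]], n: int = 200) -> list[str]:
    """Take the first n app IDs from each genre's ranked chart, dropping duplicates.

    `charts` is a hypothetical mapping of genre name -> app IDs in rank order.
    """
    seen: dict[str, None] = {}  # dict used as an insertion-ordered set
    for ranked in charts.values():
        for app_id in ranked[:n]:
            seen.setdefault(app_id, None)
    return list(seen)


def expand_with_similar(
    seed_ids: Iterable[str],
    get_similar: Callable[[str], list[str]],
) -> list[str]:
    """One-hop expansion: the seeds plus every app listed as similar to a seed.

    `get_similar` is a hypothetical stand-in for a lookup that returns the
    "similar apps" Google Play lists for a given app ID.
    """
    seen: dict[str, None] = {}
    for app_id in seed_ids:
        seen.setdefault(app_id, None)
    # Iterate over a snapshot of the seeds so newly added apps are not expanded.
    for app_id in list(seen):
        for similar_id in get_similar(app_id):
            seen.setdefault(similar_id, None)
    return list(seen)
```

Keeping the expansion to one hop matches the wording above; following similar-app links repeatedly would instead turn the sample into a crawl of a much larger portion of the store.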
Teen Dataset
To create this dataset, we identified popular apps (apps installed 20 million or more times) with characteristics likely to be directed at teenagers. This process yielded a list of 1,322 apps.
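The popularity cutoff is the one mechanical criterion in this step, so it can be expressed as a simple filter. In this hypothetical sketch, each candidate carries a numeric install count and a boolean `teen_signals` flag; the flag is only a placeholder for the unspecified judgment about teen-directed characteristics, and both field names are assumptions rather than part of the study.

```python
from dataclasses import dataclass

INSTALL_THRESHOLD = 20_000_000  # "installed 20 million or more times"


@dataclass
class Candidate:
    app_id: str
    installs: int        # assumed numeric install count for the app
    teen_signals: bool   # hypothetical placeholder for "likely directed at teenagers"


def popular_teen_candidates(
    apps: list[Candidate], threshold: int = INSTALL_THRESHOLD
) -> list[str]:
    """Keep apps at or above the install threshold that also show teen signals."""
    return [a.app_id for a in apps if a.installs >= threshold and a.teen_signals]
```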
To further narrow this sample, we built a multi-factor framework for assessing whether an app could reasonably be classified as directed at teens. To construct this framework, we adapted industry standards for marketing to teens, motion picture and software rating guidelines, the FTC’s parameters for assessing whether content is child-directed for purposes of COPPA compliance, and general knowledge about popular teen products in 2020.14
Using this framework as a reference, we manually narrowed our sample to 1,156 teen-directed apps. We then downloaded the Android application package (APK)15 of each.16 After accounting for minor adjustments and download failures, the final teen dataset consisted of 1,144 apps with accompanying APKs.
For both datasets, where possible, we assessed the following privacy dimensions of each app:
- The presence and number of permissions each app requested
- The number of trackers integrated into each app
- The readability of each app’s description in the app store
- The presence of each app’s privacy policy
- Whether each app monetizes through advertising or in-app purchases
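A per-app summary of these dimensions might look like the following sketch. The record fields, the signature table, and the function names are hypothetical. Tracker counting here uses one common technique, matching class names extracted from an APK against known tracker SDK package prefixes; the section does not say how trackers were detected, so this is an illustration rather than the study's method. Readability is left out because it depends on the choice of scoring formula.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AppData:
    """Hypothetical per-app record combining store metadata and APK contents."""
    app_id: str
    permissions: list[str] = field(default_factory=list)
    class_names: list[str] = field(default_factory=list)  # assumed extracted from the APK
    privacy_policy_url: Optional[str] = None
    contains_ads: bool = False
    offers_iap: bool = False


def count_trackers(class_names: list[str], signatures: dict[str, str]) -> int:
    """Count distinct trackers whose package prefix matches any class name.

    `signatures` maps a package prefix to a tracker name; several prefixes
    may map to the same tracker, which is then counted once.
    """
    found = set()
    for name in class_names:
        for prefix, tracker in signatures.items():
            if name.startswith(prefix):
                found.add(tracker)
    return len(found)


def privacy_dimensions(app: AppData, signatures: dict[str, str]) -> dict[str, object]:
    """Summarize permissions, trackers, privacy policy, and monetization for one app."""
    n_perms = len(set(app.permissions))  # ignore duplicate permission entries
    return {
        "requests_permissions": n_perms > 0,
        "permission_count": n_perms,
        "tracker_count": count_trackers(app.class_names, signatures),
        "has_privacy_policy": bool(app.privacy_policy_url),
        "monetizes_via_ads": app.contains_ads,
        "monetizes_via_iap": app.offers_iap,
    }
```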
“Teen Directedness”
In addition, we compared the teen dataset with the general dataset to confirm that our filtering process had produced a list of teen-focused apps. We used three points of comparison:
- Does the genre breakdown differ between the teen and general datasets? For example, we expected the teen dataset to contain proportionally fewer utility apps, business apps, and similar categories.
- Using an appropriate readability scoring formula, is the reading level of the app descriptions on the Google Play Store lower for apps in the teen dataset than for apps in the general dataset? We expected the descriptions of apps targeted at a general audience to score at a higher reading level than the descriptions of apps targeted at teens.
- Which words predominate in the descriptions? We expected teen-directed apps to use different words in their descriptions than general apps, and we further expected the most common words to be readily associated with teens, their behavior, or their interests.
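The three comparisons can be sketched in Python as follows. The section does not name its readability formula, so the sketch substitutes the Flesch-Kincaid grade level, 0.39 × (words per sentence) + 11.8 × (syllables per word) - 15.59, as one widely used example. The syllable counter is a crude vowel-group heuristic and the stopword list is a small illustrative set; both are assumptions. Genre breakdowns are compared as shares rather than raw counts because the two datasets differ greatly in size (53,686 versus 1,144 apps).

```python
import re
from collections import Counter

VOWEL_GROUPS = re.compile(r"[aeiouy]+")
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")
# Small illustrative stopword list (an assumption, not the study's list).
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "for",
             "with", "your", "you", "is", "it", "on"}


def genre_shares(genres: list[str]) -> dict[str, float]:
    """Fraction of apps in each genre, so differently sized datasets compare directly."""
    counts = Counter(genres)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def count_syllables(word: str) -> int:
    """Crude heuristic: vowel groups, minus a trailing silent 'e', at least 1."""
    word = word.lower()
    n = len(VOWEL_GROUPS.findall(word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level of one description (higher = harder to read)."""
    words = WORD.findall(text.lower())
    if not words:
        return 0.0
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


def top_words(descriptions: list[str], k: int = 10) -> list[tuple[str, int]]:
    """Most frequent non-stopword words across a dataset's descriptions."""
    counts = Counter(
        w for d in descriptions for w in WORD.findall(d.lower()) if w not in STOPWORDS
    )
    return counts.most_common(k)
```

Applying `genre_shares` and `top_words` to each dataset separately, and averaging `flesch_kincaid_grade` over each dataset's descriptions, yields side-by-side figures for the three questions.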