Facebook, Instagram and other platforms recently set their users a 10-year challenge: post your first-ever Facebook photo and another one of you today. Whilst some users were quick to comply, others responded in unexpected ways. Some reactions were funny: Jennifer Aniston in 2009 is Iggy Pop in 2019. Some had a political message: an attractive underpass in 2009 is run down and occupied by homeless people in 2019; vibrant cities in Iraq, Libya, Yemen and Syria in 2009 are warzones in 2019. Some aimed to tell it like it is, such as this fictitious conversation at Facebook HQ:
@facebook: “Sir. Our facial recognition algorithms are becoming less accurate due to aging of users.”
Zuckerberg: “Tell @BuzzFeed to create a viral trend which gets people to post both their very first and current profile pics side by side. Adjust algorithms accordingly.”
Or, as tech writer Kate O’Neill posted on Twitter:
Me 10 years ago: probably would have played along with the profile picture aging meme going around on Facebook and Instagram
Me now: ponders how all this data could be mined to train facial recognition algorithms on age progression and age recognition.
These suggestions of foul play have in turn spawned retorts pointing out that such data is already available to Facebook in all of the profile pictures and other photos we have already posted on the platform. To which O’Neill replies: that existing dataset is messy, whereas the captioned new posts are far more helpful to Facebook’s aim of honing its facial recognition software. O’Neill concludes that, while this mission of Facebook’s may not be cause for concern, it would be a good thing if we were all a bit more aware, and a bit more critical, of what happens to our data.
Critical, political or just plain mocking, responses to the challenge could be seen as evidence of the very thing that O’Neill says we need more of: public awareness of algorithms, data mining and analytics. High-profile coverage of stories such as the Facebook/Cambridge Analytica scandal may have shifted the ground in terms of what people know about how their data is used. So are responses to #tenyearchallenge a sign of growing public awareness of data and algorithms? The answer is that we simply don’t know: robust evidence about whether and how data analytics and related technologies like AI are understood by non-experts is seriously lacking.
Attitudes and understanding
A number of national and international surveys and polls have been carried out, but most of these focus on attitudes and perceptions rather than knowledge and understanding. Some draw conclusions that do not appear to be backed up by their own data, others ask leading questions, and findings across surveys are inconsistent. For instance, the independent think tank doteveryone’s 2018 digital attitudes survey found that the majority of respondents knew that personal information is used to target advertising (70%), but fewer realised their data could also be sold to other companies (56%) or may determine the prices they are charged (21%). A minority inaccurately believed that invasive data collection takes place: 7% believed their phone conversations are collected and 5% believed their eye movements when looking at the screen are collected. Doteveryone concludes that there is a link between trust and understanding, stating that ‘Without this understanding people are unable to make informed choices about how they use technologies. And without understanding it is likely that distrust of technologies may grow’, but it is not clear that they have data to back up this claim.
An Ipsos MORI Global Trends survey found that 83% of UK respondents were unsure what information companies had on them. Trust in Personal Data: A UK Review by Digital Catapult (2015) notes that 96% of respondents to its survey claimed to understand the term ‘personal data’, but that when it came to describing it, fewer than two-thirds (64%) chose the correct definition. Moreover, 65% of people surveyed ‘are unsure whether data is being shared without their consent’. The report concludes that its study highlights an increase in data knowledge, although there is no temporal comparison in the survey on which to base this claim.
Qualitative research paints a more nuanced picture. Taina Bucher’s study of Facebook users’ engagements with the platform’s algorithms identified both more knowledge than participants themselves acknowledged having and a range of playful interactions with the algorithms. Terje Colbjørnsen’s study of how people talk on social media about algorithmic moments on Spotify, Netflix and Amazon found similar knowledge and playfulness, with shrewd statements such as “go home algorithm your’e drunk”, users describing algorithms as like a ‘smug older brother’, or wishing they were a little ‘gayer’. Likewise, my own research in Post, Mine, Repeat and ‘The Feeling of Numbers’ with Rosemary Lucy Hill (2017) suggests that knowledge and experiences of data are more complex, diverse and nuanced than simple statistics suggest.
Why does it matter?
Knowing whether people understand the mining of their personal data is important for many reasons. First, as suggested above, there is a relationship between understanding and trust, the holy grail of data research. Elsewhere, I’ve argued that understanding what people feel about what companies do with their data is as important as understanding what they know, because hopes, fears, misconceptions, and aspirations play an important role in shaping attitudes to data mining. But we also need to know what people know. We need to move beyond a hunch that awareness is growing, as #tenyearchallenge responses suggest, to more concrete knowledge of what is known and what is not, where the gaps in knowledge and understanding are, and how to fill them. Only then can we start to envisage a participatory, data-driven society, of the kind that many of us would like to see.
The second reason this question matters relates to new initiatives which aim to influence uses of data and AI and their governance, such as the government’s Centre for Data Ethics and Innovation (CDEI) and the independent Ada Lovelace Institute (Ada). Such initiatives claim that understanding public views and how data affect people will be at the heart of what they do, but this is only possible if such understanding actually exists. To ensure data works ‘for people and society’ (Ada’s mission) and is ‘a force for good’ (a CDEI aim), we need better evidence of what people really know about what happens to their data.