Dr Liam Berriman, Lecturer in Digital Humanities/Social Science, University of Sussex

In 2007, Savage and Burrows predicted a ‘coming crisis of empirical sociology’ as mainstream sociological methods were muscled out by new commercial data analytics techniques. Reflecting on their paper nearly a decade later, they admit that the scale of disruption caused by ‘big data’ (as it is now known) was unimaginable, even at that moment in time (Burrows & Savage 2014).

Our conceptualisation of ‘data’, and the language we use to describe it, have been irreversibly changed by the arrival of big data. For a new breed of data analysts, any dataset that is less than ‘total’ or ‘complete’ has become ‘small data’. The very language of data has been transformed by a new lexicon of analytics, real-time, tracking and scraping. Our language for talking about the ethics of ‘big data’, however, has remained relatively unchanged.

This short piece focuses on one particular aspect of big data’s methodology – ‘data scraping’ – and the ethical questions it raises for researching young people’s lives through digital data.

According to Marres and Weltevrede (2013), scraping is an ‘automated’ method of capturing online data. It involves a piece of software being programmed (i.e. given instructions) to extract data from a particular source and create a ‘big’ dataset that would be too onerous to capture manually.

Over the last few years, ‘scraping’ has been much lauded as a means by which data capture can be ‘scaled up’ to new analytical heights, particularly in relation to one of the most popular sources for big data capture – social media. Whilst ‘scraping’ techniques have advanced, discussion of the ethical frameworks and language we need for robustly interrogating these techniques has been much slower to develop.

As one of the largest groups of social media users, young people are particularly relevant to these debates. Data scraped from social media inevitably captures the conversations, thoughts and expressions of young people’s lives, even if only as an ‘inadvertent’ by-product of research.

In 2010, Michael Zimmer reported on a study that had captured the profile data of a whole cohort of American college students on Facebook. The data had been taken without permission and a failure to appropriately anonymise the data had seen the identities of the students revealed. Zimmer’s article provided a robust critique of a growing data capture trend where all data not hidden by privacy settings was seen as consensually ‘public’, and available for analysis.

The ethical lessons learnt from incidents such as these have tended to focus more on greater care for data anonymisation and security, and less on issues of consent and intrusion. Again, Zimmer (2010) has been particularly vocal in refuting claims that techniques such as anonymisation through aggregation are ‘enough’[1].

How do these debates connect with young people’s social media data? Television programmes such as Teens and The Secret Life of Students[2] have played a significant role in perpetuating the idea that young people are less concerned than adults about having their data made public. However, studies have repeatedly shown that young people are highly concerned about privacy online (boyd, 2014; Berriman & Thomson, 2015), and the disclosure of their digital data (Bryce & Fraser, 2014).

A little while ago, I became aware that ‘scraping’ has a colloquial meaning in some UK secondary schools. According to Urban Dictionary (think Wikipedia for slang terms and phrases), the term ‘scrape’ is used to describe:

a person intruding on something. To say that one has come out of nowhere and intruded on a conversation. 

[E.g.] ‘two people have a conversation’, ‘another person listens in’
one person out the original two people says “scrape out” to the other person.

This colloquial definition makes reference to ‘scraping’ as an unwelcome form of eavesdropping and intrusion on a private conversation. In the context of these ethical discussions, this definition seems particularly apt. It emphasises that privacy is a concern for young people, and that unsolicited ‘scraping’ of private conversations is ethically and morally contentious.

At present, there is a lack of serious ethical debate about the scraping of young people’s digital data. The presumption of public-as-consent doesn’t cut it. We need a new ethical language for talking about these issues, and young people’s voices need to be represented in these debates.



[1] Indeed, how ‘successful’ these techniques are remains debatable – see Ars Technica, 2009

[2] Two documentary series by production company Raw for Channel 4 that followed young people’s social media lives by harvesting their Tweets, texts, and Facebook updates.

February 23rd, 2016


Prof Rachel Thomson

You notice things at festivals, at the meeting point of genres and cultural forms: music, comedy, literature, performance. One of the things I noticed this summer was how participation seems to be infusing them all. In the theatre they talk about the 4th wall, the ‘make believe’ suspension of reality that enables us to unleash our imaginations without fear of ridicule or danger. The 4th wall exists in music too. Salt-N-Pepa’s 1991 single “Let’s Talk About Sex” says “I don’t think this song’s going to be played on the radio”. John Cage’s 4 minutes and 33 seconds of silence demands we face the collaborative artifice of performance. A popular practice is to experiment along the boundaries of constructed-ness: fact, fiction, real time and imaginative time travel. In the fabulous ‘Boyhood’, 12 years in gestation, Richard Linklater mixes up the meanings of family, friends and actors, revealing the compelling and forward-facing time frame within which we all exist and age, and within which culture gains audience and meaning. A space within which we build culture and community through call and response.

I am writing this on the train to London as I go to interview Lucien, whose life I have been documenting since before he was born. One of my aims today is to talk to him about what being involved in this study means, ensuring that his ‘consent’ is meaningful. His mum and I share cultural references. When I explained the nature of a longitudinal study I used the ‘7 Up’ metaphor, the now-classic ongoing documentary series that seems to have become the key text of reflexive modernity: providing a vocabulary for understanding reality TV (the politics of editing), the emotional economies of public exposure (the more powerful withdrew) and the peculiar moral register of real-time voyeurism (we are implicated for good or bad). So when I asked Monica, eight months pregnant, if she wanted to be in a longitudinal study she had this as a reference point. For Lucien things are different, especially in the early stages, but as a culture-savvy-in-no-hurry kid, he can place this experience alongside others – the homework projects that require a life story, YouTube stars who document their lives online, or just the long-running TV shows like Doctor Who and Top Gear where we see characters age and where real-life scandal gets folded into the cultural product.

But this is research…. I was tempted to write ‘research rather than entertainment’. I could also say ‘rather than documentary’ or more pertinently for Lucien ‘rather than play’, but I am not sure that research should be set up in this way. Research is defined by the presence of research questions and an interest in methodology – a plan for HOW to generate answers to these questions. But research is still part of life. It has its own fourth wall. What we are doing is ‘for research’ so, for example, we will call our interaction an ‘interview’, and this means that we do not have to follow the usual unwritten rules of conversation. Or we will call this an ‘observation’ and this means that I have licence to spend the day with you, acting as a shadow, giving myself over to noticing the sounds, smells and sensations of your life.

Research ethics – as conceived through the adult subject – presume a tacit agreement to the terms of this game and the validity of the 4th wall. It is this that we ask our ‘subjects’ (the old language) or ‘participants’ (the new language) to consent to. I sign on this line to agree to take part in a conversation that is of a different order of things than usual life. This means that the researcher may have more power than is usual and I need to trust them not to hurt me or to use the knowledge that we together create as a result of our experiment in a way that might be disrespectful or damaging to me or, for that matter, others. With children who are used to slipping in and out of play mode, this ‘contract’ becomes rather ridiculous. In a healthy way it pushes us towards a different kind of gambit along the lines of ‘I am an adult who would really like to play with you. The reason for this is that I recognise that playing is a really good way of creating knowledge and understanding. Mostly you play with other kids, but also with adults, maybe parents and teachers. What is different with this is that I am going to make documents of our play together and these will be public. Which means that other people can see them. But also you can go back to them over time and they, unlike you, will not change. This might be odd. We’d like you to say yes to doing this. But given that neither of us really knows what will happen in the future we are willing to negotiate over the documents. This means you can change your mind and that the records belong to both of us.’

Working with children, as actors know, has a special quality, perhaps because the 4th wall is something that we have to be socialised into and which symbolically marks the end of an enchanted period of childhood in which the imagination is allowed full rein. The 4th wall is also a historically located phenomenon – a product of a particular form of technology and associated literacies. Digital technologies change the game and new generations are taking new things for granted. Of course this poses challenges for research which relies on an analogue ethics concerned with risk avoidance rather than participation. Working with ideas such as call and response and the power dynamics of the porous 4th wall may help us play.

December 4th, 2014
