{"id":423,"date":"2016-02-23T10:44:28","date_gmt":"2016-02-23T10:44:28","guid":{"rendered":"http:\/\/blogs.sussex.ac.uk\/everydaychildhoods\/?p=423"},"modified":"2016-04-24T14:23:58","modified_gmt":"2016-04-24T14:23:58","slug":"the-coming-ethical-crisis-data-scraping-young-peoples-lives","status":"publish","type":"post","link":"https:\/\/blogs.sussex.ac.uk\/everydaychildhoods\/2016\/02\/23\/the-coming-ethical-crisis-data-scraping-young-peoples-lives\/","title":{"rendered":"The coming \u2018ethical\u2019 crisis? Data scraping young people\u2019s lives"},"content":{"rendered":"<p><em>Dr Liam Berriman, Lecturer in Digital Humanities\/Social Science, University of Sussex<\/em><\/p>\n<p>In <a href=\"http:\/\/soc.sagepub.com\/content\/41\/5\/885.abstract\">2007<\/a>, Savage and Burrows predicted a \u2018coming crisis of empirical sociology\u2019 as mainstream sociological methods were muscled out by new commercial data analytics techniques. Reflecting on their paper nearly a decade later, they admit that the scale of disruption caused by \u2018big data\u2019 (as it is now known) was unimaginable, even at that moment in time (<a href=\"http:\/\/bds.sagepub.com\/content\/1\/1\/2053951714540280\">Burrows &amp; Savage 2014<\/a>).<\/p>\n<p>Our conceptualisation of \u2018data\u2019, and the language we use to describe it, have been irreversibly changed by the arrival of big data. For a new breed of data analysts, any dataset that is less than \u2018total\u2019 or \u2018complete\u2019 has become \u2018small data\u2019. The very language of data has been transformed by a new lexicon of analytics, real-time, tracking and scraping etc. 
However, remaining relatively unchanged is our language for talking about the ethics of \u2018big data\u2019.<\/p>\n<p>This short piece focuses on one particular aspect of big data\u2019s methodology \u2013 \u2018data scraping\u2019 \u2013 and the ethical questions it raises for researching young people\u2019s lives through digital data.<\/p>\n<p>According to Marres and Weltevrede (<a href=\"http:\/\/www.tandfonline.com\/doi\/abs\/10.1080\/17530350.2013.772070\">2013<\/a>), scraping is an \u2018automated\u2019 method of capturing online data. It involves a piece of software being programmed (i.e. given instructions) to extract data from a particular source and create a \u2018big\u2019 dataset that would be too onerous to capture manually.<\/p>\n<p>Over the last few years, \u2018scraping\u2019 has been much lauded as a means by which data capture can be \u2018scaled up\u2019 to new analytical heights, particularly in relation to one of the most popular sources for big data capture \u2013 social media. Whilst \u2018scraping\u2019 techniques have advanced, discussion of the ethical frameworks and language we need to interrogate them robustly has moved far more slowly.<\/p>\n<p>As one of the largest groups of social media users, young people are particularly relevant to these debates. Data scraped from social media inevitably captures the conversations, thoughts and expressions of young people\u2019s lives, even if as an \u2018inadvertent\u2019 by-product of research.<\/p>\n<p>In <a href=\"http:\/\/link.springer.com\/article\/10.1007\/s10676-010-9227-5\">2010<\/a>, Michael Zimmer reported on a study that had captured the profile data of a whole cohort of American college students on Facebook. The data had been taken without permission, and a failure to anonymise it appropriately had revealed the students\u2019 identities. 
Zimmer\u2019s article provided a robust critique of a growing data capture trend where all data not hidden by privacy settings was seen as consensually \u2018public\u2019, and available for analysis.<\/p>\n<div style=\"float: RIGHT;margin: 0 25px 5px 0\">\n<p><iframe loading=\"lazy\" width=\"350\" height=\"197\" src=\"https:\/\/www.youtube.com\/embed\/AXaUqaXS3yI?feature=oembed\" frameborder=\"0\" allowfullscreen><\/iframe><\/p>\n<\/div>\n<p>The ethical lessons learnt from incidents such as these have tended to focus more on greater care for data anonymization and security, and less on issues of consent and intrusion. Again, Zimmer (<a href=\"http:\/\/www.michaelzimmer.org\/2010\/02\/12\/is-it-ethical-to-harvest-public-twitter-accounts-without-consent\/\">2010<\/a>) has been particularly vocal in refuting claims that techniques such as anonymization through aggregation are \u2018enough\u2019<a href=\"#_ftn1\" name=\"_ftnref1\">[1]<\/a>.<\/p>\n<p>How do these debates connect with young people\u2019s social media data? Television programmes such as <em><a href=\"http:\/\/www.channel4.com\/programmes\/teens\">Teens<\/a> <\/em>and <em><a href=\"http:\/\/www.channel4.com\/programmes\/the-secret-life-of-students\">The Secret Life of Students<\/a><\/em><a href=\"#_ftn2\" name=\"_ftnref2\">[2]<\/a> have played a significant role in perpetuating the idea that young people are less concerned than adults about having their data made public. However, studies have repeatedly shown that young people are highly concerned about privacy online (<a href=\"http:\/\/www.danah.org\/books\/ItsComplicated.pdf\">boyd, 2014<\/a>; <a href=\"http:\/\/dx.doi.org\/10.1080\/13676261.2014.992323\">Berriman &amp; Thomson, 2015<\/a>), and the disclosure of their digital data (<a href=\"http:\/\/dx.doi.org\/10.1016\/j.chb.2013.09.012\">Bryce &amp; Fraser, 2014<\/a>).<\/p>\n<p>A little while ago, I became aware that \u2018scraping\u2019 has a colloquial meaning in some UK secondary schools. 
According to <a href=\"http:\/\/www.urbandictionary.com\/define.php?term=scrape&amp;defid=2721741\">Urban Dictionary<\/a> (think Wikipedia for slang terms and phrases), the term \u2018scrape\u2019 is used to describe:<\/p>\n<p style=\"padding-left: 30px\"><em>a person intruding on something. To say that one has come out of nowhere and intruded on a conversation.\u00a0<\/em><\/p>\n<p style=\"padding-left: 30px\">[E.g.] <em>&#8216;two people have a conversation&#8217;, &#8216;another person listens in&#8217;<br \/>\none person out the original two people says &#8220;scrape out&#8221; to the other person.<\/em><\/p>\n<p>This colloquial definition makes reference to \u2018scraping\u2019 as an unwelcome form of eavesdropping and intrusion on a private conversation. In the context of these ethical discussions, this definition seems particularly apt. It emphasises that privacy <em>is<\/em> a concern for young people, and that unsolicited \u2018scraping\u2019 of private conversations is ethically and morally contentious.<\/p>\n<p>At present, there is a lack of serious ethical debate about the scraping of young people\u2019s digital data. The presumption of public-as-consent doesn\u2019t cut it. 
We need a new ethical language for talking about these issues, and young people\u2019s voices need to be represented in these debates.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"#_ftnref1\" name=\"_ftn1\">[1]<\/a> Indeed, how \u2018successful\u2019 these techniques are remains debatable \u2013 see <a href=\"http:\/\/arstechnica.com\/tech-policy\/2009\/09\/your-secrets-live-online-in-databases-of-ruin\/\">Ars Technica, 2009<\/a><\/p>\n<p><a href=\"#_ftnref2\" name=\"_ftn2\">[2]<\/a> Two documentary series by production company <em>Raw <\/em>for Channel 4 that followed young people\u2019s social media lives by harvesting their Tweets, texts, and Facebook updates.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Dr Liam Berriman, Lecturer in Digital Humanities\/Social Science, University of Sussex In 2007, Savage and Burrows predicted a \u2018coming crisis of empirical sociology\u2019 as mainstream sociological methods were muscled out by new commercial data analytics techniques. Reflecting on their paper nearly a decade later, they admit that the scale of disruption caused by \u2018big data\u2019 (as it is now known) was unimaginable, even at that moment in time (Burrows &amp; Savage 2014). Our conceptualisation of \u2018data\u2019, and the language we use to describe it, have been irreversibly changed by the arrival of big data. For a new breed of data analysts, any dataset that is less than \u2018total\u2019 or \u2018complete\u2019 has become \u2018small data\u2019. The very language of data has been transformed by a new lexicon of analytics, real-time, tracking and scraping etc. However, remaining relatively unchanged is our language for talking about the ethics of \u2018big data\u2019. 
This short&#8230; <a class=\"read-more btn btn-default\" href=\"https:\/\/blogs.sussex.ac.uk\/everydaychildhoods\/2016\/02\/23\/the-coming-ethical-crisis-data-scraping-young-peoples-lives\/\">Read More<\/a><\/p>\n","protected":false},"author":70,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[103443,103444,103442,71790,53086,71770,103441],"_links":{"self":[{"href":"https:\/\/blogs.sussex.ac.uk\/everydaychildhoods\/wp-json\/wp\/v2\/posts\/423"}],"collection":[{"href":"https:\/\/blogs.sussex.ac.uk\/everydaychildhoods\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.sussex.ac.uk\/everydaychildhoods\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.sussex.ac.uk\/everydaychildhoods\/wp-json\/wp\/v2\/users\/70"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.sussex.ac.uk\/everydaychildhoods\/wp-json\/wp\/v2\/comments?post=423"}],"version-history":[{"count":10,"href":"https:\/\/blogs.sussex.ac.uk\/everydaychildhoods\/wp-json\/wp\/v2\/posts\/423\/revisions"}],"predecessor-version":[{"id":445,"href":"https:\/\/blogs.sussex.ac.uk\/everydaychildhoods\/wp-json\/wp\/v2\/posts\/423\/revisions\/445"}],"wp:attachment":[{"href":"https:\/\/blogs.sussex.ac.uk\/everydaychildhoods\/wp-json\/wp\/v2\/media?parent=423"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.sussex.ac.uk\/everydaychildhoods\/wp-json\/wp\/v2\/categories?post=423"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.sussex.ac.uk\/everydaychildhoods\/wp-json\/wp\/v2\/tags?post=423"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}