Key principles for using human participant data from on-line sources/social media

Image by Gerd Altmann from Pixabay

Dr Ruth Stirton, Senior Lecturer in Healthcare Law and Chair of the Social Sciences and Arts Cross Schools Research Ethics Committee.

1.   Social media users: human participants

Social media research comes under the umbrella term ‘internet-mediated research’ (IMR), which the British Psychological Society defines as:

“…any research involving the remote acquisition of data from or about human participants using the Internet and its associated technologies.”

(Ethics guidelines for internet-mediated research, BPS, 2021)

Social media users are human participants in research terms and the overarching principles of ethical research with human participants apply. Social media research should – like any research – aim to:

  • maximise benefits for individuals and society and minimise risk and harm;
  • respect the rights and dignity of individuals and groups;
  • respect participants’ privacy, confidentiality and anonymity.  

Any research project involving social media users/their data will require University ethics review[1] and data collection must not start until the appropriate ethics approval and permissions are in place. 

2.   Personal data and data protection legislation

Social media research involves the processing of personal data and is subject to the Data Protection Act 2018 and the UK General Data Protection Regulation (UK GDPR).

Personal data is anything that enables a living person to be identified. It is information that relates to an identified or identifiable individual; that is, a person who:

  • can be identified directly from the information in question; or
  • can be identified indirectly from that information in combination with other information.[2] 

Examples of personal data include: name, address, date of birth, research participant number, pseudonym, occupation, e-mail, CV, location data, Internet Protocol (IP) address, phone number, user-name, ID, post or Tweet.

The University must always have a lawful basis for processing[3] personal data and, in the context of research activities, our lawful basis is carrying out a task “in the public interest or in the exercise of official authority vested in the controller”, known as ‘public task’ (Article 6(1)(e) of the UK GDPR). Under our Royal Charter, the purpose of the University is to “advance learning and knowledge by teaching and research to the benefit of the wider community,” meaning research activities are part of our ‘public task’.

Although there are some limited exemptions for research activities, researchers will need to comply with most of the requirements of data protection legislation. The key principles are:

  • Personal data should be processed in a fair and transparent manner;
  • Any personal data must be adequate, relevant and limited to what is necessary for the research;
  • Personal data should be accurate and, where necessary, kept up to date;
  • Personal data must be kept secure.[4]
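As an illustration of the "adequate, relevant and limited" principle above, the following minimal Python sketch drops every field that the research does not need before a record is stored. The field names and the `NECESSARY_FIELDS` allow-list are hypothetical; a real project would take them from its approved research protocol.

```python
# Hypothetical allow-list: only the fields the research question needs.
NECESSARY_FIELDS = frozenset({"post_id", "text", "posted_at"})


def minimise(record, keep=NECESSARY_FIELDS):
    """Return a copy of `record` containing only the allow-listed fields."""
    return {key: value for key, value in record.items() if key in keep}


# Example record with made-up values; location and IP address are not needed
# for the (hypothetical) analysis, so they are discarded before storage.
raw_record = {
    "post_id": "p-001",
    "text": "Example post text",
    "posted_at": "2021-06-01T10:00:00Z",
    "location": "Example Town",
    "ip_address": "192.0.2.1",
}
stored_record = minimise(raw_record)
```

An allow-list is used rather than a list of fields to remove, so that any unexpected field supplied by a platform is excluded by default.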

3.   Special category data and criminal offence data 

Under data protection legislation the following types of personal data are called ‘special categories of personal data’.[5]

  • personal data revealing racial or ethnic origin;
  • personal data revealing political opinions;
  • personal data revealing religious or philosophical beliefs;
  • personal data revealing trade union membership;
  • genetic data;
  • biometric data (where used for identification purposes);
  • data concerning health;
  • data concerning a person’s sex life; and
  • data concerning a person’s sexual orientation.

Special category data should only be processed for a limited range of purposes and only where necessary. Under the UK GDPR researchers may process special category data for scientific and historical research purposes provided:

  • The processing is in the public interest;
  • It is not likely to cause substantial damage or distress to the individual; and
  • The processing is not for the purpose of measures or decisions about a particular person, unless it is necessary for approved medical research.

Social media research projects collecting special category data to meet the research objective must meet the above conditions.  If the research will involve the processing of special category or criminal offence data on a large scale* then a Data Protection Impact Assessment (DPIA) will normally be required (see

*For the purpose of this guidance, a large data set is defined as that which contains more than 1,000 individuals/data subjects. Large scale processing or ‘mass data’ research is discussed in Section 3 below. 
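The large-scale test described above can be sketched as a simple screening check. This is an illustrative Python sketch only: the `user_id` field and the function name are assumptions, and it covers just the one condition stated in this guidance (special category or criminal offence data on more than 1,000 data subjects). A DPIA may be needed for other reasons that this check does not capture.

```python
# Threshold taken from this guidance: a large data set contains
# more than 1,000 individuals/data subjects.
LARGE_SCALE_THRESHOLD = 1000


def dpia_normally_required(records, has_special_or_criminal_data):
    """Screen a data set against the large-scale condition in this guidance.

    `records` is an iterable of dicts with a hypothetical "user_id" field.
    Data subjects are counted once each, however many posts they made.
    """
    data_subjects = {record["user_id"] for record in records}
    is_large_scale = len(data_subjects) > LARGE_SCALE_THRESHOLD
    return has_special_or_criminal_data and is_large_scale
```

Counting distinct data subjects rather than posts matters here: 5,000 posts from 200 users would fall below the threshold, whereas 1,001 posts from 1,001 users would not.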

The processing of criminal offence data (which includes personal information about criminal allegations, convictions, and proceedings) is also subject to specific conditions and limitations (see 

4.   Informed Consent

Informed consent is one of the core principles of research ethics.  Legally, consent is not required to process personal data for research purposes where the lawful basis for processing personal data is the University’s public task [see section 2 above].  However, to ensure that research is ethical – and that respect for the autonomy, privacy and dignity of individuals and communities[6] is upheld – researchers should usually seek informed consent from participants by giving them information about the research that allows them to make a meaningful choice about whether or not to take part.

In social media research, however, this is not always possible: “In many cases, a social media user’s data is accessed and analysed without informed consent having first been sought” (Social Media Research: A Guide to Ethics, Townsend & Wallace, 2016).  This does not mean that the principle of informed consent does not apply to social media research; rather, it is context-specific and will depend on the extent to which the social media user’s data can be said to be in the public domain, the terms of service of the platform the user posted on and the nature of the data itself. 

In some instances, such as accessing data via an on-line forum or group, it may be possible to seek informed consent from individuals or from a moderator/administrator.  In other cases, such as projects involving automated mass data-scraping, it will not (realistically) be possible to do so and researchers will need to think about whether consent can be deemed to be ‘implied’.

When applying this test, it is important to remember that:

  • Researchers cannot rely solely on a user’s agreement to a platform’s terms of service as equivalent to informed consent – there is strong evidence that users do not always read/understand terms of service[7].
  • The perception of the user as to whether their data is private or in the public domain must be considered – posting on a closed group/forum versus contributing to a hash-tagged public Twitter thread, for example.
  • There may be vulnerable individuals embedded in the data set from whom consent would normally be received via a guardian – e.g. children (and the data-set may include children who are below the platform’s minimum age requirement).  
  • The risks to the individual user through the use of their data need to be considered – e.g. disclosure of criminal activity, or whether the data is “potentially sensitive/embarrassing or about fairly mundane daily activities or opinion.”[8]

It is simply not possible to provide a prescriptive list of scenarios where informed consent is required, and those where consent may be implied, as each case will be context-specific. A general rule of thumb is that the use of direct quotes would typically require consent but there may be exceptions to this, such as quotes taken from official or public-authority web-sites or from the content of a published newspaper article; or those made by public figures acting in their public capacity.    

Thinking about participants’ privacy, confidentiality and anonymity will help researchers work through some of the issues around consent.

5.   Privacy, Confidentiality and Anonymity

Researchers should always seek to protect participants’ privacy and confidentiality. Typically, this is achieved through the de-identification of individuals in research outputs so that quotes, data etc. cannot be linked to the research participant.  In social media research this is difficult to achieve: a quick search for a quote via Google or another search engine may identify not only the individual but also their location and the time and context in which they originally posted, so anonymity cannot be assured.

A helpful way of thinking about the nature of social media users’ data and its representation in your research is to think of it as “private data on public display” (Nicolas Gold)[9].  In other words, a person’s data may be technically accessible or publicly available but it still contains private information about them that they may not expect to be ‘studied’ or included in another medium.

Wherever possible researchers should paraphrase quotes or present outputs in aggregate form to avoid the identification of individuals.  Where the use of direct quotes is necessary for the research output, it may be appropriate to contact the participant to seek their consent, or it may be possible to demonstrate that the individual was aware that they were contributing to a public debate (for example, through public settings on Twitter and the use of # hashtags to indicate contribution to a wider, open debate).  There will be instances, however, where the risks clearly outweigh the research benefits – such as exposing participants to the disclosure of illegal activity or compromising an individual or others – and direct quotes therefore cannot be used.  Researchers should consider and mitigate any risks associated with the research output and explain these clearly in their ethics review application. 

Even where the research output will not identify individuals, the data collected for the analysis constitutes personal data and must be stored securely on University systems in order to assure confidentiality.    
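One common way to reduce the identifiability of stored data is to replace user-names with keyed pseudonyms before analysis. The following is a minimal Python sketch of that technique using the standard-library `hmac` module. It is not a method prescribed by this guidance, and the function name, the example user-names and the key value are all hypothetical. Note that a pseudonym is itself listed above as an example of personal data, so pseudonymised data must still be stored securely on University systems.

```python
import hashlib
import hmac


def pseudonymise(username, secret_key):
    """Replace a user-name with a keyed HMAC-SHA256 pseudonym.

    The same user-name and key always give the same pseudonym, so posts
    by one user can still be linked for analysis. Without the key, the
    pseudonym cannot simply be recomputed from a guessed user-name.
    """
    return hmac.new(secret_key, username.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder key for illustration only. A real key should be generated
# randomly and stored separately from the research data.
example_key = b"replace-with-a-randomly-generated-key"
alias = pseudonymise("@example_user", example_key)
```

A keyed hash is used rather than a plain hash because user-names are easy to guess: anyone could hash a list of known user-names and match them against an unkeyed data set.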

6.   Terms of Service

It is essential that researchers read and follow the terms of service of the social media platform they are collecting data from.  As explained above [section 4], terms of service are never a proxy for informed consent, but understanding the nature of the user’s contract with the platform will be helpful when negotiating issues around privacy and confidentiality and considering whether an individual’s data can be said to be ‘on public display’.

Researchers must also be very clear as to the terms under which they may collect the data. For example, some social media platforms prohibit the automated collection of data (known as ‘data-scraping’) unless researchers do so via a specific developer agreement that grants them access to software developed/managed by the platform.  Twitter, for example, mandates that researchers must collect data via the Twitter Application Programming Interface (API).  This places limits on the rate of data collection and also enables Twitter to notify researchers when a user has requested the deletion of a Tweet (which should in turn be removed from the research data set).  Automated data collection is discussed further below in Section 3: Mass Data Research. 
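The removal of deleted posts from a research data set can be sketched as follows. This Python sketch assumes that the researcher already holds a set of post IDs for which deletion has been notified; how those notifications are obtained depends on the platform and is not shown. The `post_id` field and the function name are hypothetical.

```python
def remove_deleted(dataset, deleted_ids):
    """Return the data set without any post whose ID is in `deleted_ids`.

    `dataset` is a list of dicts with a hypothetical "post_id" field;
    `deleted_ids` is a set of IDs for which deletion has been notified.
    """
    return [record for record in dataset if record["post_id"] not in deleted_ids]


# Made-up example: the post "p-002" has been deleted by its author.
dataset = [
    {"post_id": "p-001", "text": "First example post"},
    {"post_id": "p-002", "text": "Second example post"},
    {"post_id": "p-003", "text": "Third example post"},
]
dataset = remove_deleted(dataset, {"p-002"})
```

In practice the same removal would also need to be applied to any backups or derived files that contain the deleted post.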

It is not possible to explore the various platforms’ terms of service within the scope of this guidance, and it is the researchers’ responsibility to read and comply with any terms and conditions.  




[3] ‘Processing’ personal data means any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction.



[6] Ethics guidelines for internet-mediated research, British Psychological Society, 2021

[7] Social Media Research: A Guide to Ethics, Townsend & Wallace, 2016

[8] Social Media Research: A Guide to Ethics, Townsend & Wallace, 2016

[9] Using Twitter Data in Research: Guidance for Researchers and Ethics Reviewers, Dr Nicolas Gold, Department of Computer Science, UCL, 2020
