2•1 • Understanding data ethics •

"Datafication – transforming all things under the sun into data format and thus quantifying them – is at the heart of the networked world. "

By Javiera Atenas

Collecting, publishing and using data requires two key elements: data ethics and data protection.

  • Data ethics can be understood as the key principles governing what is right and wrong in the data cycle, from collection and production to its use.
  • Data protection refers to the national and international regulations governing personal privacy and people's rights over access to, and processing of, their data.

In this lesson we explore the different facets of data ethics and data protection concerning commercial, educational and public data, starting with the premise that not all data, public or private, is publishable, and that not all uses are harmless. Thus we will centre the discussion on the different debates around data.

We need to consider that data is framed by regulations, which serve people, governments, organisations and industries to control and balance the potential uses of data so that they can benefit society without harming people. Thus, in the context of human-generated data, we will discuss two kinds of data to understand how data can and should be published and how to protect people, including vulnerable communities, from pervasive and intrusive uses of data.

2•1.1 • Principles of data ethics •

We live in a ‘datafied’ society where almost everything is continuously transcribed into data, quantified and analysed (Van Es and Schäfer, 2017), and where decisions taken by corporations and governments are increasingly data- and algorithm-driven. Data shapes what we watch and how we connect, and its influence ranges from the economy to education and policy. Because data permeates almost every element of modern life, it is crucial to understand both its present and future uses and the risks they carry. In addition, understanding the ethical conundrums we face when dealing with data will help us anticipate how data could be used in the future, and how it can be interwoven to create new datasets used to attempt to predict all kinds of behaviours and to try to influence them (Hand, 2018).

The emergence of new technologies whose main aim is to process data to gain knowledge about human activities is generating social asymmetries between those who own the tools and have the expertise to collect and analyse data, and the people whose data are subjected to these applications (Belbis & Fumega, 2019). Artificial intelligence (AI) and other practices designed to exploit large volumes of data, which emerge as a product of the digitisation of the vast majority of information services, create the need to discuss ethical limits.

Data ethics can be understood as the responsible and sustainable use of data. It is therefore key that we learn how to collect, select, analyse and use data under the premise of ‘do no harm’, ensuring that data-led research projects are beneficial for people and society. Data ethics needs to be understood as a social contract between the public and data users (Buenadicha et al., 2019; article in Spanish). It refers to a series of principles or guidelines to which any data-led research project or activity must adhere. The main focus of the guidelines is human rights and personal data protection laws. These principles should lead us to actively design fair and unbiased research, and should motivate students to learn, from the very beginning, the value of data protection and data agency. This means raising awareness of the role of an ethical common ground when conducting research with data, and treating others’ data as you would wish your own to be treated.

The World Bank's report Data for Better Lives (2021) proposes a new social contract for data:

For data to realize its potential to transform lives, new rules of the road are needed – a social contract for data is needed. Such a contract would enable the use and reuse of data to create economic and social value, while ensuring equitable access to the value realized, as well as fostering participants’ trust that they will not be harmed by data misuse. Renewed efforts are required to improve data governance domestically, as well as through closer international cooperation. Moreover, the voice of low-income countries needs to be heard in the global debate on data governance.

The latest EU Digital Education Action Plan (2021-2027) proposes, within its competence framework, to support learners

to engage positively, critically and safely with this technology, and be aware of potential issues related to ethics, environmental sustainability, data protection and privacy, children rights, discrimination and bias, including gender bias and disability and ethnic and racial discrimination (p.14)

To teach how to embed data ethics principles in research or project-based learning activities, it is key to be fully aware of the core elements of those principles, their values and conceptualisations. It can also be very useful to enable learning through the seven principles of data feminism (D’Ignazio & Klein, 2020). These principles serve as a guide to examine and challenge power dynamics, considering, for a start, the diversity and social context of the students, when organising research activities that:

  • Examine power: To help understand who is controlling the discourse, the issues and the panorama, as well as how and where decisions are being made
  • Challenge power: To support the development of personal and collective agency aimed at addressing social problems
  • Elevate emotion and embodiment: To use information that looks beyond data to give a voice to people, including their life experiences and emotions
  • Rethink binaries and hierarchies: To help understand how data puts people in clusters that can perpetuate oppression, and so avoid sustaining and validating such practices
  • Embrace pluralism: To promote the use of knowledge from diverse perspectives, giving priority to those normally unheard, and with space for Indigenous and experiential ways of knowing
  • Consider context: To acknowledge that data is not raw or neutral but the product of unequal social relations, and is therefore quite likely biased
  • Make labour visible: To ensure understanding of the work, work dynamics and politics behind data and data science projects, and also of the work ethics of such projects


  • Given the relevance of this topic and the didactic dimension of the book Data Feminism (D’Ignazio & Klein, 2020) mentioned above, we encourage you to organise a reading activity during the semester using any annotation tool. Guidance and inspiration can be found on the book's website.
  • If you need guidance on how to organise a reading group, we point you to Hypothes.is, an open-source annotation tool whose active community of users (teachers and students) has a wealth of materials to guide you in the task.

2•1.2 • Examining the ethics of data •

If students are to navigate the turbulent waters of data, algorithms and artificial intelligence, then data learning activities must foster reflection on how data are constructed and operationalised across societies. Students should be given opportunities to learn from the analysis of data and from discussing the implications of data projects from a range of sources and perspectives. This is important so that they understand how people are portrayed through data, the historical impact of bias in data, and how prejudices and cultural misconceptions affect people's lives.

Some of the current uses of data that require careful ethical consideration are as follows:

  • Technologies play a crucial role in collecting data from personal, professional and social activities, permeating the use of any platform or device, including phones and credit cards. These data are then shared with third parties with the intention of predicting almost every behaviour and capitalising on it. These activities are called predictive analytics and are used to identify the likelihood of future outcomes based on historical data. Commonly predicted behaviours include what you will shop for or what you will watch next on streaming platforms, but also how likely you are to survive a heart attack, which affects whether you can get life insurance cover.
  • Data plays a key and adaptable role in politics. Hood and Margetts (2007) argue that governments operate through two sets of agents: detectors and effectors. Detectors gather information (data) from individuals and society, and effectors seek to influence them. In this light, we can see how data was used during the COVID pandemic to develop public policy, and how data is used to forecast the economy. We can also see how data is used to influence voters by targeting socioeconomic groups in political campaigns.
  • The interwovenness of data infrastructures facilitates attempts to predict socioeconomic behaviours. Socioeconomic data, including race, gender and neighbourhood, are collected to predict, for example, how likely certain students are to fail or succeed depending on their socioeconomic background, or how much you have to pay for car insurance depending on where you live. Worse, such data are used in police work to predict, for example, who gets profiled by the police, foreseen as a criminal and most likely arrested.

Finally, it is also useful to discuss how the lack of regulatory and ethical frameworks to prevent misuses of data is affecting us every day, for example through discrimination against women in data-driven job recruitment, overtly racist uses or misuses of data, and clear gender inequality in access to health.

It is important that data-led research and learning activities are designed to address inequalities, improve quality of life, explore issues that may be harming a community, and improve data governance. This is key to people acquiring the skills to participate in developing policy frameworks that go beyond data protection and provide a fair, harmless, unbiased and equal data landscape that regulates public and private sector use of data.


To start exploring data ethics issues with your students, we recommend using the Data Ethics Canvas designed by the Open Data Institute. Ask your students to form groups and plan a data-led research project. Ask them to select one social issue and discuss it using the canvas, and then to write a blog post about the ethical elements of their research project that they discussed in the group.