
How to make your data project ethical by design

A simple framework to start with data ethics today

Image by Riccardo Annandale on Unsplash

Data is the lifeblood of companies today. Not only does day-to-day functioning rely on a constant feed of data about every aspect of operations, but it’s also becoming increasingly clear that with enough data and the right analysis, previously intractable problems can be solved and processes improved. It should come as no surprise that data scientist is currently ranked #2 on Glassdoor’s 2021 list of best jobs in the US (and has held the #1 spot for 4 of the past 6 years).

But as the 2018 Facebook/Cambridge Analytica political scandal made clear to the world, modern methods for gathering and analyzing large amounts of data can also raise ethical issues. After that scandal broke, the whole world started forming an opinion about how data may and may not be used, kicking off what might be called the age of data ethics.

Couple this with existing and emerging legislation that aims to limit how much customer data can be collected and for what purpose, and the bottom line is: if your company uses customer data to make customer-facing decisions, the ethical and legal issues involved in making those decisions must be considered.

Creating experts in ethical thinking

In my last article, I described the kinds of issues that can arise as the result of negligent data usage. I also touched upon the numerous frameworks that data scientists can use as primers to begin thinking about the potential issues that may be posed by their work. But as data increasingly becomes a driver of decision making across an organization, awareness of data ethics needs to expand beyond data science teams. Avoiding unwitting traps means considering the ethics of every data use case as an integral part of an organization’s processes.

The company I work for actively helps prevent companies from falling into ethical data traps. We try to infuse the ethics of data usage into our DNA: every consultant at our company is trained on data security and ethics to some degree. We also recently established an Inclusivity, Diversity, Equity and Awareness (IDEA) group for people whose passion is to think about how ethics and fairness impact business decision making. When working on a data project, the input of the IDEA group (what a fun acronym) helps ensure that our data scientists are aware of the latest trends regarding the ethics of leveraging customer data.

Ethical data usage frameworks

Companies often think that addressing ethical issues will be unfavorable for the business, or will somehow "break" what they do. In our experience, this is never the case. In my work, I help companies reach their goals with data in an ethical way.

To that end, I (together with two colleagues) have created an Ethical Risk Quick Scan that helps clients quickly assess whether their data use case may pose an ethical risk. It is difficult to evaluate risk and define required measures for use cases whose solutions have not yet been designed, let alone built. Nevertheless, ethical risk is a crucial criterion in prioritizing use cases and choosing an approach. We therefore developed the Quick Scan to be used precisely during this early stage of use case selection and requirements gathering. It gives us a feel for the level and areas of risk involved early on in a project, and thus for which domains extra attention is necessary. In short, the framework helps you incorporate ethics into the design of data projects.

The Quick Scan looks like this:

Framework created by author for IG&H

We have designed the Quick Scan to provide a visual representation of potential ethical trouble spots. Just fill in the dots as they correspond to the details of your case.

The framework in action: a taxi driver use case

Here is a sample of how the framework can be used: Imagine a taxi service that digitally monitors many aspects of the rides that are assigned and carried out by their taxi drivers. They want to develop a model that will score driver performance based on this data and automatically adjust driver paychecks accordingly.

Let’s discuss how this use case scores on each dimension of the framework (a minimal code sketch of the resulting assessment follows the list):

  • Vulnerable people impacted? The people impacted typically have below-average financial means and some may live paycheck-to-paycheck.
  • Number of people impacted large? This is an internal application that does not impact many people (only the company’s taxi driver employees).
  • ‘Matters of life’ affected? The model’s decisions will affect matters of life since they will impact financial wellbeing and job security.
  • Influencing personal behavior? The model’s decisions may influence the behavior of the taxi drivers in the sense that they might stimulate taxi drivers to work longer hours, or to accept rides to or from locations they are not comfortable with.
  • No, or slow, feedback loop? The effect of the model’s decisions will be visible quickly (every week or month), so there will be a fast feedback loop.
  • No human in the loop? The use case calls for autonomous performance scoring and paycheck adjustment, which means there is no human in the loop.
  • Bias in the data? Performance is typically a subjective concept, so the data the model will be trained on may be biased.
  • Personal Data used? Finally, the model will rely on fine-grained location data about individual drivers, which qualifies as personal data.
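
Purely as an illustration, here is how this assessment could be captured in a few lines of Python. The actual Quick Scan is a visual worksheet you fill in by hand, so the dictionary, the low/medium/high scale, and the helper function below are assumptions made for the sake of the sketch, and the risk levels reflect my reading of the case rather than a prescribed scoring.

```python
# Illustrative sketch only: the Quick Scan is a visual worksheet, not code.
# The low/medium/high levels below mirror the discussion of the taxi use case.
taxi_use_case = {
    "Vulnerable people impacted?": "high",
    "Number of people impacted large?": "low",
    "'Matters of life' affected?": "high",
    "Influencing personal behavior?": "medium",
    "No, or slow, feedback loop?": "low",
    "No human in the loop?": "high",
    "Bias in the data?": "medium",
    "Personal data used?": "high",
}


def trouble_spots(assessment):
    """Return the dimensions scored 'high' -- the red dots that call for mitigations."""
    return [dim for dim, level in assessment.items() if level == "high"]


if __name__ == "__main__":
    for dimension in trouble_spots(taxi_use_case):
        print(f"Mitigation required: {dimension}")
    # Medium-risk dimensions deserve extra attention as well.
    print("Extra attention:", [d for d, l in taxi_use_case.items() if l == "medium"])
```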

Here’s how I would then fill the Quick Scan based on this particular use case:

Framework created by author for IG&H

For each area that has a red dot filled in, actions to mitigate potential issues must be considered or taken. It is also recommended to pay extra attention to areas that have the ‘Medium Risk’ dot filled in. The table below provides a sample of possible mitigations for the biggest trouble spots in this case.

The Quick Scan focuses attention on measures and mitigations that can be taken before a project is even underway. Once a project is in full swing, and as the actual solution and data requirements become increasingly clear, the team can take its evaluation to the next level with the Ethical Risk Evaluation Framework. This further examines at-risk dimensions in terms of the commonly accepted ethical guidelines of Safety, Fairness, Transparency, and Privacy. We’ve distilled these guidelines from a meta study done on 36 prominent AI principles documents.
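
As a rough sketch of that hand-off, flagged Quick Scan dimensions might be routed to the guideline areas that deserve a deeper look. The grouping below is an illustrative assumption on my part, not the published Ethical Risk Evaluation Framework.

```python
# Hypothetical mapping of Quick Scan dimensions to the four guideline areas.
# This grouping is an assumption for illustration, not the actual framework.
GUIDELINE_AREAS = {
    "Fairness": ["Vulnerable people impacted?", "Bias in the data?"],
    "Safety": ["'Matters of life' affected?", "No human in the loop?"],
    "Transparency": ["Influencing personal behavior?", "No, or slow, feedback loop?"],
    "Privacy": ["Personal data used?"],
}


def areas_to_review(flagged_dimensions):
    """Return the guideline areas touched by at least one flagged dimension."""
    return [
        area
        for area, dims in GUIDELINE_AREAS.items()
        if any(dim in flagged_dimensions for dim in dims)
    ]


if __name__ == "__main__":
    # Red dots from the taxi driver example above.
    flagged = [
        "Vulnerable people impacted?",
        "'Matters of life' affected?",
        "No human in the loop?",
        "Personal data used?",
    ]
    print(areas_to_review(flagged))  # ['Fairness', 'Safety', 'Privacy']
```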

Ensuring ethical conduct in a data transformation

During a transformation, the data scientists at my company work alongside sector experts to ensure a seamless interplay between data, analytics, technology and business skills. At the same time, we ensure that data usage is optimized to effectively realize business objectives without running afoul of ethical data usage guidelines and regulations.

In the next article I intend to shed light on the types of actions we can take based on the Quick Scan outcomes using our (more extensive) Ethical Risk Evaluation Framework. Using these guidelines, we can eliminate and/or reduce identified risks and make the use of AI not just feasible, but also viable and desirable.

_This article (and the framework) was written in co-production with my colleagues Mando Rotman and Floor Komen. It has also been published on our company website, where you can download the framework in PDF format so you can use it yourself._

This is the second article in a series of three; you can find part I and part III here.

