There’s been a lot of talk about ethics in the data industry, especially as it concerns the ethical use of algorithms. The former U.S. chief data scientist called for an ethical code for data scientists. The American Statistical Association and the Association for Computing Machinery maintain their own ethical guidelines. Google came up with ethical principles to govern its AI efforts. Lots of other people have put their oar in, from Bloomberg to the World Economic Forum’s Young Scientists group to Data for Democracy.
People make ethical codes at least partially because they’re dissatisfied with the state of ethical practice. But ethical codes don’t clearly result in an increase in ethical practice. In fact, as happens with any attempt to create a standard, the only clear result of creating a code is the proliferation of additional codes.

In short, there’s a difference between ethical codes and actual ethical code. I want to talk a little about the former, but mostly about what we can do to get more of the latter.
Why I care about ethics
In industry, especially, I think it’s pretty reasonable to look at a topic like this and say things like "this kind of thing is just an academic exercise" or "this isn’t my problem because I personally behave ethically." So I want to take a moment to explain why I personally, as a data scientist working in industry, feel I need to devote my attention to this topic:
- Pakistan. I used to train statistical models to provide the U.S. Army with strategic intelligence. I modeled the cycle of military and militant operations in Pakistan, and my results contradicted a prominent administration policy statement. My boss’s boss called me into his office and, after berating me for half an hour, told me to change the conclusions of the analysis. This was the first time I’d ever been told that I needed to fake my results (and the first time I had a person deny, I believe quite honestly, that that was what they were telling me to do).
- Sales reports. I was working for a student travel firm and my boss’s boss wanted a graph that showed ten years of revenue but he only had the data for four of those years; he wanted me to interpolate the missing data. He drew a picture of what he wanted the trend to show, and that drawing presented wildly unrealistic figures for the missing years and substantially changed the figures for the years we did have. This was the first time I’d been told to fake my data.
- S-curves. I was working for an asset management startup focused on investing in local consumer-goods producers. My boss really wanted to show that countries fit along an s-curve – the business consultant’s formulation of a logistic curve – with some at very low consumption, some in the stages of a rapid rise, and some already leveled off at high consumption levels. The data didn’t show what he wanted it to show. He pressed me to try more and more complex models in the hopes that they would show the desired pattern. This was the first time I’d been told to torture the data until it confessed.
- Teachers unions, the New York Times, and the NAACP. I was offered the job of founding the Data Science team for a charter school network so prominent and so politically active that, before even starting work, my prospective employer had been accused of using student test scores to reject, intimidate, or remove undesirable teachers and students. I did my due diligence and decided those attacks were more political than factual, and my experience in that job confirmed my initial impressions, but I knew that whatever data systems I built could easily be used to punish students rather than support them, even if I didn’t build them for that purpose. This was the first time I’d had to constantly imagine, at every stage from design to development to deployment, how someone could use things I had built to hurt other people.
I think data science has value. I think it deserves to attract some of the best and brightest people entering the workforce. If we don’t constantly work to ensure widespread ethical practice and to call out and correct unethical practice, an increasing number of those best and brightest will leave to do other things. And rightfully so. If we don’t build an ethical industry, then we deserve to lose those people, and to see our business suffer as a result. It’s not enough for us to be individually ethical ourselves: only a few companies have to behave unethically for all companies to suffer for it. Good ethics is often good business in the short term. It’s always good business in the long term.
Ethical codes are neither necessary nor sufficient
When trying to get someone to develop either the capability or the intent to change their behavior – in this discussion, to avoid ethical pitfalls – talking is a terribly ineffective mode of communication.

Telling people about a danger rarely helps them to avoid the danger. At least, it doesn’t help them to do so with any consistency. Ethical codes never show. They only tell. They are, in effect, goal statements. That’s why they don’t improve practice. All by themselves, they call attention to the fact of a problem’s existence, but they don’t prevent people from running into danger or help them get out of danger once they’re in it. Ethical codes aren’t sufficient to improve ethical practice.
That’s certainly not new information – I haven’t met anyone who argues that ethical codes are sufficient, although when participating in communities that write ethical codes I’ve often met people who act like they are. The thing is, if we can show the danger instead of telling about it, we can avoid or mitigate the danger without having to tell at all. Not only are ethical codes insufficient to improve ethical practice – they aren’t necessary either.
Fences and their dangers
It’s not just that ethical codes are neither necessary nor sufficient for ethical practice. They’re also potentially harmful. An ethical code is a boundary – a point beyond which people are not supposed to cross. If there’s no way to enforce adherence to the code, then it’s like a line in the sand – it’s there, but crossing it doesn’t carry any consequences so it might as well not be there.
If there is a way to enforce the code – in other words, if there’s a regulatory infrastructure in place – then it’s like a fence. Policy makers like fences because they’re scalable. All you have to do is define the rules in a way that you can audit adherence, then train and deploy auditors. The problem with fences is that they introduce systemic risk: people can build up all kinds of practices right along the border of the fence, even leaning up against it, without crossing over it. When the fence breaks, the consequences are more devastating than if the fence hadn’t been there in the first place. I’ve written about this before.
There are a lot of examples of the systemic risk introduced through large-scale fence-building. The rules governing mortgage lending prior to the 2008 financial crisis are one example, but maybe a more appropriate one for this discussion is the fence that differentiates p-values below 0.05 from those above it. That fence has been enforced across university classrooms, scientific publications, and, to a lesser extent, industry standards, and those regulations have risen alongside a whole suite of practices to select model features, or sample data, or both, to make results fit inside the fence. In recent years, several fields, most prominently psychology and medicine, have suffered because too many people decided that results were trustworthy just because they fit within that threshold.
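If you want to see how that fence invites trouble, a few lines of simulation will do it. The sketch below is purely illustrative – synthetic noise, not data from any of the fields mentioned above – but it shows how easily a threshold turns into a target: screen enough unrelated features and some of them will clear p < 0.05 by chance alone.

```python
# Illustrative simulation of the p < 0.05 fence: with enough candidate
# features, pure noise slips under the threshold by chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_obs, n_features = 100, 200

y = rng.normal(size=n_obs)                # outcome: pure noise
X = rng.normal(size=(n_obs, n_features))  # candidate features: also pure noise

p_values = [stats.pearsonr(X[:, j], y)[1] for j in range(n_features)]
n_significant = sum(p < 0.05 for p in p_values)

# With 200 unrelated features, we expect roughly 10 to clear the fence by chance.
print(f"{n_significant} of {n_features} noise features have p < 0.05")
```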
Fence-building is risky, and the larger the fence, the greater the risk. We shouldn’t resort to widespread regulation unless we have no other choice, and even then, we should proceed cautiously.
A tentative set of alternatives
Now let me spend the rest of the time talking about the alternatives to ethical codes and ethical regulation. I don’t have any definite answers to offer here. I think there is sufficient reason to believe that we can build a more ethical industry through showing instead of telling, but my ideas are still preliminary. If, as a profession, we spent as much time and as many resources developing those ideas as we’ve spent on developing ethical codes, we’d probably have something implementable pretty quickly. I’m going to start with something any individual practitioner can do. Then I’ll talk about something that teams or companies can do. And then I’ll talk about something that has to happen across the whole industry.
Tooling for design. When we train a predictive model, there are methods to assess how accurate it is, including processes such as cross-validation that help reduce the risk of our methods producing overly optimistic results. There are tools that enable all of these assessments, maintained across many different software packages and programming languages. New practitioners are trained on these tools and admonished when they fail to use them. We do all of this because we train models and build data systems to accomplish certain purposes, so we have an interest in estimating the extent to which our systems accomplish those purposes.
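For the intended-consequence side, that tooling is so mature that a sanity check fits in a few lines. Here’s a minimal sketch using scikit-learn’s cross-validation helpers – the synthetic data and logistic regression are just placeholders for whatever model and data you actually have:

```python
# A minimal sketch of standard accuracy tooling: scikit-learn's
# cross-validation helpers, run on synthetic placeholder data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

# Five-fold cross-validation guards against overly optimistic accuracy estimates.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```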
We need a comparable toolset for estimating the extent to which our systems produce unintended consequences. It’s not enough to know that our tools do what we want them to do. We need to ensure that they aren’t doing what we don’t want them to do. Again, this ethical imperative is also just plain good business sense: unintended consequences incur costs that can’t be planned for.
There are tools out there to identify unintended consequences, but they aren’t nearly so numerous or user-friendly or well-maintained as the tools that measure intended consequences. For example, the Center for Data Science and Public Policy at the University of Chicago developed a tool called Aequitas, which lets a user specify an algorithm’s intended uses – for example, whether the algorithm will be used to punish or assist people – and then analyzes the algorithm’s results to see whether some subgroups are disproportionately affected. There are a handful of tools that automatically search for instances of Simpson’s Paradox (here’s one example), where a trend exhibited in the overall data set reverses itself when confined to certain subgroups.
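To give a feel for what a check like that involves, here’s a toy sketch of a Simpson’s Paradox screen. It is not the interface of Aequitas or of any particular tool – just the basic idea: compare the overall trend to the trend within each subgroup and flag the reversals.

```python
# Toy check for Simpson's-Paradox-style reversals: flag subgroups whose
# x-y correlation has the opposite sign of the overall correlation.
# Illustrative only; not the API of Aequitas or any existing tool.
import numpy as np
import pandas as pd

def find_trend_reversals(df, x, y, group):
    """Return subgroups where the x-y trend flips sign relative to the whole data set."""
    overall = df[x].corr(df[y])
    flagged = []
    for name, sub in df.groupby(group):
        if len(sub) > 2 and np.sign(sub[x].corr(sub[y])) == -np.sign(overall):
            flagged.append(name)
    return flagged

# Synthetic data: the trend inside each subgroup is negative,
# but the aggregate trend across subgroups is positive.
rng = np.random.default_rng(0)
frames = []
for name, offset in [("A", 0.0), ("B", 5.0)]:
    xs = rng.normal(loc=offset, size=200)
    ys = -xs + 2 * offset + rng.normal(scale=0.5, size=200)
    frames.append(pd.DataFrame({"x": xs, "y": ys, "group": name}))
df = pd.concat(frames, ignore_index=True)

print(find_trend_reversals(df, "x", "y", "group"))  # flags both subgroups
```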
We need more of these tools. We need to build exploration of unintended consequences into our workflows. This is something every data practitioner can do.
Local (non-scalable) regulation. Big fences are risky. Little fences, not so much. I’m not talking about self-regulation. I’m talking about local regulation. Local regulation can rely on showing ethical problems instead of telling, and local mistakes also tend to stay local.
There was a great article in the Washington Post about how local regulation worked to reduce harassment in a restaurant. The employees used a color-coded system to classify uncomfortable customer behavior as yellow (creepy vibe), orange (sexual undertones), or red (overt behavior or repeated orange incidents). All a staff member had to do was report the color – "I have an orange at table five" – and the manager took action, no questions asked. Red meant the customer was asked to leave. Orange meant the manager took over the table. Yellow meant the manager took over the table if the employee wanted.
It’s not hard to imagine how this could work in algorithm development. A yellow means an employee feels uncomfortable, and it prompts movement to another project if desired, as well as a team review of the issue. Orange means the employee can point to specific ethical concerns, and it prompts movement to another project as well as team review. Red means the employee has a strong concern and it puts the project on hold pending review. No second-guessing the employee’s concerns. Just action. It’s possible.
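To show how little machinery this requires, here’s a hypothetical sketch of that policy in code. The levels and actions are mine, not a standard or an existing tool, and the real work is in the management commitment to honor them:

```python
# Hypothetical sketch of a color-coded escalation policy for a data team.
# The levels and actions are illustrative, not a standard or an existing tool.
from enum import Enum

class Concern(Enum):
    YELLOW = "employee feels uncomfortable"
    ORANGE = "employee can point to specific ethical concerns"
    RED = "employee has a strong, specific concern"

ACTIONS = {
    Concern.YELLOW: ["offer reassignment if the employee wants it", "schedule team review"],
    Concern.ORANGE: ["reassign the employee", "schedule team review"],
    Concern.RED: ["put the project on hold pending review"],
}

def escalate(level: Concern):
    """Return the actions taken for a reported concern – no second-guessing, just action."""
    return ACTIONS[level]

print(escalate(Concern.ORANGE))
```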
Reputation systems. There are already many examples of systems (Yelp, BBB, Glassdoor, etc.) that publicly expose information about private interactions with a company, and these systems change company behavior. These sites don’t, by any stretch of the imagination, give users a comprehensive, objective view of the performance/desirability/quality/goodness of a company. But companies are scared of bad reviews. When a bad review comes up, they spend time addressing it – demonstrating that, in fact, the review was false. Companies that don’t actually fix the problems get more negative reviews, and therefore draw fewer customers. That makes internal company practices impact the bottom line. I personally have withdrawn myself from consideration for certain jobs when a company was unable to give a satisfactory explanation for problems described on Glassdoor. In at least one case, the information that raised a red flag was about an ethical concern. Reputation systems are viable tools for shifting industry behavior.
The problem with most of these systems is that they try to accommodate all use cases, so a bad review can come because a customer was sexually harassed or because the customer failed to recognize that he himself placed the wrong order. A book review on Amazon, even if we discount deliberate mob attempts to inflate or deflate a book’s reputation, can reflect anything from the structure of the argument to the readability of the prose to the quality of the paper. For a reputation system to impact ethical practice, the scope would need to be much more narrow.
We have work to do
I understand the desire, when confronted with a clear ethical problem like those that frequently make the news, to do something. And I understand that trying to formulate clear and compelling ethical principles is a very doable something. But it’s the wrong thing. It’s not going to solve our ethical problems. In my opinion, it won’t even help. In fact, it can very possibly hurt. Ethical codes can actually give unethical actors cover. We don’t want that.
I listed the above three alternatives for addressing unethical behavior not because I think they would solve all of our problems (of course they wouldn’t), but because they give everyone something to do that is both realistic and productive. Anyone who can code and do analyses can build tooling for design. Any manager or executive can implement and enforce policies about ethical review and non-retaliation against people who raise concerns. If you don’t work in a place that deals with these ethical issues, create a site that allows people who do to publicly post the issues they’ve had and the companies where they’ve had them. Individually, none of these things are going to do much good. Taken together, they, and ideas like them, could move us in the right direction.