
Diversity in Data Science: A Systemic Inequality

How FAANG companies are dealing with this structural problem

Photo by @kues1 on Freepik.

Why does no one look like me?

The lack of representation of minority groups in technical and leadership positions is an issue rooted in many industries, and our society has been facing it for centuries. The point of this piece is not to find culprits but to raise awareness and "move the needle" a little bit. As a Hispanic person, I want to contribute my viewpoint on this matter. As a Data Scientist, I will use data to draw some assumptions and general conclusions.

"Eu sou apenas um rapaz Latino-Americano" – Belchior, a Brazilian singer. Free translation: "I’m just a Latino guy".

Like many problems we tackle with Data Science, this one is complex, and there is no easy fix. The solution we find tomorrow starts with the work we do today! Am I too confident? You bet!

Main Topics I Will Cover Here:

1- Diversity in Data Science

2- FAANG Companies and Their Diversity Reports

3- Data Scientists by The Numbers

4- Issues Caused by Lack of Diversity in Data Science

5- What Higher Education Can Do

In this article, I focus solely on diversity issues of gender and ethnicity, specifically in the field of Data Science and, by extension, in the Tech industry.

Diversity Stats on Data Science

I will lay out the foundations of our discussion based on a study published by General Assembly (GA) three years ago. GA is an education company that specializes in training students in Data Science and other technical fields. They compiled their students’ data, and what we will see here are some of the outputs of that work. You can check their report here.

It is important to understand who is studying Data Science now. This may help us anticipate the future, because it indicates how the field is evolving with respect to diversity.

When it comes to ethnicity, we see an imbalance: the numbers for Black and Latino students are very small. Looking at the gender distribution, GA’s data show far more men than women studying Data Science.

Figure 1 (created by the author). Race/Ethnicity and Gender Distribution of Data Science Classes. (Source: General Assembly)

A closer look shows that, at that point in time, the percentage of Black and Latino students taking GA’s Data Science course was very low indeed: only 12% of students enrolled in the program were Black or Latino.

Figure 2 (created by the author). Percent of Black and Hispanic/Latino taking Data Science Class at GA. (Source: General Assembly)

To put that in perspective, we can look at the Census Bureau’s breakdown by race/ethnicity below:

Table 1. Population Estimates by Race, July 1, 2019. (Source: United States Census Bureau)

Black and Latino people together make up around 32% of the US population, yet only around 12% of the students enrolled in GA’s Data Science program were Black or Latino.
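One simple way to compare these two numbers is a representation ratio: a group’s share among the students divided by its share of the overall population, where 1.0 would mean parity. The sketch below assumes the rounded percentages quoted above rather than GA’s or the Census Bureau’s raw data, and the function name `representation_ratio` is a hypothetical helper, not a standard metric from either source.

```python
def representation_ratio(group_share: float, reference_share: float) -> float:
    """Return a group's share in a subpopulation divided by its share in a
    reference population: 1.0 means parity, below 1.0 means underrepresentation."""
    if reference_share <= 0:
        raise ValueError("reference_share must be positive")
    return group_share / reference_share


# Rounded figures quoted in the text (assumption: not the sources' raw data).
us_population_share = 0.32  # Black + Latino share of the US population (Census, 2019)
ga_enrollment_share = 0.12  # Black + Latino share of GA Data Science students

ratio = representation_ratio(ga_enrollment_share, us_population_share)
gap_pp = (us_population_share - ga_enrollment_share) * 100  # gap in percentage points

print(f"Representation ratio: {ratio:.3f}")    # prints 0.375
print(f"Gap: {gap_pp:.0f} percentage points")  # prints 20
```

A ratio of 0.375 means these groups were enrolled at roughly three-eighths of the rate their population share would suggest. The ratio complements the raw 20-point gap because it stays comparable across groups of very different sizes.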

The educational background of people studying Data Science was also measured as follows:

Figure 3 (created by the author). Educational Background of Students Taking Data Science Class at GA. (Source: General Assembly)

As we can see, Data Science students come from a highly educated background; "Graduate" means students with Master’s or Doctoral degrees. General Assembly concludes that "Data Science seems to draw from a smaller, more specialized pool, which could, in part, perpetuate diversity issues".

In fact, an Education Week article reports that "Black, Latino Students Lack Access to High-Level Science and Math". This, in turn, reduces the access and participation of these minorities in STEM (Science, Technology, Engineering, and Math) programs and careers.

Research from the NCSES (National Center for Science and Engineering Statistics) shows that these minorities, along with Native Americans, hold a low share of S&E (Science and Engineering) occupations. See below:

Figure 4 (created by the author). S&E Occupations by Race/Ethnicity and Gender. (Source: NCSES)

The minorities mentioned above are underrepresented in the S&E field. We can also look at women’s participation by degree field in 2014:

Figure 5 (created by the author). Field of Degrees for Women. (Source: NCSES)

So, if you happen to be a woman and, on top of that, Black, Hispanic/Latino, or Native American, good luck to you. In fact, according to the National Science Board, although women make up about half of the college-educated workforce, they account for only 28% of workers in S&E occupations. STEM careers and programs are where we find the largest gaps in race/ethnicity and gender. On top of that, these minorities tend to face considerable bias while in college, which alone can make things worse and prevent them from thriving at the college level.

The lack of diversity in Data Science appears to be related to these minorities’ lack of access to STEM. This may be an oversimplification, but it is a reasonable working assumption. If it holds, nurturing access can pay off in the long run. I will explore this further later on.

Whose Fault?

Instead of looking for a scapegoat for this complex problem, we should direct our energy toward what we can do today. Structural inequality like this one has been rooted in society for centuries. Working on solutions makes more sense than trying to find someone to blame.

This issue hits the Tech industry hard. Some businesses say they are investing X amount of money in Diversity and Inclusion, but how does that translate into real change?

Next in our analysis, we will look at some of the diversity reports released by the FAANG companies (Facebook, Apple, Amazon, Netflix, and Google). As you can imagine, the inequality we saw above extends to these companies: minorities’ lack of access to, and participation in, Tech is pronounced.


FAANG Companies and Their Diversity Reports

In this section, we will look at some of the diversity reports released by the FAANG companies and try to get a sense of what they are doing about this matter.

Ultimately, in the big scheme of things, it is not their responsibility alone to fix this problem, but working toward greater diversity can be extremely beneficial to them.

2020 Facebook Diversity Report

Facebook’s strategy to support diversity is focused on hiring minorities for what it calls non-technical roles. This is especially visible for Black, Hispanic, and women employees. Below you can see this trend for Black employees in the US.

Figure 6 (created by the author). Black Representation by Role at Facebook. (Source: Facebook Diversity Report)

From my point of view, this is a good strategy for the short and mid-term. However, in the long run, Facebook should invest in training these minorities to occupy leadership and technical positions. Otherwise, it will look as though the company is inflating its diversity numbers without making real progress. One thing I liked about Facebook’s report is that it provides a breakdown by category (e.g., technical), which allows for a better understanding of what the company is doing. You can check their report here.

2018 Apple Diversity Report

Apparently, the latest data we have from Apple goes back to 2018. Also, according to Fortune, Apple just hired a new head of diversity and inclusion. Both things might indicate that the company has been struggling in this area.

Below we look at 2018 data, specifically for the Tech workforce.

Figure 7 (created by the author). Apple's Stats for Tech employees. (Source: Apple Inclusion and Diversity)

What we see is a pattern that runs throughout the industry: a huge gender imbalance and low participation of minorities in the Tech field.

2019 Amazon Diversity Report

Amazon’s report shows a large difference between men and women when it comes to management roles. Another discrepancy is the share of Black employees: while Black people account for 26.5% of Amazon’s workforce, they hold only 8.3% of management positions.

Figure 8 (created by the author). Amazon's Diversity Report. (Source: Amazon Our Workforce Data)

Although Amazon says it has made "year-over-year progress", that is hard to see when past data is not available. Amazon also claims to prioritize "pay equity" across its workforce. Find the report here.

2020 Netflix Diversity Report

It seems that Netflix is doing a great job of balancing gender across its positions, at least worldwide. If you check out its report, you’ll see that women appear to be well represented within the company. We only see an imbalance when it comes to Tech positions.

Figure 9 (created by the author). Diversity at Netflix. (Source: Netflix Workforce Demographics)

Regarding race/ethnicity, their numbers seem close to the trend we are seeing in the industry.

2020 Google Diversity Report

I think Google released a great report, I mean, as a document. It seems that people took the time to produce a formal document, which gives it more credibility.

Looking more critically at the report, progress in fostering diversity seems gradual. Although this report shows an increase in Black+ representation since Google started publishing this data, the long-term trend does not look that exciting. You can check this trend at Statista.

It appears to me that they are maintaining their status quo, and not tackling the issue aggressively. Below is a comparison of the last two years.

Figure 10 (created by the author). Leadership Diversity at Google. (Source: Google Diversity Report)

I know there were some tiny improvements, and moving the needle a little bit is better than not moving it at all. But can Google do better than that?

To close this section, I want to emphasize that whatever these companies are doing to minimize the issue should be praised. The goal here is not to find a culprit but to show what is going on right now. Yet, overall, they do not seem to be very effective at promoting diversity and inclusion.


Data Scientists by the Numbers

A report from Diffbot estimates the number of Data Scientists working for some of the main Tech companies. This report was published in January this year, so the numbers we will see below are estimates from 2019.

Figure 11 (created by the author). Estimated Number of Data Scientists by Company. (Source: Diffbot)

The biggest question one may ask is how many of the Data Scientists above belong to a minority group. We will probably never know for sure, but Diffbot provided a breakdown of these numbers by gender: according to them, 23% are women and 77% are men.

This share looks familiar, huh? It’s basically what we saw before at the FAANG companies. Surprised? You shouldn’t be!

If you look again at Figure 1 from General Assembly, you’ll see that the percentage of women taking their Data Science course was higher than what we see in the market. Why is that?

To get an estimate for race/ethnicity, we can use Facebook as an example. Facebook has 723 Data Scientists (according to our source, at that point in time). If we add up the percentages for Black, Hispanic, and what Facebook calls Additional Groups, these minorities account for a combined 6.2% of technical roles (2020 figures only).

That suggests around 45 of Facebook’s Data Scientists belong to these minority groups, if we are lucky. Again, this number might be off, but given the pattern we are seeing across the board, it seems plausible.
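The back-of-the-envelope estimate above is a single multiplication, sketched below. The function name `estimate_group_count` is hypothetical, and the key assumption, flagged in the code, is that these minorities’ share among Facebook’s Data Scientists equals their 6.2% share of technical roles overall. The two inputs also come from different years (a 2019 headcount estimate and 2020 percentages).

```python
def estimate_group_count(total: int, group_share: float) -> float:
    """Naive estimate of how many of `total` people belong to a group.

    Assumption: the group's share within `total` equals `group_share`,
    here taken from a broader category (all technical roles)."""
    if not 0.0 <= group_share <= 1.0:
        raise ValueError("group_share must be between 0 and 1")
    return total * group_share


# Figures quoted in the text; they come from different years (2019 vs. 2020).
facebook_data_scientists = 723  # Diffbot estimate
minority_tech_share = 0.062     # Black + Hispanic + Additional Groups, technical roles

estimate = estimate_group_count(facebook_data_scientists, minority_tech_share)
print(f"Estimated count: {estimate:.1f} (about {round(estimate)})")
# prints: Estimated count: 44.8 (about 45)
```

Rounding 44.8 up to 45 is where the "around 45" figure comes from; any error in the share assumption scales the result proportionally.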

Kaggle’s "State of Data Science and Machine Learning 2020" Report

A couple of weeks ago, Kaggle released its most recent report on Data Science and Machine Learning demographics. Around 82% of the people who participated in Kaggle’s survey were men, and about 68% have a Master’s or Doctoral degree. Although the report does not show any data on race/ethnicity, it has a lot of other interesting findings. I strongly recommend checking it out here.

Once again, the patterns we have been spotting throughout our analysis appear to be confirmed by a different source.

Issues Caused by Lack of Diversity in Data Science

The lack of diversity in Data Science has an impact on how algorithms are built, which in turn contributes to the average person’s distrust of algorithms. This article from Vox discusses the issue in more detail.

One of the most compelling pieces of evidence of this problem is racial bias. Every now and then, we hear news of algorithms that disadvantage Black people in particular. Women can also be forgotten and overlooked when algorithms are designed.

We are all influenced by cognitive biases. This is something we need to fight against daily. It’s a tough fight, and it can get worse. How? When you have a room full of people who look the same, think the same way, and came from the same schools, it’s hard to create an environment that considers different points of view. That can also limit the scope of creative solutions: innovation and inclusion should go hand in hand.


What Higher Education Can Do

When we face a structural inequality like this one, it’s naive to think that companies should fix the problem by themselves. Yes, it seems to me that they have room for improvement. But we also need to graduate more people in STEM, especially minorities.

In that regard, colleges and universities also need to become more diverse and inclusive. For elite and private institutions, such as those in the Ivy League, this can be very challenging, but we should be doing far better. Princeton’s data, for example, show how small the share of Black students on its campus is.

Table 2 (created by the author). 2019–2020 Demographics of Students at Princeton University. (Source: Princeton University)

Unfortunately, I did not find data specifically for STEM programs. But we can assume that minority participation in STEM tracks might be even lower than what we see above.

To make things worse, a recent study shows that minority students drop out of STEM programs earlier and at higher rates than students of other races/ethnicities. So the challenge is not only attracting more students but also supporting them so they don’t quit.

Higher education administrators should look at this problem carefully and devise strategies to mitigate these issues. It’s not an easy task, but it has to be done. According to the Census Bureau, by 2050 the Hispanic population in the US will be around 30%. By then, we must have increased the share of minorities in industry and higher education.


Back in Brazil

Photo by @moviafilmes on Freepik.

Even a "melting pot" country like Brazil struggles to provide equal access to disadvantaged groups.

"Back in Brazil / There lives a girl / Dreams of the future / And a far, far better world" – Paul McCartney

Just recently, Magazine Luiza, one of the largest retail companies in Brazil, announced that its 2021 Trainee program would hire only Black candidates. The announcement divided the internet: some people supported the initiative, while others heavily criticized it.

The company took this action because, although 53% of its workforce is Black or Brown, these employees hold only 16% of its leadership positions. So the plan is to increase this share by hiring only Black candidates for the trainee program. Learn more here.

What are your thoughts about this initiative?

On another note, one of the co-founders of Nubank, the largest fintech in Latin America, said that the company finds it difficult to hire Black people. Besides the technical skills needed to work there, the company also requires fluency in English. She said they could not "lower the bar" in order to hire more Black people. The remark caused her a lot of trouble, and she ultimately had to apologize publicly. See here.

How do you see diversity and human resources as a competitive advantage?


Resting My Case

In closing, I want to stress that the lack of diversity and inclusion has deep roots in our society. We have a lot of work to do if we want to change things and break the wheel of inequality.

Whether in Silicon Valley or in the streets of São Paulo, diversity and inclusion initiatives are struggling to take off.

How can you help change this scenario today?

