Data analysis and visualization of ethnic diversity & gender distribution in the MoMa art collection

Nadia Piet
Towards Data Science
Jan 15, 2021


Spoiler alert: 11 works in the 131,000-item collection are by non-binary artists, with 9 of those in the new media art department.

Asked to explore a topic of our choice for my Data Science class, part of the MA Data-Driven Design, I stumbled upon this official MoMa dataset describing 138,185 artworks from the collection. Surprised not to find any published analysis or visualizations of it, I quickly made my decision. The resulting insights were interesting and perhaps useful for further research, so I decided to repurpose my assignment and share it with you here!

In 🕵🏼‍♀️ Research framework, I state the goal of the research and sketch the context of diversity in modern art institutions, and in the MoMa in particular.

In 📊 Data insights, I show the results of the analysis, illustrated with colorful graphs and followed by a conclusion and discussion.

In 🧽 Data (pre-)processing, I detail the process and the decisions made in data cleaning and pre-processing, largely using Python, Pandas, and Plotly.

Disclaimer

I do not claim absolute truth with these findings; they are merely inferences drawn from the dataset. The dimensions of diversity are extremely limited and do not account for intersectionality. These results are not accusatory but a starting point for discussion. The data pre-processing may contain flaws: I'm learning. With that out of the way, let's get into it.

🕵🏼‍♀️ Research framework

Research goals

The research sets out to analyze ethnic and gender diversity in the MoMa collection, with a focus on a comparative analysis between contemporary art and new media artworks. My hypothesis was that the new media art department would have acquired works by a wider diversity of artists than its contemporary counterpart.

While the scope of the research is extremely limited, both in the range of its dataset (the MoMa institution only) and in its measures of diversity (ethnicity and gender only), the analysis aims to provide a starting point for further discussion around representation within art institutions and across different art practices.

Diversity in art and the MoMa collection

The Museum of Modern Art (MoMA) is one of the biggest and most influential museums in the world, exercising a large influence over the development and collection of modern art (Kleiner, 2016, pp. 1–3). Kurt Lewin's theory of gatekeeping, the process of selecting and then filtering items of media that can be consumed within the time or space an individual happens to have, helps us understand the MoMa institution as an important gatekeeper in setting the global art agenda.

In 2019, the MoMa reopened after a major renovation meant to symbolize and enable its commitment to a collection with artists from a wider range of geographies and backgrounds (Farago, 2019). The first large-scale study of artist diversity in major U.S. museums (March 2019) provides estimates of gender and ethnic diversity at each museum, concluding that 85% of artists are white and 87% are men. It also observes that most efforts to increase diversity focus on visitors and staff rather than on the artists represented in the collections (Topaz, 2019).

The global art industry is grappling with diversity issues. As the MoMa is publicly vocal about its commitment to diversity, we set out to evaluate whether these intentions reach beyond its programs into the collections themselves.

Contemporary vs new media art

To distinguish between contemporary and new media art, we adopt the definition of new media art as “a comprehensive term that encompasses art forms that are either produced, modified, and transmitted by means of new media/digital technologies” (Grau, 2016). Compared with contemporary art, new media art is perceived to be more accessible and to have a more inclusive culture, yet little to no public research is available to prove or disprove this claim. The study therefore also tests whether the perceived greater diversity of new media art, compared with contemporary art, is reflected in the MoMa collection.

📊 Data insights

Ethnic diversity in the MoMa collection

The data shows a clear increase in ethnic diversity in the collection over the years, peaking in 2018, when the museum acquired works by artists of 61 unique nationalities, more than triple the 18 of 1989. The accompanying animation shows the number of unique nationalities acquired for each popular classification, year by year from 1986 to 2019.

Images by Author
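A minimal pandas sketch of how the per-year count of unique nationalities could be computed. It assumes the cleaned data has a `Nationality` column holding one primary nationality per work and a `DateAcquired` date string (names modeled on the MoMA Artworks file, but treat them as assumptions); the rows are made-up sample data, not figures from the collection.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumed, values are invented.
artworks = pd.DataFrame({
    "Nationality": ["American", "French", "American", "Japanese", "Nigerian", "French"],
    "DateAcquired": ["1989-03-01", "1989-07-15", "1989-11-02",
                     "2018-01-20", "2018-05-05", "2018-09-30"],
})

# Use the acquisition year (not the production year) as the timeline.
artworks["YearAcquired"] = pd.to_datetime(artworks["DateAcquired"], errors="coerce").dt.year

# Number of distinct nationalities among the works acquired in each year.
unique_per_year = artworks.groupby("YearAcquired")["Nationality"].nunique()
print(unique_per_year)
```

Grouping by `["YearAcquired", "Classification"]` instead would give per-classification counts of the kind shown in the animation.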

The world map shows an apparent overrepresentation of European and American artists, while the Middle East and parts of Africa and Asia are underrepresented in the collection. The nationality counts on the map are normalized on a logarithmic scale.

Image by Author
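The logarithmic normalization of the world map can be sketched as follows. The counts are hypothetical, keyed by ISO 3166-1 alpha-3 code, and base-10 logarithms are an assumption, since the exact re-scaling used for the map is not stated.

```python
import numpy as np
import pandas as pd

# Hypothetical per-country artwork counts keyed by ISO alpha-3 code.
counts = pd.Series({"USA": 50000, "FRA": 9000, "JPN": 1000, "NGA": 10})

# log10 compresses the range so a few dominant countries do not wash out
# the colour scale; np.log1p would be a safe alternative if any count were 0.
log_counts = np.log10(counts)
print(log_counts.round(2))
```

Values like these could then be passed as the `color` of `plotly.express.choropleth`, with the ISO codes as `locations`.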

Gender distribution in the MoMa collection

Gender distribution is extremely imbalanced, with 85% male, 14% female, and 1% undefined or unknown. A mere 0.01% of the works in the MoMa collection are by artists who identify as non-binary: a total of 11 works out of the roughly 140,000 in the dataset.
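Percentages like these are a normalized value count. A small sketch with hypothetical primary-gender labels (the label spellings are assumptions, not the dataset's):

```python
import pandas as pd

# Hypothetical primary-gender values, one per artwork.
genders = pd.Series(["Male"] * 17 + ["Female"] * 2 + ["Non-Binary"] * 1)

# normalize=True returns each category's share of all works; scale to percent.
shares = genders.value_counts(normalize=True) * 100
print(shares.round(1))
```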

In the accompanying graphic, the distance between the dots shows that some classifications are more gender-balanced than others. Photography shows a huge gap between male (25K) and female (4K) representation, while architecture is more balanced (which, on further investigation, turns out to be single-handedly due to Zaha Hadid).

Image by Author
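The per-classification comparison behind a dot plot like this can be sketched with a cross-tabulation. The rows below are invented, and the simple male-minus-female "gap" is an illustrative measure, not necessarily the one used for the graphic.

```python
import pandas as pd

# Hypothetical works with a classification and a primary gender.
works = pd.DataFrame({
    "Classification": ["Photograph"] * 6 + ["Architecture"] * 4,
    "Gender": ["Male"] * 5 + ["Female"] + ["Male", "Male", "Female", "Female"],
})

# Rows: classification; columns: gender; cells: number of works.
table = pd.crosstab(works["Classification"], works["Gender"])

# Illustrative gap: how many more works by men than by women.
table["Gap"] = table["Male"] - table["Female"]
print(table.sort_values("Gap", ascending=False))
```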

Diversity in contemporary vs new media art collections

The data shows clear differences between the contemporary art and new media art sets. For every 100 artworks acquired, the contemporary art collection includes artists of 3–4 unique nationalities on average, while the new media art collection includes up to 22, roughly six times as many.
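One possible reading of "unique nationalities per 100 artworks" is the number of distinct nationalities divided by the number of works, scaled to 100. The sketch below assumes that reading and uses invented rows with a hypothetical `Set` column; the article does not spell out the exact calculation.

```python
import pandas as pd

# Hypothetical works tagged with their set and primary nationality.
works = pd.DataFrame({
    "Set": ["contemporary"] * 8 + ["new media"] * 4,
    "Nationality": ["American"] * 6 + ["French", "German",
                    "Brazilian", "Korean", "Kenyan", "American"],
})

# Distinct nationalities and total works per set.
stats = works.groupby("Set")["Nationality"].agg(["nunique", "size"])

# Distinct nationalities per 100 works.
stats["PerHundred"] = stats["nunique"] / stats["size"] * 100
print(stats)
```

Because distinct counts grow more slowly than collection size, a ratio like this tends to favour smaller sets, so it is worth reading alongside the raw counts.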

The gender distribution in contemporary art versus new media art is, respectively, 85% versus 72% male, 14.5% versus 23.5% female, and 0.001% versus 0.3% non-binary.

Image by Author
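Side-by-side percentages per set can be sketched with a row-normalized cross-tabulation, which also handles the different set sizes by turning each row into shares. The data and the `Set` column are hypothetical.

```python
import pandas as pd

# Hypothetical works tagged with their set and primary gender.
works = pd.DataFrame({
    "Set": ["contemporary"] * 4 + ["new media"] * 4,
    "Gender": ["Male", "Male", "Male", "Female",
               "Male", "Female", "Female", "Non-Binary"],
})

# normalize="index" makes each row (set) sum to 1; scale to percent.
shares = pd.crosstab(works["Set"], works["Gender"], normalize="index") * 100
print(shares.round(1))
```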

While the contemporary art collection acquired after 2000 contains 1 work by a non-binary artist, the new media art collection holds 9.

Image by Author

Conclusion

The differences between contemporary and new media art are substantial and support the perception of new media art as a more diverse space, at least as reflected in the MoMa collection. Across the collection, non-binary and POC artists are still grossly underrepresented, which suggests we have a long way to go in presenting art collections that reflect society.

Discussion

For further research, it would be interesting to dive deeper into comparisons and correlations between the various classifications and to consider more intersectional measures of diversity. As for limitations, this study uses only ethnicity and gender as measures of diversity. Ethnicity is inferred from the artist's nationality, not their race or the culture they self-identify with. Thus, this is a very limited representation of what diversity means.

As for complications, states that no longer exist, such as Yugoslavia, had to be removed from the dataset because they were not compatible with the Plotly library. While this isn't directly problematic for the purpose of this study, it is easy to see how such easy erasure of a nation's cultural heritage calls for caution.

🧽 Data (pre-)processing

The MoMa dataset(s)

The dataset used is the Artworks file made available on the official MoMa GitHub. It contains 138,185 records, each with detailed metadata.

Data cleaning & formatting

After initial data cleaning (merging duplicates and removing missing and corrupted values), 130,822 works remain.
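A minimal cleaning sketch under stated assumptions: duplicates are identified by a hypothetical `ObjectID` column and "fused" simply by keeping the first copy, missing values are rows without a `Title`, and "corrupted" values are acquisition dates that fail to parse. The actual cleaning rules are not detailed in the article.

```python
import pandas as pd

# Invented raw rows illustrating each problem case.
raw = pd.DataFrame({
    "ObjectID": [1, 1, 2, 3, 4],
    "Title": ["Untitled", "Untitled", "Composition", None, "Study"],
    "DateAcquired": ["1990-01-01", "1990-01-01", "2005-06-01", "2010-02-02", "not a date"],
})

# Keep one record per object (first occurrence wins).
cleaned = raw.drop_duplicates(subset="ObjectID")

# Drop rows missing a required field.
cleaned = cleaned.dropna(subset=["Title"])

# Treat unparseable acquisition dates as corrupted and drop them.
parsed = pd.to_datetime(cleaned["DateAcquired"], errors="coerce")
cleaned = cleaned.loc[parsed.notna()]
print(len(cleaned))
```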

The art practices are derived from the Classification column. The contemporary art set includes Drawing, Furniture and Interiors, Illustrated Book, Print, Sculpture, Painting, Publication, Textile, and Work on Paper, with a total of 23,754 works.

The new media art set includes Audio, Digital, Film, Graphic Design, Media, Video, and Software, with a total of 3,089 works. Some practices, such as design or photography, were not included in either set, as these categories offered no meaningful way to split them between the two. The dataset contains many more works of contemporary art than of new media art, which is why some of the visualizations normalize values to percentages.
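Splitting works into the two sets is a lookup from classification to practice. The label sets below are copied from the lists above; their exact spelling in the raw file is an assumption worth verifying, and the sample rows are invented.

```python
import pandas as pd

# Classification labels as listed in the article (spelling in the raw data may differ).
CONTEMPORARY = {"Drawing", "Furniture and Interiors", "Illustrated Book", "Print",
                "Sculpture", "Painting", "Publication", "Textile", "Work on Paper"}
NEW_MEDIA = {"Audio", "Digital", "Film", "Graphic Design", "Media", "Video", "Software"}

# Build one lookup table from classification to practice.
lookup = {name: "contemporary" for name in CONTEMPORARY}
lookup.update({name: "new media" for name in NEW_MEDIA})

works = pd.DataFrame({"Classification": ["Painting", "Video", "Photograph", "Software", "Print"]})

# Classifications in neither set (e.g. Photograph) map to NaN and are dropped.
works["Practice"] = works["Classification"].map(lookup)
subset = works.dropna(subset=["Practice"])
print(subset["Practice"].value_counts())
```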

The timelines are based on the year the work was acquired by the museum, not the year it was produced by the artist. Different visualizations on the dashboard cover different timespans. In the comparisons between contemporary art and new media art, both sets include only works acquired after 2000.
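The acquisition-year filter can be sketched as below. Reading "after 2000" as strictly later than 2000 is an assumption, as are the column names and the sample rows.

```python
import pandas as pd

CUTOFF_YEAR = 2000  # assumption: "after 2000" excludes 2000 itself

works = pd.DataFrame({
    "Practice": ["contemporary", "contemporary", "new media", "new media"],
    "DateAcquired": ["1998-05-01", "2004-10-12", "2000-03-03", "2015-07-07"],
})

# Timelines use the acquisition year, not the production year.
works["YearAcquired"] = pd.to_datetime(works["DateAcquired"], errors="coerce").dt.year
recent = works[works["YearAcquired"] > CUTOFF_YEAR]
print(recent)
```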

Nationality is based on the nationality metadata, with only the primary nationality considered. Country conversion is done using the country_converter (coco) library and a Demonym CSV file available on GitHub. The nationality counts on the world map have been normalized using logarithmic re-scaling. Some nationalities have been removed from the dataset because the ISO 3166-1 alpha-3 codes required to plot them on a world map do not support them. As a result, nationalities such as Yugoslav and Coptic were removed, Welsh and Scottish were merged into the UK, and Native American into the United States: a simple line of code with complex implications, briefly touched upon in the discussion.
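The demonym-to-country step can be sketched with a plain dictionary. This is a hypothetical excerpt standing in for the Demonym CSV and country_converter used in the article (neither is reproduced here); it only illustrates how merged and dropped nationalities fall out of the mapping.

```python
import pandas as pd

# Hypothetical excerpt of a demonym lookup to ISO alpha-3 codes.
DEMONYM_TO_ISO3 = {
    "American": "USA",
    "Native American": "USA",  # folded into the United States
    "British": "GBR",
    "Welsh": "GBR",            # folded into the UK
    "Scottish": "GBR",         # folded into the UK
    "French": "FRA",
    # "Yugoslav" and "Coptic" have no entry, so they drop out below.
}

works = pd.DataFrame({"Nationality": ["American", "Welsh", "Yugoslav",
                                      "Native American", "Coptic", "French"]})
works["ISO3"] = works["Nationality"].map(DEMONYM_TO_ISO3)

# Rows without a code cannot be placed on the map and are removed.
mappable = works.dropna(subset=["ISO3"])
print(mappable["ISO3"].value_counts())
```

Keeping the merges and omissions explicit in one table makes the "simple line of code" easy to audit.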

Gender is based on the gender metadata (male/female/non-binary/undefined), with only the primary gender considered.
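Taking only the primary value can be sketched with a regular expression. This assumes the raw field lists one parenthesised value per credited artist, e.g. "(Female) (Male)", with an empty "()" for a missing value; that format is an assumption, and the same approach would apply to the nationality field.

```python
import pandas as pd

# Hypothetical raw gender strings, one parenthesised value per artist.
raw_gender = pd.Series(["(Male)", "(Female) (Male)", "()", "(Non-Binary)", None])

# Capture the first non-empty parenthesised value; anything else becomes NaN.
primary = raw_gender.str.extract(r"^\(([^)]+)\)", expand=False)
print(primary.tolist())
```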

Languages & libraries

Data formatting and cleaning are done primarily with Python and Pandas. The data visualizations rely on Plotly, Plotly Express, and Dash to display the graphs on an HTML page.

Image by Author

Thank you

If you got this far, thank you for reading along with me! I’d love to hear your thoughts on the insights, if and how you might use them, and if you have any recommendations for me to improve my approach for future (data) projects.


Designer & researcher focussed on AI/ML, data, digital culture & the human condition mediated through computing