
The Metropolitan Museum of Art Data Analysis and Visualization

Analyze and Visualize Artists and Artworks Metadata

The Metropolitan Museum of Art has over 400,000 artworks from around the world, nearly half of which are open access for unrestricted commercial and noncommercial use. In this blog post, I analyze and visualize the artist and artwork metadata of its collection.

Image by the author and the MET’s Open Access Artworks

As of writing this blog post, there are 448,203 artworks by 56,390 artists.

The metadata file, MetObjects.csv, is available on Kaggle (the latest version is on the Met’s GitHub), although the dataset doesn’t include images. The code in this blog post is on my GitHub repo.

There are 448,203 rows and 43 columns in the metadata. A few columns describe the artist: role, prefix, name, bio, nationality, begin and end dates, etc. The rest describe the artwork itself, for example title, culture, period, dynasty, medium, dimensions, and geography info.
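
As a rough illustration of how the row and column counts can be checked, here is a minimal sketch that uses only Python's standard library. It is not the code from my repo: the tiny in-memory CSV stands in for MetObjects.csv, its column names are assumptions modelled on the public file, and its rows are made-up placeholders.

```python
import csv
import io

# Stand-in for MetObjects.csv. Column names are assumptions; rows are placeholders.
SAMPLE_CSV = """Object ID,Department,Artist Display Name,Artist Nationality,Title,Medium
1,Department A,Artist A,American,Title 1,Medium X
2,Department B,,,Title 2,Medium Y
3,Department A,Artist B,French,Title 3,Medium X
"""


def load_rows(handle):
    """Read a CSV file object into a list of dicts, one dict per artwork."""
    return list(csv.DictReader(handle))


def shape(rows):
    """Return (number of rows, number of columns)."""
    n_cols = len(rows[0]) if rows else 0
    return len(rows), n_cols


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(shape(rows))  # (3, 6)
```

For the real file, passing `open("MetObjects.csv", newline="", encoding="utf-8-sig")` to `load_rows` should work the same way, although the encoding is an assumption.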

Missing Values

Many of the columns have a lot of missing values: 18 out of 42 columns are more than 75% missing, and 23 out of 42 columns are more than 50% missing. We don’t need to worry about these missing values, since we are not using the metadata to train ML models at the moment.
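
A minimal sketch of how per-column missing rates like these could be computed is below. It assumes a missing value shows up as an empty or whitespace-only string, and the sample rows and column choices are made up for illustration.

```python
def missing_fraction(rows, columns):
    """Fraction of rows whose value is empty or whitespace, per column."""
    total = len(rows)
    return {
        col: sum(1 for r in rows if not (r.get(col) or "").strip()) / total
        for col in columns
    }


def count_above(fractions, threshold):
    """Number of columns whose missing fraction is strictly above threshold."""
    return sum(1 for f in fractions.values() if f > threshold)


# Placeholder rows; the real data would come from MetObjects.csv.
rows = [
    {"Title": "Title 1", "Dynasty": "Dynasty 1", "Reign": ""},
    {"Title": "Title 2", "Dynasty": "", "Reign": ""},
    {"Title": "Title 3", "Dynasty": " ", "Reign": ""},
    {"Title": "Title 4", "Dynasty": "", "Reign": ""},
]

fractions = missing_fraction(rows, ["Title", "Dynasty", "Reign"])
print(fractions)
print(count_above(fractions, 0.75), count_above(fractions, 0.5))
```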

Artists

First let’s analyze the artists data.

Artists nationality

Over half of the artists (56.3%) are American, followed by French (19.8%) and Italian (13.4%).
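
One way such shares could be computed is sketched below. It assumes each artist should be counted once, no matter how many artworks they have, and that the name and nationality live in columns called `Artist Display Name` and `Artist Nationality`. Both are assumptions, and the sample rows are placeholders.

```python
from collections import Counter


def nationality_shares(rows, name_col="Artist Display Name",
                       nat_col="Artist Nationality"):
    """Percentage of unique artists per nationality, most common first."""
    # Keep one nationality per artist so prolific artists are not over-counted.
    by_artist = {}
    for r in rows:
        name = (r.get(name_col) or "").strip()
        nat = (r.get(nat_col) or "").strip()
        if name and nat:
            by_artist.setdefault(name, nat)
    counts = Counter(by_artist.values())
    total = sum(counts.values())
    return [(nat, round(100 * n / total, 1)) for nat, n in counts.most_common()]


# Placeholder rows: "Artist A" appears three times but is counted once.
rows = [
    {"Artist Display Name": "Artist A", "Artist Nationality": "American"},
    {"Artist Display Name": "Artist A", "Artist Nationality": "American"},
    {"Artist Display Name": "Artist A", "Artist Nationality": "American"},
    {"Artist Display Name": "Artist B", "Artist Nationality": "American"},
    {"Artist Display Name": "Artist C", "Artist Nationality": "French"},
    {"Artist Display Name": "Artist D", "Artist Nationality": "Italian"},
    {"Artist Display Name": "", "Artist Nationality": ""},
]

shares = nationality_shares(rows)
print(shares)
```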

Image by the author

Artists date of birth

A histogram of the artists’ dates of birth indicates that many artists were born around the 1950s.
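
The binning behind a histogram like this can be sketched without a plotting library. The example below groups birth years by decade and prints a text histogram. It assumes the birth year sits in a column called `Artist Begin Date`, skips anything that is not a single integer, and counts rows rather than unique artists. The sample values are made up.

```python
from collections import Counter


def birth_decade_counts(rows, col="Artist Begin Date"):
    """Count rows per birth decade, skipping values that are not one integer."""
    counts = Counter()
    for r in rows:
        raw = (r.get(col) or "").strip()
        try:
            year = int(raw)
        except ValueError:
            continue  # blank, multi-valued, or free-text entries are skipped
        counts[(year // 10) * 10] += 1
    return dict(sorted(counts.items()))


# Placeholder values, including a blank and a multi-valued entry.
rows = [{"Artist Begin Date": v}
        for v in ["1903", "1908", "1855", "", "1840|1850", "1951"]]

decades = birth_decade_counts(rows)
for decade, n in decades.items():
    print(f"{decade}s {'#' * n}")
```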

Image by the author

Artists with most artworks

Among the top 10 artists with the most artworks in the collection, Walker Evans has the most, at 9,659.
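
A ranking like this boils down to counting rows per artist name. Here is a minimal sketch, assuming the name column is called `Artist Display Name` and that rows with a blank name should be ignored. The names and counts are placeholders.

```python
from collections import Counter


def top_artists(rows, n=10, col="Artist Display Name"):
    """Return the n artists with the most rows as (name, count) pairs."""
    names = ((r.get(col) or "").strip() for r in rows)
    return Counter(name for name in names if name).most_common(n)


# Placeholder rows: three works by Artist A, two by Artist B, one unattributed.
rows = [{"Artist Display Name": name}
        for name in ["Artist A", "Artist B", "Artist A", "", "Artist A", "Artist B"]]

ranking = top_artists(rows, n=2)
print(ranking)  # [('Artist A', 3), ('Artist B', 2)]
```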

Image by the author

Artworks

Now let’s analyze the artworks metadata.

Artworks in the public domain

Nearly half of the artworks are in the public domain, which means you can use them freely.
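
As a small sketch, the public-domain share is the fraction of rows whose flag is true. The column name `Is Public Domain` and its `True`/`False` string values are assumptions about the CSV, and the rows below are placeholders.

```python
def public_domain_share(rows, col="Is Public Domain"):
    """Fraction of rows flagged as public domain (case-insensitive 'true')."""
    if not rows:
        return 0.0
    flags = [(r.get(col) or "").strip().lower() == "true" for r in rows]
    return sum(flags) / len(flags)


# Placeholder rows: two of the four are flagged as public domain.
rows = [{"Is Public Domain": v} for v in ["True", "False", "True", "False"]]

share = public_domain_share(rows)
print(f"{share:.0%}")  # 50%
```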

Image by the author

Artworks media

"Commercial color lithograph" is the most common artwork medium.

Image by the author

Artworks by department

There are a total of 20 departments, and the Drawings and Prints department has the largest collection, with 154,445 artworks.

Image by the author

Artworks by classification

The Prints classification has the largest number of artworks, twice as many as the second-largest classification, Prints|Ephemera.

Note: if we were to use the images for classification, we would need to figure out the overlap between Prints, Prints|Ephemera and Photographs|Ephemera.
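
One possible way to look at that overlap is to treat `|` as a separator between labels and count which labels appear together on the same artwork. This is only a sketch of one interpretation: the `Classification` column name and the idea that `|` joins independent labels are assumptions, and the rows are placeholders.

```python
from collections import Counter
from itertools import combinations


def classification_overlap(rows, col="Classification"):
    """Count how often two classification labels share the same artwork."""
    pairs = Counter()
    for r in rows:
        parts = (r.get(col) or "").split("|")
        labels = sorted({p.strip() for p in parts if p.strip()})
        # Each unordered pair of labels on this row counts once.
        pairs.update(combinations(labels, 2))
    return pairs


rows = [{"Classification": v}
        for v in ["Prints", "Prints|Ephemera", "Photographs|Ephemera", "Prints|Ephemera"]]

overlap = classification_overlap(rows)
print(overlap.most_common())
```

A single-label row such as `Prints` contributes no pairs, so only the compound labels show up in the result.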

Image by the author

Artworks by country and culture

It’s interesting to see that the most common country of origin is Egypt, while the top culture is American. The latter is consistent with the artist nationality above, where 56.3% of artists are American.
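
When comparing the leaders of two sparsely filled columns, it can help to report how many rows actually have a value in each. The sketch below returns the most common value together with that coverage. The `Country` and `Culture` column names are assumptions, and the rows are made-up placeholders rather than real counts.

```python
from collections import Counter


def top_with_coverage(rows, col):
    """Return (most common value, its count, number of non-blank rows)."""
    values = [(r.get(col) or "").strip() for r in rows]
    values = [v for v in values if v]
    if not values:
        return None, 0, 0
    value, count = Counter(values).most_common(1)[0]
    return value, count, len(values)


# Placeholder rows: each column is blank on a different subset of artworks.
rows = [
    {"Country": "Egypt", "Culture": ""},
    {"Country": "Egypt", "Culture": ""},
    {"Country": "", "Culture": "American"},
    {"Country": "", "Culture": "American"},
    {"Country": "France", "Culture": "American"},
    {"Country": "", "Culture": "French"},
]

country = top_with_coverage(rows, "Country")
culture = top_with_coverage(rows, "Culture")
print(country, culture)
```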

Image by the author

Conclusion

This has been a fun data exploration and visualization of the artists and artworks. Since this dataset doesn’t include the images, if you wish to perform computer vision tasks with images, please use datasets such as the iMet Collection 2019 or iMet Collection 2020 competitions, or the BigQuery dataset. You can also search for and download images from the Met’s website; just make sure to choose the Open Access Artworks.


I write blog posts on how to use AI for art and design including topics on data analysis, model training and application development. Follow me on Medium, check out my other stories and stay tuned for new posts.

