The Metropolitan Museum of Art has over 400,000 artworks from around the world, nearly half of which are open access for unrestricted commercial and noncommercial use. In this blog post, I analyze and visualize the artist and artwork metadata of its collection.

As of writing this blog post, there are 448,203 artworks by 56,390 artists.
The metadata file MetObjects.csv is available on Kaggle (with the latest version on their GitHub), although the dataset doesn’t include images. The code in this blog post is on my GitHub repo.
There are 448,203 rows and 43 columns in the metadata. A few columns describe the artist: role, prefix, name, bio, nationality, begin date, end date and so on. The rest of the columns relate to the artwork itself, for example title, culture, period, dynasty, medium, dimensions and geography info.
Missing Values
Many of the columns have a lot of missing values. 18 out of 42 columns are missing more than 75% of their values, and 23 out of 42 columns are missing more than 50%. We don’t need to worry about these missing values since we are not using the metadata to train ML models at the moment.
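
As a rough illustration of how such missing-value shares could be computed, here is a minimal standard-library sketch. It is not the code from the repo: the function name missing_share is hypothetical, empty strings are assumed to mark missing values, and a tiny made-up sample stands in for MetObjects.csv.

```python
import csv
import io

def missing_share(rows, columns):
    """Return {column: fraction of rows where the value is empty or absent}."""
    total = len(rows)
    if total == 0:
        return {col: 0.0 for col in columns}
    return {
        col: sum(1 for row in rows if not (row.get(col) or "").strip()) / total
        for col in columns
    }

# Tiny inline sample standing in for MetObjects.csv (values are made up).
sample = io.StringIO(
    "Title,Dynasty,Medium\n"
    "Vase,,Ceramic\n"
    "Portrait,,Oil on canvas\n"
    "Bowl,Ming,\n"
    "Print,,\n"
)
reader = csv.DictReader(sample)
rows = list(reader)
shares = missing_share(rows, reader.fieldnames)

# Columns with more than 50% of their values missing.
mostly_missing = [col for col, share in shares.items() if share > 0.5]
print(shares)
print(mostly_missing)
```

To run this on the real file, the StringIO sample would be swapped for an opened MetObjects.csv, and the threshold adjusted to 0.75 for the stricter cut.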
Artists
First, let’s analyze the artist data.
Artists nationality
Over half of the artists (56.3%) are American, followed by French (19.8%) and Italian (13.4%).
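
Because the metadata has one row per artwork, artists repeat, so a per-artist percentage needs deduplication first. The sketch below shows one possible way to do that. The column names "Artist Display Name" and "Artist Nationality" and the helper nationality_share are assumptions for illustration, and the rows are made up.

```python
from collections import Counter

def nationality_share(rows, name_col="Artist Display Name",
                      nat_col="Artist Nationality"):
    """Percentage of unique artists per nationality, skipping blanks."""
    nationality_by_artist = {}
    for row in rows:
        name = (row.get(name_col) or "").strip()
        nationality = (row.get(nat_col) or "").strip()
        if name and nationality:
            # Keep the first nationality seen for each artist.
            nationality_by_artist.setdefault(name, nationality)
    counts = Counter(nationality_by_artist.values())
    total = sum(counts.values())
    return {nat: round(100 * n / total, 1) for nat, n in counts.most_common()}

# Made-up rows: one row per artwork, so artists repeat.
rows = [
    {"Artist Display Name": "Artist A", "Artist Nationality": "American"},
    {"Artist Display Name": "Artist A", "Artist Nationality": "American"},
    {"Artist Display Name": "Artist B", "Artist Nationality": "American"},
    {"Artist Display Name": "Artist C", "Artist Nationality": "French"},
    {"Artist Display Name": "Artist D", "Artist Nationality": "Italian"},
    {"Artist Display Name": "", "Artist Nationality": ""},
]
print(nationality_share(rows))
```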

Artists date of birth
A histogram of the artists’ dates of birth indicates that many artists were born around the 1950s.
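
A histogram like this can be approximated by binning birth years into decades. The sketch below is one way to do it in plain Python, under two assumptions: the birth year lives in a column named "Artist Begin Date" as text, and there is one row per artist. The helper births_per_decade and the sample rows are hypothetical.

```python
from collections import Counter

def births_per_decade(rows, col="Artist Begin Date"):
    """Count birth years per decade, ignoring blank or non-numeric values."""
    decades = Counter()
    for row in rows:
        value = (row.get(col) or "").strip()
        try:
            year = int(value)
        except ValueError:
            continue  # blank or unparseable entry
        decades[year // 10 * 10] += 1
    return dict(sorted(decades.items()))

# Made-up rows, assumed to be one per artist.
rows = [
    {"Artist Begin Date": "1903"},
    {"Artist Begin Date": "1951"},
    {"Artist Begin Date": "1958"},
    {"Artist Begin Date": ""},
    {"Artist Begin Date": "unknown"},
]
histogram = births_per_decade(rows)

# Text rendering of the histogram, one bar per decade.
for decade, count in histogram.items():
    print(f"{decade}s {'#' * count}")
```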

Artists with most artworks
Among the top 10 artists with the most artworks in the collection, Walker Evans has the most, with 9,659 artworks.
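
A ranking like this comes down to counting the most frequent values in a column. Here is a minimal sketch with a hypothetical helper, top_values. The column name "Artist Display Name" is an assumption and the rows are placeholders. The same helper would work for other columns, such as the medium or the department.

```python
from collections import Counter

def top_values(rows, column, n=10):
    """Return the n most frequent non-empty values in a column."""
    values = ((row.get(column) or "").strip() for row in rows)
    return Counter(v for v in values if v).most_common(n)

# Placeholder rows: one row per artwork.
rows = (
    [{"Artist Display Name": "Artist A"}] * 3
    + [{"Artist Display Name": "Artist B"}] * 2
    + [{"Artist Display Name": "Artist C"}]
    + [{"Artist Display Name": ""}]
)
top_two = top_values(rows, "Artist Display Name", n=2)
print(top_two)
```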

Artworks
Now let’s analyze the artwork metadata.
Artworks in the public domain
Nearly half of the artworks are in the public domain, which means you can use them freely.
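
The public-domain share is a simple proportion over a boolean flag. The sketch below assumes the flag is stored as the text "True" or "False" in a column named "Is Public Domain". Both the column name and the helper public_domain_share are assumptions, and the rows are made up.

```python
def public_domain_share(rows, col="Is Public Domain"):
    """Fraction of rows whose flag reads 'true' (case-insensitive)."""
    if not rows:
        return 0.0
    flagged = sum(
        1 for row in rows if (row.get(col) or "").strip().lower() == "true"
    )
    return flagged / len(rows)

# Made-up rows for illustration.
rows = [
    {"Is Public Domain": "True"},
    {"Is Public Domain": "False"},
    {"Is Public Domain": "True"},
    {"Is Public Domain": "False"},
]
share = public_domain_share(rows)
print(f"{share:.1%}")
```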

Artworks media
"Commercial color lithograph" is the most common medium among the artworks.

Artworks by department
There are a total of 20 departments, and the Drawings and Prints department has the largest collection, with 154,445 pieces.

Artworks by classification
The Prints classification has the largest number of artworks, twice as many as the second largest classification, Prints|Ephemera.
Note: if we were to use the images for classification, we would need to figure out the overlap between Prints, Prints|Ephemera and Photographs|Ephemera.
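
One way to start looking at that overlap is to split each classification value into its component labels and count them separately. The sketch below treats "|" as a separator between multiple labels, which is an assumption about the data. The helper classification_parts and the sample rows are hypothetical.

```python
from collections import Counter

def classification_parts(rows, col="Classification"):
    """Count each label found in pipe-separated classification values."""
    parts = Counter()
    for row in rows:
        # A set avoids double counting a label repeated within one value.
        labels = {p.strip() for p in (row.get(col) or "").split("|")}
        labels.discard("")
        parts.update(labels)
    return parts

# Made-up rows for illustration.
rows = [
    {"Classification": "Prints"},
    {"Classification": "Prints"},
    {"Classification": "Prints|Ephemera"},
    {"Classification": "Photographs|Ephemera"},
    {"Classification": ""},
]
parts = classification_parts(rows)
print(parts.most_common())
```

Counting labels this way shows how many artworks carry each label at all, which makes it easier to see how much the combined classifications share with the plain ones.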

Artworks by country and culture
Interestingly, most of the artworks come from Egypt, although the top culture is American. The latter is consistent with the artist nationality above, where 56.3% of artists are American.

Conclusion
This has been a fun data exploration and visualization of the artists and artworks. Since this dataset doesn’t include the images, if you wish to perform computer vision tasks, please use datasets such as those from the iMet Collection 2019 or iMet Collection 2020 competitions, or the BigQuery dataset. You can also search and download images from the museum’s website; make sure to choose the Open Access artworks.
I write blog posts on how to use AI for art and design including topics on data analysis, model training and application development. Follow me on Medium, check out my other stories and stay tuned for new posts.