Get Your Spotify Streaming History With Python

With delicious song features on top.

Vlad Gheorghe
Towards Data Science


It ain’t the same without those features.

This is my first Medium story! Any feedback is appreciated.

If you’re a dedicated Spotifyer like myself, there are lots of things you could learn by looking at your streaming data. Do you tend to listen to sad songs in winter? What happened to your music preferences when you fell in love?

Luckily, Spotify allows you to request a download of all your streaming history. In this tutorial, I will show you how to extract this data, flavor it with delicious song features, and organize it into a handy CSV file that you can analyse with your favorite tools.

But there’s more. By the time we get there, you will also gain a basic understanding of how the Spotify API works, how to complete the Authorization Code Flow, and how you can build your own Spotify app. Read on!

But if you’re in a hurry, the full code is available at my GitHub.

Features, features

Spotify’s audio features are complex metrics that aim to describe a track’s personality and general impression on the listener. Here is a brief description of each:

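acousticness: how confident Spotify is that the track is acoustic, from 0.0 to 1.0
danceability: how suitable the track is for dancing, from 0.0 to 1.0
energy: how fast, loud and noisy the track feels
instrumentalness: the fewer vocals, the higher
liveness: how likely it is that an audience is present in the recording
loudness: the overall loudness of the track in decibels (dB)
speechiness: the more spoken words, the higher
valence: whether the track sounds happy (high) or sad (low)
tempo: the estimated speed in beats per minute (BPM)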

Spotify also measures the duration, key, mode and time signature of each track. You can read more about features in Spotify’s documentation.

What we need to do

First, we get our streaming data from Spotify. Since the features are not included, we request them from the Spotify API. Finally, we export the data to our favorite format.

Requirements

The main requirement for this task is the Spotipy library. We will also use the Requests module for calling the API. Though not essential, I’ve also included Pandas because it makes it very easy to save and load tabular data.

Make sure you install the necessary dependencies:
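The three packages are published on PyPI as spotipy, requests and pandas, so `pip install spotipy requests pandas` should cover them. As a quick sanity check, here is a minimal sketch that reports which of them Python cannot find yet. The helper name `missing_packages` is my own invention, not part of any library.

```python
import importlib.util

# The three third-party packages this tutorial relies on
REQUIRED = ["spotipy", "requests", "pandas"]


def missing_packages(names):
    """Return the package names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Install these first: pip install " + " ".join(missing))
else:
    print("All dependencies are available.")
```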

The action

Getting the data

Access your Spotify account dashboard at https://www.spotify.com/. In the privacy settings, you’ll find the option to request your data. This requires some patience. Spotify says it takes up to thirty days, but it’s usually much faster. In my case I waited three days.

Eventually you will get an email with your Spotify data in a .zip file. Extract the MyData folder and copy it into your working folder.

Acquiring the streamings

There are several files in our folder. Those that interest us look like this: StreamingHistory0.json. You might have one file or more, depending on the size of your streaming history. Let’s open the file. My first song looks like this:

Not a bad choice, right?
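Every entry in a StreamingHistory file carries the same four fields: endTime, artistName, trackName and msPlayed. The sketch below parses one entry with placeholder values (made up for illustration, not my actual first song) to show the shape of the data.

```python
import json

# Placeholder values only; the four field names follow Spotify's export format
sample = """
[
  {
    "endTime": "2019-01-01 12:00",
    "artistName": "Example Artist",
    "trackName": "Example Track",
    "msPlayed": 180000
  }
]
"""

entries = json.loads(sample)
first = entries[0]

# msPlayed is in milliseconds, so divide by 60,000 to get minutes
minutes_played = first["msPlayed"] / 60000
print(first["trackName"], "by", first["artistName"], f"({minutes_played:.1f} min)")
```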

Let’s write a Python function that will collect all StreamingHistory files, extract the JSON objects and convert them into Python dictionaries.
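Here is one way such a function could look. It is a minimal sketch that assumes the files sit directly inside a folder called MyData and are valid JSON arrays. The name `get_streamings` is just a convenient label.

```python
import json
from pathlib import Path


def get_streamings(path="MyData"):
    """Collect the entries of every StreamingHistory*.json file into one list of dicts."""
    streamings = []
    for file in Path(path).glob("StreamingHistory*.json"):
        with open(file, encoding="utf-8") as f:
            # Each file holds a JSON array of streaming entries
            streamings.extend(json.load(f))
    # endTime looks like "2019-01-01 12:00", so sorting the strings sorts by time
    streamings.sort(key=lambda entry: entry["endTime"])
    return streamings
```

Sorting by endTime at the end means the order in which the files are read does not matter.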

That’s it. Now we have a timestamped list of all the tracks in our history.

Becoming a Spotify Developer

Spotify does not include song features in its data download. We’ll have to request them via the Spotify API.

Access to the API is free, but we’ll need to register a Spotify app. No worries: it only takes a few minutes. Just sign up here.

Congratulations: you are officially a Spotify developer!

Go to your new developer dashboard and click on ‘Create an App’. Don’t worry about the details. Spotify will allow you to create dummy apps as long as you promise not to monetize them. But you should avoid using ‘Spotify’ in the name, or it might get blocked.

Authorization Code Flow

Nobody could explain it this simply to me, so here goes. An app can access the Spotify API, but only if it gets permission from at least one user. So we’ll use the app to ask ourselves for permission to access our user data.

We need to provide a ‘redirect link’ that we’ll use to collect the user’s permission. From your app’s panel in the developer dashboard, click on ‘Edit Settings’ and add a link under Redirect URIs. This doesn’t have to be a real link: if you don’t have a website, you can simply use http://localhost:7777/callback.

You will also need your app’s Client ID and Client Secret. You’ll find them in the app panel under your app’s name. Now you have all you need to access the Spotify API!
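To make the flow less abstract, here is a sketch of the authorization link that the user ends up visiting. Spotipy builds this link for you, so you don’t need this code in practice. The client ID is a placeholder, and the scope is only an example.

```python
from urllib.parse import urlencode

# Spotify's authorization endpoint for the Authorization Code Flow
AUTHORIZE_ENDPOINT = "https://accounts.spotify.com/authorize"


def build_authorize_url(client_id, redirect_uri, scope):
    """Build the link where the user grants (or denies) permission to the app."""
    params = {
        "client_id": client_id,
        "response_type": "code",  # ask for an authorization code
        "redirect_uri": redirect_uri,  # must match a URI registered in the dashboard
        "scope": scope,
    }
    return AUTHORIZE_ENDPOINT + "?" + urlencode(params)


url = build_authorize_url(
    "your-client-id",  # placeholder, use your app's Client ID
    "http://localhost:7777/callback",
    "user-read-recently-played",  # example scope
)
print(url)
```

The Client Secret is deliberately absent from the link: it is only used later, when the app exchanges the authorization code for a token.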

Accessing the Spotify API

Spotipy to the rescue. Insert the variables you just collected in these fields:
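A minimal sketch of that call follows, wrapped in a small function of my own (`request_token` is not a Spotipy name). It relies on `prompt_for_user_token` from `spotipy.util`. Newer Spotipy releases may flag this helper as deprecated, so treat the snippet as an illustration rather than the definitive way to authenticate. The default scope is only an example.

```python
def request_token(username, client_id, client_secret, redirect_uri,
                  scope="user-read-recently-played"):
    """Ask the user for permission through Spotipy and return an access token."""
    # Imported here so the file can be loaded even before Spotipy is installed
    from spotipy import util

    return util.prompt_for_user_token(
        username=username,
        scope=scope,
        client_id=client_id,
        client_secret=client_secret,
        redirect_uri=redirect_uri,
    )
```

You would call it with your own values, for instance `token = request_token("your-username", "your-client-id", "your-client-secret", "http://localhost:7777/callback")`.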

This function packages a request that goes from your app (identified through the Client ID and the Client Secret) to the user (identified by the Spotify username).

The request has a scope, which defines which permissions you’re going to ask for. You can learn more about scopes here.

Finally, you’ll need to provide a Redirect URI. This must correspond to the one you white-listed in your app’s settings (see previous section).

The function returns a token, which is basically a string that we’ll use to assure the Spotify API that we have user authorization.

Once you run the function with the right parameters, it will open an authorization panel in your web browser. Follow the link, log in with your Spotify credentials and you should see something like this:

The Spotify authorization panel.

Now you can finally authorize your app. Once you click on Agree, you will be taken to the Redirect URI, which may well be a nonexistent page. Just copy the address and paste it into your Python console.
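Why does pasting the address work? The address you are redirected to carries an authorization code in its query string, which the app then trades for a token. Spotipy extracts it for you. The sketch below shows the idea with the standard library. The function name and the sample code value are made up.

```python
from urllib.parse import urlparse, parse_qs


def extract_code(redirect_url):
    """Pull the authorization code out of the address the user was redirected to."""
    query = parse_qs(urlparse(redirect_url).query)
    if "error" in query:
        # Spotify reports a refusal with an 'error' parameter instead of 'code'
        raise ValueError("Authorization failed: " + query["error"][0])
    if "code" not in query:
        raise ValueError("No authorization code found in the URL")
    return query["code"][0]


print(extract_code("http://localhost:7777/callback?code=abc123"))
```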

That’s it. If you print your token variable in the console, you should see something like this:

Access token and refresh token

But there’s more. If you shut down your code and run it again, you won’t have to provide authorization, even though the token variable is no longer in memory. Spotipy seems to remember your token. What is happening here?

If you go to your working folder and show hidden files, you will see that Spotipy has created a new file named .cache-your-username.

As you can see, your access token has an expiration period, typically of one hour. But Spotify also provides you with a refresh token. When the original token expires, your app can use the refresh token to request a new one.

Hence, it’s important that you call the prompt_for_user_token function to load the token every time you run your script. If it finds a cache file, it will use it. If you move or delete the cache file, your tokens will be lost and the user will have to authorize you again.
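If you are curious how long your current token has left, you can read the cache file yourself. This sketch assumes the cache is a JSON file with an `expires_at` field holding a Unix timestamp, which is what I found in mine. The layout may differ between Spotipy versions.

```python
import json
import time


def seconds_until_expiry(cache_path, now=None):
    """Return how many seconds remain before the cached access token expires."""
    with open(cache_path, encoding="utf-8") as f:
        token_info = json.load(f)
    # 'expires_at' is assumed to be a Unix timestamp written by Spotipy
    now = time.time() if now is None else now
    return token_info["expires_at"] - now
```

A negative result means the access token has already expired and the refresh token will be needed on the next run.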

Getting IDs

The Spotify data download does not provide us with tracks’ IDs. We need those IDs to get the features.

We can obtain the IDs by using the API to search the name of our track, taking the first result and extracting the ID. Spotify shows how to build such requests in the documentation.

I wanted to use the Requests library to perform this in Python. Since I’m not yet familiar with it, I used the script at https://curl.trillworks.com/ to convert the curl command into Python Requests code. Here is Spotify’s example of a curl command:

curl -X GET "https://api.spotify.com/v1/search?q=tania%20bowra&type=artist" -H "Authorization: Bearer {your access token}"

And here is the function I wrote with the help of curlconverter:
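A minimal sketch of such a function is shown here, split into small pieces so each step is visible. The search endpoint and the Bearer header come from Spotify’s curl example. The function names and the choice to return None on failure are my own assumptions.

```python
SEARCH_ENDPOINT = "https://api.spotify.com/v1/search"


def build_search_request(track_name, token):
    """Prepare the headers and query parameters for a track search."""
    headers = {"Authorization": "Bearer " + token}
    params = {"q": track_name, "type": "track", "limit": 1}
    return headers, params


def first_track_id(search_json):
    """Extract the ID of the first track in a search response, or None if empty."""
    items = search_json.get("tracks", {}).get("items", [])
    return items[0]["id"] if items else None


def get_id(track_name, token):
    """Search Spotify for a track name and return the ID of the first result."""
    import requests  # imported here so the helpers above work without Requests

    headers, params = build_search_request(track_name, token)
    response = requests.get(SEARCH_ENDPOINT, headers=headers, params=params, timeout=5)
    if response.status_code != 200:
        return None
    return first_track_id(response.json())
```

Searching by track name alone can return a different song with the same title. Adding the artist name to the query is a possible refinement.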

Let’s test our function:

Nice. Paste that ID into Open Spotify (open.spotify.com/track/ followed by the ID) and enjoy a classic.

Getting the features

Now that we have our IDs, it’s a breeze to get the features from the API.
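Here is a sketch of how this could look with Spotipy’s `audio_features` method, which takes a list of track IDs and returns a list of feature dictionaries. The wrapper name `get_features` and the decision to return None on errors are assumptions of mine. Whether the endpoint is available to your app depends on Spotify’s current API policy.

```python
def get_features(track_id, token):
    """Return the audio features of one track as a dict, or None if unavailable."""
    import spotipy  # imported here so the file loads even without Spotipy installed

    sp = spotipy.Spotify(auth=token)
    try:
        # audio_features accepts a list of IDs and returns one entry per ID
        features = sp.audio_features([track_id])
    except spotipy.SpotifyException:
        return None
    return features[0] if features else None
```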

Let’s test it on our track:

Wrapping it up

Wonderful! Now we have all we need to build our streaming history dataframe. The easiest way is to create a list of dictionaries:
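One possible shape for that step is sketched below. To keep it independent of the API, the two lookups are passed in as functions. The names `add_features`, `get_id` and `get_features` are placeholders for whatever you called yours. Each distinct track name is looked up only once, which saves a lot of requests when you replay your favorites.

```python
def add_features(streamings, get_id, get_features):
    """Merge each streaming entry with the audio features of its track."""
    features_by_track = {}  # remembers tracks we already looked up
    rows = []
    for streaming in streamings:
        name = streaming["trackName"]
        if name not in features_by_track:
            track_id = get_id(name)
            features_by_track[name] = get_features(track_id) if track_id else None
        # Keep the entry even when no features were found
        rows.append({**streaming, **(features_by_track[name] or {})})
    return rows
```

With a token at hand, the call could look like `add_features(streamings, lambda name: get_id(name, token), lambda track_id: get_features(track_id, token))`.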

(If you have thousands of songs in your history, getting the data from the API could take considerable time. I suggest you first test your code on a sample.)

Voilà. We have a nice stack of streamings, and with features to boot. Now we can use Pandas to turn our list of dicts into a dataframe and export it to CSV.
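With Pandas, the export boils down to `pd.DataFrame(rows).to_csv(path, index=False)`. The sketch below wraps that call and, as an addition of my own, falls back to the standard csv module when Pandas is not installed. The function name and the file name in the usage note are placeholders.

```python
import csv


def export_streamings(rows, path):
    """Write a list of dicts to a CSV file, using Pandas when it is available."""
    try:
        import pandas as pd
    except ImportError:
        pd = None

    if pd is not None:
        # index=False leaves out the dataframe's row numbers
        pd.DataFrame(rows).to_csv(path, index=False)
        return

    # Fallback: collect every column name in order of first appearance
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

For example, `export_streamings(rows, "streaming_history.csv")` would write the file next to your script.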

We’re done!

If you head over to my GitHub, you’ll see my code is a bit different. When you repeat API requests, you might get responses that were denied in the first run. Also, in the future you might want to rerun the code with a new batch of streamings. But API requests are slow, so I’ve added functions that save the IDs and features that you already collected, allowing you to make new requests without repeating old ones.
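The idea can be sketched in a few lines. This is a simplified illustration rather than the code in my repository: it keeps results in a JSON file and only calls the API for tracks it has not seen. The function names are made up for the example. Failed lookups are not stored, so they get another chance on the next run.

```python
import json
import os


def load_cache(path):
    """Load previously saved results, or start empty if the file does not exist."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_cache(cache, path):
    """Write the collected results back to disk."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cache, f)


def cached_lookup(key, cache, fetch):
    """Return the cached value for key, calling fetch(key) only when needed."""
    if key not in cache:
        result = fetch(key)
        if result is not None:
            # Only successful lookups are remembered
            cache[key] = result
    return cache.get(key)
```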

What’s next?

Once you have your streaming history with all the features, you can unsheathe your Pandas/Matplotlib skills to analyse and plot them. You could even apply machine learning to answer some interesting questions.

If you have a friend or partner who’s into Spotify, you can use this script to gift them their history. Just enter their username in your code, send them the authorization link, and make sure they send you the Redirect URI.

In the follow-up, I’ll explore how we can analyse the data we just collected. Meanwhile, you can check other cool articles on the subject.

That’s the end of my first Medium story. Thanks for reading!
