Spotiscience: A Tool for Data Scientists and Music Lovers

Spotiscience makes it easy to download data about Spotify’s music and run predictions on it

Cristóbal V
Towards Data Science

Photo by Matheus Ferrero on Unsplash

Who doesn’t like to work with music? Many of us work all day with Spotify running in the background while shuffled music from the playlists and artists we like creates a magical atmosphere and delivers the energy we need to work at full productivity.

To combine my work as a data scientist with my passion for music, I created a tool called “Spotiscience”, which downloads data about songs, artists, albums, and playlists using the official Spotify API. It also models this data to generate new information, such as the mood of a song, the topics of its lyrics, and similar songs. Sounds interesting, right? If you want to know more, keep reading!

In this article you will learn to:

  • Download data of songs, albums, playlists and artists from the Spotify API
  • Download song lyrics from the Genius API
  • Predict the mood of a song
  • Find the most relevant topics in a song’s lyrics
  • Search for similar songs in Spotify albums, artist discographies, and playlists

Index

1.1 SpotiscienceDownloader

  • 1.1.1 Initial Settings
  • 1.1.2 Extraction of Song Features
  • 1.1.3 Extraction of Albums
  • 1.1.4 Extraction of Playlists
  • 1.1.5 Extraction of Playlist and Artist Information

1.2 SpotisciencePredicter

  • 1.2.1 Initial Settings
  • 1.2.2 Prediction of Song Mood
  • 1.2.3 Prediction of Topics from Song Lyrics
  • 1.2.4 Prediction of Similar Songs

1. Spotiscience

Spotiscience is a Python project that I published on GitHub. It interacts with the Spotify API and the Genius API to extract data and features of songs, albums, artists, and playlists. You can also analyze this data to generate new information through mood prediction, topic modeling, and mathematical distances for finding similar songs. To download Spotiscience, visit the GitHub repository.

To explain how to configure and use Spotiscience, I will describe the two main classes of this tool:

1.1 SpotiscienceDownloader

This class extracts the data from the Spotify API and Genius API.

1.1.1 Initial Settings

To use this class, set it up as follows:

import spotiscience

# Create a dictionary with the authorization keys
CREDENTIALS = {}
CREDENTIALS['client_id'] = "your_spotify_client_id"
CREDENTIALS['client_secret'] = "your_spotify_client_secret"
CREDENTIALS['redirect_url'] = "your_redirect_url"
CREDENTIALS['user_id'] = "your_spotify_user_id"
CREDENTIALS['genius_access_token'] = "your_genius_access_token"

# You can also set your credentials in credentials.py
# Returns an instance of the downloader class
sd = spotiscience.SpotiScienceDownloader(credentials=CREDENTIALS)
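
If you would rather not hard-code secrets in a script, one option is to build the same dictionary from environment variables. The sketch below is only an illustration: the environment variable names and the helper `credentials_from_env` are hypothetical and are not part of Spotiscience; only the dictionary keys come from the snippet above.

```python
import os

# Hypothetical environment variable names (an assumption for this sketch);
# the dictionary keys on the right are the ones used in CREDENTIALS above.
ENV_TO_KEY = {
    "SPOTIFY_CLIENT_ID": "client_id",
    "SPOTIFY_CLIENT_SECRET": "client_secret",
    "SPOTIFY_REDIRECT_URL": "redirect_url",
    "SPOTIFY_USER_ID": "user_id",
    "GENIUS_ACCESS_TOKEN": "genius_access_token",
}


def credentials_from_env(environ=os.environ):
    """Build the CREDENTIALS dictionary from environment variables."""
    # Fail early, listing every variable that is unset or empty.
    missing = [name for name in ENV_TO_KEY if not environ.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[name] for name, key in ENV_TO_KEY.items()}
```

The returned dictionary can then be passed as the `credentials` argument in the same way as the hand-written `CREDENTIALS` dictionary.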

To obtain the authorization credentials for the Spotify API and Genius API, you can watch the following tutorials:

Authentication Spotify API Tutorial

Authentication Genius API Tutorial

To obtain the “user_id” of your Spotify account, open the Spotify desktop application, go to “Profile”, and copy the link to the profile as follows:

Photo by Author

You will get a link like the one below. Your user_id is the part shown as {USER_ID}; the rest of the link can be discarded.

“https://open.spotify.com/user/{USER_ID}?si=9f52cafadbf148b2”
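
If you prefer not to trim the link by hand, a small helper can do it. This is a minimal sketch using only the Python standard library; the function name `spotify_id_from_link` is hypothetical and not part of Spotiscience, and the user name in the example is a placeholder.

```python
from urllib.parse import urlparse


def spotify_id_from_link(link):
    """Return (kind, id) from an open.spotify.com link, e.g. ('user', '<id>')."""
    # The path looks like /user/<id> or /track/<id>; the ?si=... query is ignored.
    parts = [part for part in urlparse(link).path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"Not a recognizable Spotify link: {link!r}")
    return parts[-2], parts[-1]


# 'some_user' is a placeholder, not a real account.
kind, user_id = spotify_id_from_link(
    "https://open.spotify.com/user/some_user?si=9f52cafadbf148b2"
)
print(kind, user_id)
```

The same helper works for track, album, artist, and playlist links, although the Spotiscience methods shown in this article accept the copied link directly.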

1.1.2 Extraction of Song Features

To extract the features of a song, search for the song on Spotify and then copy the link of the song as follows:

In this case, I copied the link of the song “Blinding Lights” by The Weeknd:

song_copy_link = "https://open.spotify.com/track/0VjIjW4GlUZAMYd2vXMi3b?si=369f90167c9d48fb"
song = sd.get_song_features(song_id=song_copy_link)

The result will be a dictionary with the following song features. For more information about these features, you can read the official documentation on Audio Features in the Spotify Web API.

{'id': '0VjIjW4GlUZAMYd2vXMi3b',
'name': 'Blinding Lights',
'artist': 'The Weeknd',
'album': 'After Hours',
'release_date': '2020-03-20',
'popularity': 94,
'length': 200040,
'acousticness': 0.00146,
'danceability': 0.514,
'energy': 0.73,
'instrumentalness': 9.54e-05,
'liveness': 0.0897,
'valence': 0.334,
'loudness': -5.934,
'speechiness': 0.0598,
'tempo': 171.005,
'key': 1,
'time_signature': 4}
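
Some of these values are easier to read after a small conversion. The sketch below assumes that 'length' is the track duration in milliseconds and that 'key' follows the pitch-class notation documented for Spotify's audio features (0 = C, 1 = C#/Db, and so on, with -1 meaning no key was detected). The helper `describe_song` is hypothetical and only uses values from the dictionary above.

```python
# Pitch-class names indexed by the integer stored in 'key'.
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def describe_song(song):
    """Return a one-line, human-readable summary of a song-features dictionary."""
    # Assumption: 'length' is the duration in milliseconds.
    minutes, seconds = divmod(song["length"] // 1000, 60)
    # Fall back to 'unknown' when the key is outside 0..11 (for example -1).
    key = PITCH_CLASSES[song["key"]] if 0 <= song["key"] < 12 else "unknown"
    return (f"{song['name']} by {song['artist']}: {minutes}:{seconds:02d}, "
            f"key {key}, {round(song['tempo'])} BPM")


# A subset of the dictionary shown above.
song = {"name": "Blinding Lights", "artist": "The Weeknd",
        "length": 200040, "key": 1, "tempo": 171.005}
print(describe_song(song))  # Blinding Lights by The Weeknd: 3:20, key C#/Db, 171 BPM
```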

You can also extract the music genre and the lyrics of the song as follows:

# Returns the song lyrics
sd.get_song_lyrics(songname=song['name'],artistname=song['artist'])
# Returns the song's music genre
sd.get_song_music_genre(song_id=song['id'])

1.1.3 Extraction of Albums

To extract the features of the songs from an album, search for the album or albums on Spotify and copy the link of each album. The album extraction method has an id parameter that receives a string or a list of strings with the album links, and the is_artist parameter must be set to False:

# Returns the song features of one or more albums
albums = [
    'https://open.spotify.com/album/4yP0hdKOZPNshxUOjY0cZj?si=p5ItRNgXRlarmq4cihAVmA&dl_branch=1',
    'https://open.spotify.com/album/6Yf9kML4wT3opmXhTUIfk7?si=clKN-hzuTB236hINPATp-Q&dl_branch=1'
]
sd.get_albums_song_features(id=albums,is_artist=False)

The result will be a dictionary where each key is an album name and each value is a list with the features of all the songs on that album.
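
For analysis it is often handier to have one row per song instead of a nested dictionary. The sketch below flattens a dictionary of the form {album name: [song features, ...]} and writes it as CSV text. It assumes each song entry is a flat dictionary like the single-song example earlier; the helper names and the sample values are placeholders, not real Spotify data.

```python
import csv
import io


def flatten_albums(albums_features):
    """Turn {album_name: [song_dict, ...]} into a flat list of row dictionaries."""
    rows = []
    for album_name, songs in albums_features.items():
        for song in songs:
            row = dict(song)                     # copy, so the input is untouched
            row.setdefault("album", album_name)  # keep an existing 'album' value
            rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text, using the union of all keys as columns."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder data for illustration only.
sample = {
    "Album A": [{"name": "Song 1", "energy": 0.5}, {"name": "Song 2", "energy": 0.7}],
    "Album B": [{"name": "Song 3", "energy": 0.2}],
}
print(rows_to_csv(flatten_albums(sample)))
```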

It’s also possible to download the discography of an artist. In this case, the id parameter receives a single string, and is_artist must be set to True as follows:

# Returns the song features of the artist's discography
artist = 'https://open.spotify.com/artist/4fvA5FEHcdz97gscK90xZa?si=HNLNN7-dS5OR2W9TIUqQug&dl_branch=1'
sd.get_albums_song_features(id=artist,is_artist=True)

1.1.4 Extraction of Playlists

Song features can be extracted from a playlist as well. In this case, the playlist_id parameter receives a single string, and the total number of songs to extract must be specified as follows:

# Returns the song features of the playlist
playlist = 'https://open.spotify.com/playlist/37i9dQZF1DXcfZ6moR6J0G?si=da1435f1a0804933'
sd.get_playlist_song_features(playlist_id=playlist,n_songs=50)

The result will be a dictionary where the key is the name of the playlist and the content corresponds to a list with all the features of playlist songs.

1.1.5 Extraction of Playlist and Artist Information

Finally, the main information of a playlist and an artist can be extracted as follows:

playlist = 'https://open.spotify.com/playlist/37i9dQZF1DXcfZ6moR6J0G?si=da1435f1a0804933'
artist = 'metallica'
# Returns playlist information
sd.get_playlist_information(playlist_id=playlist)
# Returns artist information
sd.get_artist_information(artist=artist)

The results will be two dictionaries with the information of the playlist and the artist.

To better understand all the SpotiscienceDownloader methods, you can take a look at the source code of the downloader.py module in the GitHub repo by clicking here.

1.2 SpotisciencePredicter

This class is for modeling song data using classification techniques for supervised learning, topic modeling with natural language processing, and song similarity with mathematical distances.

1.2.1 Initial Settings

To set up this class, you only need to call it as follows:

import spotiscience
# Returns an instance of the predicter class
sp = spotiscience.SpotiSciencePredicter()

1.2.2 Prediction of Song Mood

To perform song mood prediction, I used a machine learning approach: I tagged a group of songs from mood playlists created by Spotify, and then trained a Random Forest classifier to tag songs based on their features.

For more information about this topic, you can read my article on music mood prediction by clicking here.

To predict the mood you just have to pass the data of the song extracted with SpotiscienceDownloader as follows:

#returns the tag of mood 
sp.predict_song_mood(song=song)

The result will be a string with the corresponding mood category. The categories are “sad”, “calm”, “energy”, and “happy”.
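
Because the prediction works on one song at a time, a natural next step is to summarize the moods of a whole playlist or album. The sketch below counts mood labels for a dictionary of the form {name: [song features, ...]}. It takes the predictor as a plain callable, so the helper `mood_distribution` is hypothetical and makes no assumption about how the model works; the demo uses a stand-in that reads a pre-filled placeholder label.

```python
from collections import Counter


def mood_distribution(songs_by_group, predict_mood):
    """Count predicted moods per playlist or album.

    songs_by_group: {group_name: [song_dict, ...]}
    predict_mood: any callable mapping a song dictionary to a mood string.
    """
    return {name: Counter(predict_mood(song) for song in songs)
            for name, songs in songs_by_group.items()}


# Placeholder data and a stand-in predictor, for illustration only.
demo = {"Demo playlist": [{"mood": "sad"}, {"mood": "happy"}, {"mood": "sad"}]}
counts = mood_distribution(demo, lambda song: song["mood"])
print(counts["Demo playlist"].most_common())
```

With Spotiscience, the callable could be something like `lambda s: sp.predict_song_mood(song=s)`, applied to the dictionary returned by the playlist or album extraction methods.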

1.2.3 Prediction of Topics from Song Lyrics

Topic prediction for song lyrics can use any of the following algorithms: Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), or Latent Semantic Indexing (LSI). I based my code on the following article, which you can read here.

To predict the topics of the lyrics, you must configure the following parameters:

  • lyric = the lyrics of the song
  • model = the model to use [options are “lsi”, “lda” or “nmf”]
  • lang = language of the song lyrics [options are “english” or “spanish”]
  • n_grams = range of n-gram sizes, that is, how many consecutive words to group, given as a (min, max) tuple
  • n_topics = number of returned topics
  • top_n = number of words per returned topic

For more information about the parameter n_grams, you can read the official documentation about vectorization with sklearn by clicking here.

lyric = song_lyrics  # lyrics text, e.g. the result of sd.get_song_lyrics(...)
model = 'lda'        # available options: 'lda', 'lsi', 'nmf'
lang = 'english'     # available options: 'english', 'spanish'
n_grams = (1,1)
n_topics = 1
top_n = 5
# Predict the topics of the song lyrics
sp.predict_topic_lyric(lyric,model,lang,n_grams,n_topics,top_n)
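
To get a feel for what the n_grams tuple controls, the sketch below lists the word n-grams of a short text for a given (min, max) range. It is a simplified illustration of the idea behind the n-gram range in sklearn's vectorizers, using plain whitespace tokenization; it is not the tokenizer that sklearn or Spotiscience actually uses, and `word_ngrams` is a hypothetical helper.

```python
def word_ngrams(text, n_grams=(1, 1)):
    """List the word n-grams of `text` for every size in the inclusive range."""
    tokens = text.lower().split()  # simplified tokenization: split on whitespace
    min_n, max_n = n_grams
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n consecutive words over the token list.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


print(word_ngrams("music keeps me going", (1, 1)))
# ['music', 'keeps', 'me', 'going']
print(word_ngrams("music keeps me going", (1, 2)))
# ['music', 'keeps', 'me', 'going', 'music keeps', 'keeps me', 'me going']
```

With (1,1) only single words are counted, while (1,2) also counts pairs of consecutive words, which lets a topic model pick up short phrases.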

1.2.4 Prediction of Similar Songs

To predict the similarity of songs, I use the Manhattan distance (l1) or the Euclidean distance (l2) to calculate the distance between song features, and then sort the results in ascending order.

To predict song similarity, you must configure the following parameters:

  • object = reference song to compare
  • target = group of songs to evaluate, taken from albums, a playlist, or an artist’s discography
  • distance = distance to use [options are “l1” and “l2”]
  • n_features = number of song features to calculate distance
  • top_n = number of songs to return in tuple results

For more information about the parameter n_features, you can read the source code of the method by clicking here.
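
The idea behind the two distances can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's implementation: the feature list below is a hypothetical choice (Spotiscience selects its own features through n_features), and the function `similar_songs` and the demo values are placeholders.

```python
import math

# Hypothetical feature selection for this sketch; all of these lie between 0 and 1.
FEATURES = ["acousticness", "danceability", "energy",
            "instrumentalness", "liveness", "valence"]


def l1(a, b):
    """Manhattan distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a, b):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_songs(reference, candidates, distance="l2", top_n=10, features=FEATURES):
    """Return {reference name: [(song name, distance), ...]}, closest first."""
    metric = {"l1": l1, "l2": l2}[distance]
    ref_vector = [reference[f] for f in features]
    scored = [(song["name"], metric(ref_vector, [song[f] for f in features]))
              for song in candidates]
    scored.sort(key=lambda pair: pair[1])  # ascending: smallest distance first
    return {reference["name"]: scored[:top_n]}


# Placeholder values for illustration only, not real Spotify data.
reference = {"name": "Reference", "feature_a": 0.0, "feature_b": 0.0}
candidates = [
    {"name": "Far", "feature_a": 3.0, "feature_b": 4.0},
    {"name": "Near", "feature_a": 1.0, "feature_b": 0.0},
]
print(similar_songs(reference, candidates, "l2", top_n=2,
                    features=["feature_a", "feature_b"]))
```

One design point worth keeping in mind: features on very different scales, such as tempo or loudness, would dominate either distance unless they are normalized first, which is why this sketch sticks to features that share the same 0 to 1 range.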

Example 1: Predicting which songs of the “Nu Metal Generation” playlist are most similar to the song “Change (In the House of Flies)” by “Deftones”.

playlist_link = "https://open.spotify.com/playlist/37i9dQZF1DXcfZ6moR6J0G?si=452e104160384c8e"
song_link = "https://open.spotify.com/track/51c94ac31swyDQj9B3Lzs3?si=5aca903582464acd"
target = sd.get_playlist_song_features(playlist_link,n_songs=70)
object = sd.get_song_features(song_link)
distance = 'l2'
n_features = 6
top_n = 10
# Returns the most similar songs from the playlist
sp.predict_similar_songs(object,target,distance,n_features,top_n)

Example 2: Predicting which songs of the “Dua Lipa” discography are most similar to the song “Blinding Lights” by “The Weeknd”.

artist_link = "https://open.spotify.com/artist/6M2wZ9GZgrQXHCFfjv46we?si=VJ3J-isZRbSM5x2pNUnrhw&dl_branch=1"
song_link = "https://open.spotify.com/track/0VjIjW4GlUZAMYd2vXMi3b?si=369f90167c9d48fb"
target = sd.get_albums_song_features(id=artist_link,is_artist=True)
object = sd.get_song_features(song_link)
distance = 'l2'
n_features = 6
top_n = 10
# Returns the most similar songs from the artist's discography
sp.predict_similar_songs(object,target,distance,n_features,top_n)

In both examples, the result is a dictionary whose key is the name of the reference song (object) and whose value is a list of tuples. Each tuple pairs the name of a song with its distance from the reference song.

Note: It’s also possible to predict similar songs in albums without having to download the entire discography of the artist. To do this, pass the album features as the target parameter.

2. Conclusion

Combining two different areas like data science and music opens up great ways to understand how music develops in a constantly changing culture, where the sound of an instrument, poetic lyrics, and vocal skill can be interpreted in so many different ways. With the help of technology, we try to approximate these interpretations and meanings, and to study what that invisible energy is that makes music stay with us throughout our lives. I hope Spotiscience can be one of those technologies that helps data scientists, developers, and music lovers like me.
