
Song lyrics generation with Artificial Intelligence (RNN)

Here’s how to use Artificial Intelligence to write a song that doesn’t exist.

Towards Data Science
Dec 29, 2020


Note: The entire code can be found (and loved) here

I’m a musician, and a data scientist.

I spend my days writing code and studying statistical theorems. Then (when it gets dark outside) I like to write music.

But is it possible to write music while coding?

The answer is yes, and I’m about to show you how.

Outline:

Note: If you are interested in the entire process, I want to marry you. If not, you can skip to points 4, 5, and 6 to see the algorithm and its results.

  1. The theory (a brief overview)
  2. The Dataset
  3. Data Pre-processing
  4. Song-writing RNN
  5. Results
  6. Thinking out loud

1. The theory (a brief overview)

Note: If you know enough about Machine Learning and Recurrent Neural Networks (and you are just interested in the code), please skip this part.

Let’s give a brief (non-technical) overview of the theory behind this project.

Saying “Artificial Intelligence” means saying so many things that it collapses into saying almost nothing. The term generally covers the entire set of techniques used to build some ability that we would, in some sense, call “intelligent” (still unclear, right?). Let’s be more precise.

In this particular application of that vague term, I’ve used Recurrent Neural Networks. What are these beasts?

Let’s take a step back. Neural Networks are Machine Learning algorithms that learn in a layered way, loosely similar to how humans learn (from simple concepts to more complex ones).

Recurrent Neural Networks are a specific kind of Neural Network designed for data that must be read as a sequence, where the order matters. Say you want to predict tomorrow’s highest temperature from the temperatures of the previous days: a Recurrent Neural Network is one way to do it. This example may seem trivial, but it is actually the same thing we want to do here:

From a given sequence as an input, predict the next word, then the next word, then the next word…
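To make “predict the next word” concrete: training data for this kind of model is usually built by cutting the text into fixed-length windows. Each window is paired with the same window shifted one word to the left, so every position’s target is the word that follows it. The plain-Python sketch below shows that idea. The helper names, the window length, and the toy sentence are hypothetical and are not taken from the original code.

```python
def split_input_target(tokens):
    """Input is every token but the last; target is every token but the first."""
    return tokens[:-1], tokens[1:]


def training_windows(tokens, seq_len):
    """Cut a token list into windows of seq_len + 1 tokens and split each one.

    The window start advances by seq_len, so consecutive windows share one
    boundary token and no word-to-word transition is skipped.
    """
    pairs = []
    for start in range(0, len(tokens) - seq_len, seq_len):
        chunk = tokens[start:start + seq_len + 1]
        pairs.append(split_input_target(chunk))
    return pairs


# Toy sentence standing in for the lyrics (hypothetical example data).
words = "the quick brown fox jumps over the lazy dog".split()
for inp, target in training_windows(words, seq_len=4):
    print(inp, "->", target)
```

Each pair teaches the network the same lesson at every position: given the words so far, this is the one that comes next.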

Let’s play. :)


2. The Dataset

The dataset I’ve used is courtesy of Manva Pradhan, and you can find it here.

Yes, I’ve picked Taylor Swift choruses to train my model. And it is not (just) because she melts my heart every time she sings.


These methods work extremely well when you train your model on a lot of data (you may have encountered the term “Big Data”). The drawback is that getting a decent result out of a lot of data requires a lot of computational power. So I’ve used a single singer and based my model on that.

But why did I use Taylor Swift?

I did that because her choruses are actually “easy to get”: she doesn’t use solemn words or over-sophisticated meters. That’s about it. You could use Ed Sheeran, Justin Bieber, or someone else (actually, the best thing would be to use them all together to build a more powerful model).

3. Data pre-processing

Let’s take a look at the dataset:

So you have the entire lyrics, with a line_number for each line of each song. But we want to write new choruses, so eventually we have to keep the choruses only (we will do that, keep calm). In every dataset I’ve worked with, I’ve found something strange that messes your model up. Unfortunately, this one is no exception.

The same album appears multiple times under different names, which is actually a problem. We can fix this with a few lines:
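The original fix isn’t reproduced here. As a hedged illustration of the idea, the sketch below assumes the duplicates differ only by an edition suffix in parentheses, such as “(Deluxe)” or “(Platinum Edition)”. That is a hypothetical rule, so check the actual album names in the dataset before relying on it. The function name `canonical_album` and the example names are also hypothetical.

```python
import re

# Hypothetical rule: a trailing "(Deluxe ...)", "(Platinum ...)", "(Special ...)"
# or "(Expanded ...)" marks another edition of the same album.
EDITION_SUFFIX = re.compile(
    r"\s*\((?:deluxe|platinum|special|expanded)[^)]*\)\s*$", re.IGNORECASE
)


def canonical_album(name):
    """Strip an edition suffix so all variants of an album share one name."""
    return EDITION_SUFFIX.sub("", name).strip()


# Example names used only to exercise the rule.
albums = ["Fearless", "Fearless (Platinum Edition)", "1989", "1989 (Deluxe)", "Red"]
print(sorted({canonical_album(a) for a in albums}))
```

Suppose the lyrics live in a pandas DataFrame with an album column (the column name `album` is an assumption). The same function can then be applied with `df["album"] = df["album"].map(canonical_album)`.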

Ok, we’re cool. Now, if you look at the start of each verse, chorus, or bridge, you will find this notation: [Verse], [Chorus], [Bridge] (actually, you find it everywhere; it is super-basic).

So let’s add another column that flags (1/0) the lines containing that ‘[’ stuff.

Awesome, now we just have to pick the choruses. We start from the ‘[’ lines that mark the choruses (remember that you stored them in the IND list), and we keep reading until we find the next ‘]’.
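The two steps above can be sketched in plain Python on a list of lines from one song: first flag the tag lines, then walk from each chorus tag to the next tag. This is not the author’s pandas code. The function names and the sample lines are hypothetical, and `chorus_indices` merely plays the role of the IND list.

```python
def tag_flags(lines):
    """1 if a line contains a section tag such as [Verse] or [Chorus], else 0."""
    return [1 if "[" in line else 0 for line in lines]


def chorus_indices(lines):
    """Positions of the lines that open a chorus (the role of the IND list)."""
    return [i for i, line in enumerate(lines)
            if line.strip().lower().startswith("[chorus")]


def extract_choruses(lines):
    """For each chorus tag, collect the lines until the next bracketed tag."""
    choruses = []
    for start in chorus_indices(lines):
        block = []
        for line in lines[start + 1:]:
            if "]" in line:  # next section tag reached: the chorus is over
                break
            block.append(line)
        choruses.append(block)
    return choruses


song = [  # hypothetical sample lines of a single song
    "[Verse 1]",
    "first verse line",
    "[Chorus]",
    "chorus line one",
    "chorus line two",
    "[Verse 2]",
    "second verse line",
    "[Chorus]",
    "chorus line one",
]
print(tag_flags(song))
print(extract_choruses(song))
```

Run it song by song (for example, grouped by title). That way, a chorus that ends one song never swallows the first lines of the next.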

Again, let’s clean up some mess:

Here:

And here:

With this line:

And the die is cast.

This is an example:


4. Song-writing RNN

These models are complex to build, and unless you are a researcher, you will rarely build a Neural Network from scratch. Here’s the Recurrent Neural Network I’ve used.

The first step is not immediately intuitive, as it is pretty technical. It involves a series of techniques that make strings “readable” as numbers.

It isn’t worth going into detail here, but here’s the commented TensorFlow code:
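Stripped of the TensorFlow details, “making strings readable as numbers” boils down to building a vocabulary and two lookup tables: one from word to integer id and one back. The sketch below does this at the word level in plain Python. The function names and example lines are hypothetical, and this is not the commented TensorFlow code mentioned above.

```python
def build_vocab(texts):
    """Sorted list of the unique words across all texts."""
    return sorted({word for text in texts for word in text.split()})


def make_lookups(vocab):
    """Return a word -> id dictionary and an id -> word list."""
    word2idx = {word: i for i, word in enumerate(vocab)}
    idx2word = list(vocab)
    return word2idx, idx2word


def encode(text, word2idx):
    """Turn a string into the list of its word ids."""
    return [word2idx[word] for word in text.split()]


def decode(ids, idx2word):
    """Turn a list of word ids back into a string."""
    return " ".join(idx2word[i] for i in ids)


texts = ["hold on to me", "hold on to the night"]  # hypothetical choruses
vocab = build_vocab(texts)
word2idx, idx2word = make_lookups(vocab)
ids = encode("hold on to me", word2idx)
print(vocab, ids, decode(ids, idx2word))
```

A real pipeline would also have to decide how to handle punctuation, capitalization, and words never seen during training. Those choices are left out here.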

Then comes the interesting part. Words are represented as vectors that are learned so that they capture the meaning of each word as well as possible (this method is called embedding). Then you use Gated Recurrent Units (GRUs), cells that carry a hidden state which “remembers” the previous words in a clever way. Finally, a dense layer outputs one logit per word in the vocabulary, telling you how likely each word is to come next. Isn’t that awesome?

Graph developed by TensorFlow
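For readers who prefer equations, here is one common way to write the three pieces for a single step t: the embedding lookup, the GRU update, and the dense layer followed by a softmax. The notation is introduced here for illustration and is not the article’s own:

- w_t is the id of the current word.
- E is the embedding matrix, whose row w_t is that word’s vector.
- h_t is the hidden state.
- σ is the logistic sigmoid.
- ⊙ is element-wise multiplication.

Keras’ `GRU` layer uses a slightly different variant. It swaps the roles of z_t and 1 - z_t and, by default, applies the reset gate after the recurrent matrix product. Treat this as a sketch of the idea rather than the exact computation.

$$
\begin{aligned}
\mathbf{x}_t &= E_{w_t,:} \\
\mathbf{z}_t &= \sigma\left(W_z \mathbf{x}_t + U_z \mathbf{h}_{t-1} + \mathbf{b}_z\right) \\
\mathbf{r}_t &= \sigma\left(W_r \mathbf{x}_t + U_r \mathbf{h}_{t-1} + \mathbf{b}_r\right) \\
\tilde{\mathbf{h}}_t &= \tanh\left(W_h \mathbf{x}_t + U_h\left(\mathbf{r}_t \odot \mathbf{h}_{t-1}\right) + \mathbf{b}_h\right) \\
\mathbf{h}_t &= \left(1 - \mathbf{z}_t\right) \odot \mathbf{h}_{t-1} + \mathbf{z}_t \odot \tilde{\mathbf{h}}_t \\
\boldsymbol{\ell}_t &= W_o \mathbf{h}_t + \mathbf{b}_o \\
p\left(w_{t+1} = k \mid w_1, \dots, w_t\right) &= \frac{\exp\left(\ell_{t,k}\right)}{\sum_{j} \exp\left(\ell_{t,j}\right)}
\end{aligned}
$$

The gates z_t (update) and r_t (reset) are what let the cell decide how much of the past to keep at each word. The vector ℓ_t holds the logits: one score per vocabulary word.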

Of course, these methods are “magical but not magic”, so they need to be trained, for quite a long time. Specifically, they are trained to minimize a loss function that you attach to your optimizer:
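The article doesn’t spell out the loss here. A common choice for next-word prediction, and an assumption in this sketch, is sparse categorical cross-entropy computed on the logits (in Keras, `tf.keras.losses.sparse_categorical_crossentropy(..., from_logits=True)`). The plain-Python version below shows what that number is. It is the negative log-probability that the softmax assigns to the word that actually comes next, computed with the log-sum-exp trick for numerical stability. Function names and logits are hypothetical.

```python
import math


def cross_entropy_from_logits(logits, target):
    """-log(softmax(logits)[target]), using log-sum-exp to avoid overflow."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def mean_loss(batch_logits, targets):
    """Average loss over a batch of positions, the quantity the optimizer minimizes."""
    losses = [cross_entropy_from_logits(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# Hypothetical logits for a 3-word vocabulary at two positions.
batch_logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 3.0]]
targets = [0, 2]
print(mean_loss(batch_logits, targets))
```

Lower is better. The loss is zero only when the model puts all its probability on the correct next word. It equals log(V) for a model that guesses uniformly among V words.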

Trained model right here:

And this is the last step:

So the input is:

  • The trained model
  • The start string (remember: the model is “recurrent”)
  • The temperature
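The generation step is an autoregressive loop. You feed the start string, turn the model’s logits for the next word into probabilities at the chosen temperature, sample a word, append it, and repeat. The sketch below shows that loop in plain Python. A trained TensorFlow model can’t be included here, so a hypothetical toy bigram model stands in for it. Unlike the RNN, whose hidden state summarizes the whole start string, the bigram model only looks at the last word. All names are hypothetical.

```python
import math
import random


def make_bigram_model(corpus_words):
    """Toy stand-in for the trained RNN: logits = log(1 + bigram count).

    Returns the vocabulary and a next_logits(history) function.
    """
    vocab = sorted(set(corpus_words))
    index = {w: i for i, w in enumerate(vocab)}
    counts = {w: [0] * len(vocab) for w in vocab}
    for prev, nxt in zip(corpus_words, corpus_words[1:]):
        counts[prev][index[nxt]] += 1

    def next_logits(history):
        # Only the last word matters here; an RNN would use its hidden state.
        return [math.log1p(c) for c in counts[history[-1]]]

    return vocab, next_logits


def sample_index(logits, temperature, rng):
    """Sample an index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(next_logits, vocab, start_string, num_words, temperature, seed=0):
    """Start from start_string and append num_words sampled words.

    Every word of start_string must be in the vocabulary.
    """
    rng = random.Random(seed)
    words = start_string.split()
    for _ in range(num_words):
        idx = sample_index(next_logits(words), temperature, rng)
        words.append(vocab[idx])
    return " ".join(words)


corpus = "hold on to me hold on to the night hold me".split()  # hypothetical
vocab, next_logits = make_bigram_model(corpus)
print(generate(next_logits, vocab, "hold on", num_words=5, temperature=1.0))
```

The loop stays the same with a real model. In that case, `next_logits` would run the trained network on the words so far, or, more efficiently, feed it one word at a time while keeping its recurrent state.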

This last input is actually amazing. If you use a low temperature, you will get predictable results; if you increase the temperature, your lyrics become more “creative”. You don’t believe me, right? You will. :)
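The usual way to implement temperature (and an assumption about the original code) is to divide the logits by the temperature T before the softmax. A small T stretches the gaps between logits, so the most likely word dominates. A large T shrinks them, so unlikely words get a real chance. The sketch below uses made-up logits for three candidate words to check that claim.

```python
import math


def softmax_with_temperature(logits, temperature):
    """softmax(logits / temperature), computed in a numerically stable way."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [3.0, 1.5, 0.5]  # hypothetical scores for three candidate words
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

As T approaches zero, this becomes greedy decoding (always the top word). As T grows very large, it approaches a uniform choice. This matches the “predictable” versus “creative” behaviour described above.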

5. Results

You’re probably thinking: “Hey man, that’s enough. Give me your lyrics.”

You’re right, bad boy/girl. Here are three examples, with different temperature values and different inputs:

As I told you, if you increase the temperature you risk getting nonsense lyrics like “Say a mind of my friends are saying”. On the other hand, a low temperature tends to reproduce existing lyrics, so you have to be careful and tune both the temperature and the start string.

If you want to get more technical, you could use an LSTM cell instead of a GRU, use a more powerful machine, or change the data pre-processing part.

6. Thinking out loud

We are skeptical about “AI writing songs”, and there is a reason why. We like to think that Music, Art, Poetry, and Cinema are not about numbers, equations, and computers, but belong to a different part of ourselves: the creative and passionate one.

As a musician and data scientist, I’m really confused. I would like to think that when I listen to my favourite album and get goosebumps, it is because there is something more to the music than a good mix of sounds and words accurately predicted by a logit function. But isn’t Artificial Intelligence a form of art in itself? Does this “art” actually exist? Do these feelings actually exist? Well, I do have feelings for Taylor Swift, though.

If you liked the article and you want to know more about Machine Learning, or you just want to ask me something, you can:

A. Follow me on LinkedIn, where I publish all my stories
B. Subscribe to my newsletter. It will keep you updated about new stories and give you the chance to text me with any corrections or doubts you may have.
C. Become a referred member, so you won’t have any “maximum number of stories for the month” and you can read whatever I (and thousands of other top Machine Learning and Data Science writers) write about the newest technology available.

Thank you :)


PhD in Aerospace Engineering at the University of Cincinnati. Machine Learning Engineer @ Gen Nine, Martial Artist, Coffee Drinker, from Italy.