AI for artists: Part 2

Savio Rajan
Towards Data Science
6 min read · Jun 7, 2018


Quote from Westworld TV show

Note that this article is part of the series AI for artists: Part 1, Part 2

Music is a powerful force that has sent some of the most brilliant minds in the world into a state of wonder. Among them were Friedrich Nietzsche, Arthur Schopenhauer, Virginia Woolf, and the list goes on. Nietzsche, in his book Twilight of the Idols, wrote that "Without music, life would be a mistake."

In this article we will create music using a simple LSTM network, but before that, let's get a brief idea of the history of algorithmic composition.

A brief history through time…

There are numerous treatises on music theory dating from Greek antiquity, but they were not "algorithmic composition" in any pure sense. One of the most often cited examples of algorithmic music from the Classical period is Mozart's Musikalisches Würfelspiel, a musical dice game.

Combinations obtained by rolling dice.

In 1958, Iannis Xenakis used Markov chains in his compositions.

Markov chains

A Markov chain describes a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Markov chains can be used to predict the next item in a sequence, where the items can be words, musical notes, etc. For music, the chain models the probability of a note occurring after a given sequence of notes has been played. One of the main limitations of Markov chains is that they can only produce subsequences that are already present in the original data; extending beyond those exact subsequences is not possible with this method. Recurrent neural networks (RNNs) attempt to do exactly that.
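For intuition, here is a minimal sketch (illustrative, not the article's code) of a first-order Markov chain trained on a toy melody. Notice that generation can only follow note-to-note transitions seen in the training data, which is exactly the limitation described above.

```python
import random

# Toy training melody: the "original data" the chain learns from.
training_melody = ["C", "E", "G", "E", "C", "E", "G", "C"]

# Record which notes follow each note in the training data.
transitions = {}
for prev, nxt in zip(training_melody, training_melody[1:]):
    transitions.setdefault(prev, []).append(nxt)

def generate(start, length):
    """Walk the chain: each next note depends only on the current note."""
    melody = [start]
    for _ in range(length - 1):
        melody.append(random.choice(transitions[melody[-1]]))
    return melody

print(generate("C", 8))  # e.g. ['C', 'E', 'G', 'C', 'E', 'G', 'E', 'C']
```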

What is an RNN?

Let's understand this with an analogy. Suppose you want to predict what will be served for dinner, and the menu includes only three dishes: chapati, fried rice, and bread. We try to fit a model that predicts the dish on any given day based on factors like whether the cook was late, special occasions, or a drop in the price of chicken. If we train this model on a history of dinner data, we find that prediction accuracy won't improve beyond a certain point, no matter how carefully we choose the inputs and train. Looking at the data again, we see that the dishes simply repeat in a cycle (fried rice, chapati, bread) and don't depend on any of those factors. If it was chapati yesterday, it will be bread today, and so on. So the most important input for predicting a particular day's dinner is the previous day's dinner: by looking at the previous data point, we can predict the next one.

Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

In normal neural networks, all the inputs are independent of each other, but in an RNN the inputs are related through a hidden state. First, the network takes x(0) from the input sequence and outputs h(0); then h(0) together with x(1) forms the input for the next step, h(1) together with x(2) the input for the step after that, and so on. This way, the network keeps remembering the context while training. Suppose we have to generate the next note in a given sequence of music: the relationship among all the previous notes helps in predicting a better output. The problem is that a standard RNN cannot learn long-term dependencies, due to the vanishing gradient problem.
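The recurrence is simple enough to write out directly. Here is a toy NumPy sketch (sizes and random weights are illustrative assumptions) of the update h(t) = tanh(Wx·x(t) + Wh·h(t-1) + b):

```python
import numpy as np

input_size, hidden_size = 4, 8
Wx = np.random.randn(hidden_size, input_size) * 0.1   # input-to-hidden weights
Wh = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden weights
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One vanilla RNN step: the new hidden state mixes the current
    input with the previous hidden state (the remembered context)."""
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

h = np.zeros(hidden_size)                              # h starts empty
sequence = [np.random.randn(input_size) for _ in range(5)]
for x_t in sequence:                                   # h is fed back in at every step
    h = rnn_step(x_t, h)
```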

Vanishing gradient problem

This happens when we keep adding more and more hidden layers (or unroll an RNN over many time steps): during backpropagation, the errors (derivatives) can become very large or very small. When they become very small, the neurons in the earlier layers learn very slowly compared to the neurons in the later layers of the hierarchy, so training takes too long and the prediction accuracy of the model suffers. One solution is to use an LSTM network. In 2002, Doug Eck changed this approach by switching from standard RNN cells to long short-term memory (LSTM) cells.

LSTM networks

Long Short-Term Memory networks (LSTMs) are a kind of RNN capable of learning long-term dependencies, which means they can remember information for long periods of time. In a normal RNN, the repeating module has a simple structure, such as a single tanh layer.

Simple RNN (source: colah's blog)

LSTMs also have this chain-like structure, but the repeating module is different: instead of a single neural network layer, there are four.

LSTM

Instead of neurons, LSTM networks have memory blocks that are connected into layers. A block contains gates that manage the block’s state and output.

cell state highlighted

The key to LSTMs is the cell state, the horizontal line running through the top of the diagram. It runs straight down the entire chain, with only some minor linear interactions.

LSTM gate

Gates are a way to decide whether to let information through or not. Each gate's sigmoid layer outputs numbers between 0 and 1:

0 → nothing passes through

1 → everything passes through

An LSTM has three of these gates, to protect and control the cell state.

Forget gate → decides what information to discard from the cell state.
Input gate → decides which values from the input will update the memory state.
Output gate → decides what to output based on the input and the memory of the cell.
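Putting the pieces together, here is a NumPy sketch of one LSTM step (weight shapes and initialization are illustrative assumptions, not a production implementation). Each gate is a sigmoid over the current input and the previous hidden state, squashing values into [0, 1] as described above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold parameters for the four layers:
    forget gate f, input gate i, candidate values g, output gate o."""
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # what to discard
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # what to update
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])  # candidate state
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # what to output
    c = f * c_prev + i * g   # new cell state: forget some, add some
    h = o * np.tanh(c)       # new hidden state / output
    return h, c

# Illustrative setup and a single step on random data.
n_in, n_hid = 4, 8
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((n_hid, n_in)) * 0.1 for k in "figo"}
U = {k: rng.standard_normal((n_hid, n_hid)) * 0.1 for k in "figo"}
b = {k: np.zeros(n_hid) for k in "figo"}
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
```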

There are many ways to generate music, using networks ranging from Markov chains to convolutional neural networks (WaveNet). Here I chose an LSTM network. For training data, we use the Nottingham Music Database, which is in ABC notation. I used a pretrained model created by Panagiotis Petridis, but you can also train your own model if you have a good GPU or CPU. The code is provided here as a Jupyter notebook. If you want explanations for the code, go to this link.
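The pretrained model's exact architecture isn't reproduced here, but a character-level LSTM over ABC text is commonly set up along these lines in Keras. The layer sizes, sequence length, and vocabulary size below are placeholder assumptions:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

seq_len, vocab_size = 64, 60  # hypothetical: characters of context, distinct ABC characters

model = Sequential()
model.add(LSTM(256, input_shape=(seq_len, vocab_size), return_sequences=True))
model.add(LSTM(256))
model.add(Dense(vocab_size))
model.add(Activation("softmax"))  # probability distribution over the next character
model.compile(loss="categorical_crossentropy", optimizer="adam")

# Training data: one-hot encoded sliding windows over the ABC text,
# where y is the character that follows each window.
# model.fit(X, y, batch_size=128, epochs=50)
```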

Five text files will be generated as output. You can convert the generated text files into MIDI using software like abcMIDI. You can also use various projects such as Magenta, WaveNet, or DeepJazz for music generation.
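For reference, producing those text files is typically a sampling loop: feed in a seed string, sample the next character from the softmax output, append it, and repeat. Here is a sketch under the same assumptions as the model above (char_to_idx and idx_to_char are hypothetical lookup tables built from the training text, and the seed should be at least seq_len characters long):

```python
import numpy as np

def sample_abc(model, seed, char_to_idx, idx_to_char, length=400, temperature=0.8):
    """Sample `length` characters of ABC text from a trained model (sketch)."""
    seq_len, vocab_size = model.input_shape[1], model.input_shape[2]
    text = seed
    for _ in range(length):
        # One-hot encode the last seq_len characters as the model input.
        x = np.zeros((1, seq_len, vocab_size))
        for t, ch in enumerate(text[-seq_len:]):
            x[0, t, char_to_idx[ch]] = 1.0
        probs = model.predict(x)[0]
        # Temperature reshapes the distribution: lower = safer, higher = riskier.
        probs = np.exp(np.log(probs + 1e-8) / temperature)
        probs /= probs.sum()
        text += idx_to_char[np.random.choice(vocab_size, p=probs)]
    return text
```

The returned string can then be written to an .abc file and converted to MIDI, for example with abcMIDI's abc2midi tool (abc2midi tune.abc -o tune.mid).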

My output

If you are a beginner and want to get started in the field of deep learning, visit my blog, Art of Machine Learning.

Thank you for your time!

