Sound classification using Images, fastai

Dipam Vasani
Towards Data Science
5 min read · May 14, 2019


Introduction

This week I read about a really cool application of deep learning: classifying audio files using images. These images are known as spectrograms.

A spectrogram is a visual representation of how the frequencies in a signal vary over time. Sound classification, or audio tagging, has various applications. One really interesting application was developed by Sara Hooker.

She started a non-profit called Delta Analytics, and together they helped build a system in which old mobile phones were attached to trees in a Kenyan rain forest to listen for chainsaw noises. They then used deep learning to identify when a chainsaw was being used. The system would alert the rangers, who would in turn stop the illegal deforestation.

Data set

For this article I have used the UrbanSound8k dataset. It contains 8732 labeled sound excerpts of urban sounds from 10 classes. All the audio files are ≤4s which makes it easier to create spectrograms since longer audio files would require cropping and overlapping. Let’s see what the classes are:
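The class names can be read from the metadata CSV that ships with the dataset. Here is a minimal sketch, assuming the file has `classID` and `class` columns; the helper name `list_classes` is my own, and the two sample rows only stand in for the real file:

```python
import csv
import io

def list_classes(csv_file):
    """Return the distinct (classID, class name) pairs, sorted by ID."""
    reader = csv.DictReader(csv_file)
    pairs = {(int(row["classID"]), row["class"]) for row in reader}
    return sorted(pairs)

# Stand-in for the real metadata file; these rows are illustrative only.
sample = io.StringIO(
    "slice_file_name,fsID,start,end,salience,fold,classID,class\n"
    "100032-3-0-0.wav,100032,0.0,0.317551,1,5,3,dog_bark\n"
    "100263-2-0-117.wav,100263,58.5,62.5,1,5,2,children_playing\n"
)
print(list_classes(sample))
```

Run against the full metadata file, this should list the ten classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren and street_music.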

We can listen to one of the audio files in Jupyter by doing:
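A minimal sketch of how that might look, using `IPython.display.Audio`. The helper names `clip_path` and `play` are hypothetical, and the dataset root and file name you pass in depend on where your copy of the data lives:

```python
from pathlib import Path

def clip_path(root, fold, file_name):
    """Build the path to one clip, e.g. <root>/fold1/<file_name>."""
    return Path(root) / f"fold{fold}" / file_name

def play(path):
    """Return an audio player; Jupyter renders it when it is the last
    expression in a cell."""
    # Imported here because IPython is only needed inside a notebook.
    from IPython.display import Audio
    return Audio(filename=str(path))
```

In a notebook cell, calling `play(clip_path(root, 1, file_name))` with your own `root` and `file_name` should show a small player widget.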

Before we convert our audio files to spectrograms, we need to check how these files are stored. In this data set, the audio files are spread across folders named fold1 to fold10. We are expected to use them as follows: if we pick fold1 as our validation set, then fold2 to fold10 form our training set. We repeat this with each folder in turn as the validation set while the remaining folders make up the training data. Our final accuracy is the average over all ten runs (k-fold cross-validation).
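The rotation described above can be sketched in a few lines of plain Python. The function names are my own, and the default of ten folds is based on how UrbanSound8K is distributed:

```python
def fold_splits(n_folds=10):
    """Yield (validation_fold, training_folds) once per fold."""
    folds = [f"fold{i}" for i in range(1, n_folds + 1)]
    for valid in folds:
        # Every fold except the held-out one goes into training.
        train = [f for f in folds if f != valid]
        yield valid, train

def mean_accuracy(per_fold_accuracy):
    """Average the validation accuracies collected across all folds."""
    return sum(per_fold_accuracy) / len(per_fold_accuracy)

for valid, train in fold_splits():
    print(valid, "<- validation | training ->", train)
```

Each round trains a fresh model on `train`, records its accuracy on `valid`, and the reported score is `mean_accuracy` over those numbers.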

Code

We have 2 options to convert the audio files to spectrograms, matplotlib or librosa. We will go for the latter because it is easier to use and well known in the sound domain. Before we use it we just need to install a little dependency to ensure librosa works well.

We can now transform our audio files into spectrograms. Now, I did not write this code myself, I just used someone else’s code. And it is okay to do so, as long as you understand what the code does.

The good thing about Python is that, merely looking at the code more often than not gives you a fair idea about what’s going on. My first impression looking at this code was we are opening a sound file, initializing a plot, converting the sound file to spectrogram and finally saving the plot. We can always dig into the documentation if we want to understand the code in detail.
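Those four steps (open the sound file, initialise a plot, convert to a spectrogram, save the plot) can be sketched as follows. This is not the exact code I used: the mel spectrogram, the figure size and the dpi are assumptions, and the function names are hypothetical.

```python
from pathlib import Path

def spectrogram_path(audio_file, out_dir):
    """Map some/dir/clip.wav to <out_dir>/clip.png."""
    return Path(out_dir) / (Path(audio_file).stem + ".png")

def save_spectrogram(audio_file, out_dir):
    # Heavy imports live inside the function so the path helper above
    # can be used without the audio stack installed.
    import numpy as np
    import matplotlib.pyplot as plt
    import librosa
    import librosa.display

    # 1. Open the sound file (librosa resamples to 22050 Hz by default).
    samples, sample_rate = librosa.load(str(audio_file))

    # 2. Initialise a small plot with no axes or frame.
    fig = plt.figure(figsize=(0.72, 0.72))  # size is an arbitrary choice
    ax = fig.add_subplot(111)
    ax.set_axis_off()

    # 3. Convert to a mel spectrogram on a decibel scale and draw it.
    mel = librosa.feature.melspectrogram(y=samples, sr=sample_rate)
    librosa.display.specshow(librosa.power_to_db(mel, ref=np.max), ax=ax)

    # 4. Save the plot and free the figure.
    target = spectrogram_path(audio_file, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(target, dpi=400, bbox_inches="tight", pad_inches=0)
    plt.close(fig)
    return target
```

Closing each figure matters here: with thousands of clips, open figures would otherwise pile up in memory.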

Since we are creating plots and storing them, it is going to take a lot of time to run. Hence I chose to do it for just one fold instead of all the folds.

Once we have all the spectrograms, we can treat this as an image classification problem and follow standard procedure. One thing to note about spectrograms is that it is really difficult to convert them back to audio without losing significant data, since the image keeps only the magnitude of each frequency and discards its phase.

Classification

We start by creating a data bunch using the data block API.
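A rough sketch of what this can look like in fastai v1 (the version current when this was written). I am assuming the spectrogram PNGs keep the UrbanSound8K naming scheme fsID-classID-occurrenceID-sliceID, that a random 80/20 split of a single fold is acceptable, and the helper names are hypothetical:

```python
from pathlib import Path

def class_label(image_path):
    """Pull the class ID out of a name like 100032-3-0-0.png."""
    return Path(image_path).stem.split("-")[1]

def build_databunch(spectrogram_dir, size=224, bs=64):
    # fastai v1 imports; kept local so class_label works without fastai.
    from fastai.vision import ImageList, imagenet_stats

    return (ImageList.from_folder(spectrogram_dir)
            .split_by_rand_pct(0.2)        # hold out 20% for validation
            .label_from_func(class_label)  # label from the file name
            .transform(size=size)          # resize only, no augmentation
            .databunch(bs=bs)
            .normalize(imagenet_stats))    # match the pretrained weights
```

The labels come out as the numeric class IDs ("0" to "9") as strings; mapping them back to names such as dog_bark is a lookup in the metadata file.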

Note that we do not apply any transforms to our data. This is because a spectrogram is always generated the same way regardless of conditions, unlike a photograph, which may vary with lighting or other factors. There are, however, some audio transforms one can apply (more on this in another article).

Let’s take a look at some of the spectrograms:

We can now use transfer learning to train our neural net to classify these images.

We train for a few epochs.

Unfreeze the model, find the learning rate, and train some more.

Once we are happy with our model we can save the weights.
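Put together, the training steps above might look like this in fastai v1. The ResNet-34 backbone, the epoch counts, the learning-rate range and the checkpoint name are all placeholder assumptions, not the values I actually used:

```python
def train_classifier(data, name="stage-2"):
    """Fine-tune a pretrained CNN on a fastai v1 data bunch."""
    from fastai.vision import cnn_learner, models
    from fastai.metrics import accuracy

    # Transfer learning: pretrained body, new head for our classes.
    learn = cnn_learner(data, models.resnet34, metrics=accuracy)

    # Train only the new head for a few epochs.
    learn.fit_one_cycle(4)

    # Unfreeze the whole network and look for a sensible learning rate.
    learn.unfreeze()
    learn.lr_find()
    learn.recorder.plot()  # inspect the curve before choosing max_lr

    # Train some more with low, layer-dependent learning rates.
    learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))

    # Save the weights under the learner's models/ directory.
    learn.save(name)
    return learn
```

In practice you would pause after `learn.recorder.plot()` and pick the `max_lr` range from the plot rather than hard-coding it as done here.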

Because of a Kaggle error related to the number of output files generated, I haven’t been able to commit my kernel. I will post a link to it as soon as I can resolve the error.

Reference notebook.

That will be it for this article.

If you want to learn more about deep learning check out my series of articles on the same:

~happy learning.
