The world’s leading publication for data science, AI, and ML professionals.

Practical Cython- Music Retrieval: Short Time Fourier Transform

Do you want to know more about Cython? Follow me in this series and I will show you practical examples of C-Cython-Python implementations

Spectrogram of "Compression of Time", Final Fantasy 8 https://www.youtube.com/watch?v=l75zjpDpAWs
Spectrogram of "Compression of Time", Final Fantasy 8 https://www.youtube.com/watch?v=l75zjpDpAWs

Join Medium with my referral link – Stefano Bosisio

I love so much Cython, as it takes the best of the two main programming worlds: C and Python. Both languages can be combined together in a straightforward way, in order to give you more computational efficient APIs or scripts. Furthermore, coding in Cython and C helps you to understand what is underneath common python packages such as sklearn, bringing data scientists to a further step, which is quite far from the simple import torch and predefined algorithms usage.

In this series I am going to show you how to implement in C and Cyhton algorithms to analyse music, following as a reference the corresponding sklearn and scipypackages.

This lesson deals with the Short-time Fourier Transform or STFT. This algorithm is widely used to analyse frequencies of a signal and their evolution in time. The codes are stored in this repository:

Steboss/music_retrieval

The codes are extremely useful to understand how to structure a Cython project, subdividing codes in folders and install the final package.

This post contains external link to Amazon affiliate program.


Layman theory

Feel free to skip this section if you want immediately to get your hands dirty with the code. This is just a light introduction to STFT and its theory.

The main aspect of the Fourier Transform is to map (or let’s say to sketch) a signal into a frequency domain, pointing out the most important frequencies which constitutes the signal itself. This mapping has wide implications in many fields: in biomedical engineering (e.g. studying frequency contributions in electrocardiogram (ECG) to detect possible diseases or heart malfunctions¹ ² ³), computational science (e.g. compression algorithms such as mp3, jpeg) or finance (e.g. studying stock prices, bond prices behaviours). This mapping is beneficial for studying Music signals as well, as the main frequency content can be retrieved and analysed, for example, to create a genres classifiers or app like Shazam (e.g. check my post ). However, it is sometimes interesting and helpful to understand the frequency evolution in time and amplitude, in order to find specific noises or to equalise frequencies in a recording session, or to create neural network algorithms to convert a speech signal to text (e.g. DeepPavlov).

In practice, STFT divides a time singal into short segments of equal length ( window_length ) and then the Fourier transform of each segment is computed. The resulting segment-frequency content can be plotted against time and it is called spectrogram.

Practically, the STFT can be summarised in these steps:

  • Take an input signal ( e.g. mp3 file)
  • Multiply the signal by a window function (e.g. Hamming function). This will help the Fourier transform to be computed at the extremes of each segment, in order to avoid possible discontinuities in the signal, which may block the Fourier Transform computation
  • Slide with a window and a hop-size window along the singal/time and compute the Fourier transform

Fig. 1 helps to better understand what STFT does. An input signal with a defined amplitude, in decibel, and time, in seconds, is encapsulated in N windows of size windowSize. Every HopSize, the window define a signal-segment, which is Fourier transformed. The output frequency, in Hertz, can be plotted as a function of time.

Fig. 1: STFT in a nutshell. A window size and hop size are defined. For each window-segment the Fourier Transform is computed. Frequency content can then be displayed as a function of time
Fig. 1: STFT in a nutshell. A window size and hop size are defined. For each window-segment the Fourier Transform is computed. Frequency content can then be displayed as a function of time

The last thing to remember is the Nyquist frequency. If you look at the spectrogram, you will find out that the maximum plotted frequency is half of the sampling frequency of the signal, in order to avoid an issue known as aliasing in the Fourier Transform. This means that out of the N complex Fourier coefficients retrieved from a signal, with sampling frequency fs (e.g. an audio file have usually a sampling frequency of 44100 Hz) only half of these are useful, representing frequencies from 0 to fs/2

For more info I definitely recommend these very valuable and useful books:


STFT in Cython: Let’s have fun

Plan of Action

Fig. 2: Schematic implementation from Python to Cython and C
Fig. 2: Schematic implementation from Python to Cython and C

Figure 2 shows a plan of action to implement STFT in Cython and C. The entire process can be divided in three chunks, which makes use respectively of Python, Cython and C:

  1. The Python interface deals with input/output and manipulation of the signal:
  • The mp3/wav input file is opened with scipy or pydub
  • Left and right channels are isolated and only channel[:,0] is analysed
  • Padding is perform on the signal, so the total number of elements is a power of 2, which improves the performance of the Fourier transform library fftw
  1. The Cython interface translates the Pythonic inputs to memoryviews, which can then be easily passed as pointers to the C suite:
  • To have a concreate idea, fig.3 shows an example for creating a memoryview in Cython from an array of zeros, np.zeros of length n_elements

    1. The C interface performs the core STFT operations. In a nutshell:
  • Define the fftw3 library domain with the fftw elements and plan. The plan is necessary to perform the Fast Fourier Transform:

  • Create a Hamming window
  • Perform the STFT for loop: multiply input chunk by Hamming window, compute the Fast Fourier transform on the output, store half of this result.

It is very important to highlight that the current structure makes sure the most computational intense operations are performed in the C code and all the info are then returned back to Python. This allows to lower the computational cost – as many algorithms are Python implemented only and not in C – and improve the overall Python code performance.

Python Script

Code: https://github.com/Steboss/music_retrieval/blob/master/stft/installer/tester.py

The Python script handles input audio files and prepares all the info for the STFT to be performed.

Firstly,pydub can be used to open and read an mp3 file, while scipy offers a built-in function to read wav extensions:

Then we can define all the "constants" in our code and call the STFT:

In fig. 6, the windowSize and hopSize are defined. As shown in fig.1 these two quantities can overlap. In general, more overlap gives more analysis points and smoother results across time, but the price to pay is a higher computational cost. Overall for this tutorial we can attain to the same size, but feel free to experiment as much as possible.

Cython code

Code: https://github.com/Steboss/music_retrieval/blob/master/stft/installer/stft.pyx

Cython codes usually have two extensions: pyx and pxd . The former is where the main code is written, while the second one if used for declarations. In this tutorial we are going to use just the pyx extension, in order to get more familiar with Cython and have to deal with just one piece of code.

The first step is to import our Ccode in Cython – later we will see how the c code is structured:

C code is imported using cdef extern from YOUR_C_CODE: declaration. Following, the name of the C function, that we want to use, has to be declared, along with all the types, exactly as if we were in a C code. Thus, the function stft returns an array as a pointer, so double* for stft C-function. The arguments are the input audio channel double *wav_data , the number of samples within the final STFT signal int samples, the window and hop size int windowSize, int hop_Size , the sampling frequency and the length of the audio channel int sample_freq, int length . samples can be computed as:

samples = int((length/windowSize/2))*((windowSize/2)+1))

The next step is to create and declare our main function., which will be called in our python script (whose name here is play ):

cpdef play(audData, rate, windowSize, hopSize):

cpdef declares that the following function will contain both C and Python code and types. Further options are def as per normal Python, in case we want to call directly a Python function and cdef , which is used to declare a pure C function.

The second important concept is to pass all the arguments to the main stft C function, as shown in fig.8

Basically, memoryviews can be easily passed to a C function and they will be treated as 1D arrays. First, to create a memoryview it is necessary to define the type and the size. Lines 2 and 6 of fig. 8 shows how to create 1D memory view: cdef double[:] name_of_memory_view = content where [:] defines a 1D memory view. Finally, memoryviews can be passed to a C function as &name_of_memory_view[0] as done in line 8 for &audData_view[0] and &magnitude[0] . In the next tutorials I will show you in more detail how to deal with 2D memory views and pass them to C, as this has to deal with the memory-buffer. It is worth to notice that in line 8 &magnitude[0] is passed to C as a vector of 0s. This allows to deal with an already initialized pointer in C and fill them with the STFT values.

C code

Code: https://github.com/Steboss/music_retrieval/blob/master/stft/c_code/stft.c

The first step is to import the necessary libraries:

fftw3 is the most important library, which allows to compute fast Fourier transform, as well as cosine transform and so on.

Secondly, we need to define some constant, such as pi value and the Hamming window function, to create Hamming window values. This can be easily done in this way:

At this point, we can start playing with fftw3 . As fig.11 shows, we need to create a fftw_complex array for the input data stft_data whose size is windowSize ; an output array fft_result with the same dimensions of stft_data to store the Fourier transform windowed signal and finally a storage array to store all the transformed windows together, whose size is samples . To allow fftw to compute a Fourier transform on the windowed input signal we need a Fourier plan object which is create on line 17 in this way:

fftw_plan plan_forward;
plan_forward = fftw_plan_dft_1d(number_of_elements_in_output, 
                                input_data_to_be_transformed, 
                                output_transformed_data, 
                                flag_for_forward_transform, 
                                flag_for_inverse_transform);

The last step is to implement the real STFT algorithm:

Fig. 12 helps you to understand how the STFT is implemented. First of all, the stft_data array is filled with the current window-sized signal amplitude values (line 10). Then this segment is Fourier transformed, calling the fftw_execute(plan_forward) . Indeed, fftw_execute will trigger the execution of the Fourier Transform plan, which was defined above, in order to transform the current input and save the result in the output array fft_result . Finally, half of the Fourier samples are saved in the storage array, which will be returned to Cython later on, and the chunkPosition is updated by hop_size/2 so it is possible to proceed to the next window – do you remember Nyquist frequency?

At this point it is possible to fill the magnitude array, which was a memoryview created in Cython, with the Fourier transformed signals from storage

The values in magnitude will be thus transferred from C to Cython. In Cython it is then possible to manipulate the magnitude array to return a numpy array, that can be read in Python:

Prepare setup.py and install everything

This is the very last step before having fun in analysing STFT. To install the create Python package in Cython I usually divide my codes in two folder: c_code , where stft.c is stored and installer , where I save pyx and Python files. In the installer folder I create setup.py file, which is a Python file to install and compile Cython and C codes. The code is pretty straightforward:

In the code, fig. 15, we can see that the main elements to compile codes are Extension and setup . The former define the name of the current extension stft.stft , the list of pyx files [stft.pyx] , the list of libraries and extra argument needed to compile the C code . In particular, in this context we need to specifiy fftw3and m as additional libraries , where one declares the usage of the Fast Fourier Transform library and the second the math library (when a C code is compiled usually m is added as -lm and fftw3 as-lfftw3 ). The extra arguments make sure we have an O3 optimization process in compiling C, while std=C99 is to inform the compiler that we are using a standard version C99. Finally, setup stores the package information, namely the name of the final module – so Python imports the current modules as import stft – the version control, the extensions for C and any directive for Cython. As regards Cython directives I recommend you to give a look at the guide.

Once the code is ready we can run setup.py as:

python setup.py build_ext --inplace

build_ext is the command to build the current extension and --inplace allows to install the current package in the working directory.


Funny example with STFT

Code:https://github.com/Steboss/music_retrieval/tree/master/stft/examples

A good example for studying STFT is Aphex Twin "Equation" music

If we analyse the spectrogram between 300 and 350 s:

start = 300*rate 
end   = 350*rate 
channel1 = audData[:,0][start:end]

setting windowSize=4096 and hopSize=4096 a demoniac face will be uncovered within the music spectrogram:

FIg. 16: hidden scary face (between 30–40 s) in "Equation" by Aphex Twin
FIg. 16: hidden scary face (between 30–40 s) in "Equation" by Aphex Twin

I hope you like this first introduction to Cython 🙂 feel free to send me an email for questions or comments at: [email protected]


Bibliography

  1. https://onlinelibrary.wiley.com/doi/full/10.1002/jmri.1160
  2. https://www.sciencedirect.com/science/article/abs/pii/S0010482516302104
  3. https://www.scirp.org/html/4708.html
  4. https://www.roe.ac.uk/japwww/teaching/fourier/compression.html
  5. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.307.8732&rep=rep1&type=pdf

Related Articles