
I love so much Cython, as it takes the best of the two main programming worlds: C and Python. Both languages can be combined together in a straightforward way, in order to give you more computational efficient APIs or scripts. Furthermore, coding in Cython and C helps you to understand what is underneath common python packages such as sklearn
, bringing data scientists to a further step, which is quite far from the simple import torch
and predefined algorithms usage.
In this series I am going to show you how to implement in C and Cyhton algorithms to analyse music, following as a reference the corresponding sklearn
and scipy
packages.
This lesson deals with the Short-time Fourier Transform or STFT. This algorithm is widely used to analyse frequencies of a signal and their evolution in time. The codes are stored in this repository:
The codes are extremely useful to understand how to structure a Cython project, subdividing codes in folders and install the final package.
This post contains external link to Amazon affiliate program.
Layman theory
Feel free to skip this section if you want immediately to get your hands dirty with the code. This is just a light introduction to STFT and its theory.
The main aspect of the Fourier Transform is to map (or let’s say to sketch) a signal into a frequency domain, pointing out the most important frequencies which constitutes the signal itself. This mapping has wide implications in many fields: in biomedical engineering (e.g. studying frequency contributions in electrocardiogram (ECG) to detect possible diseases or heart malfunctions¹ ² ³), computational science (e.g. compression algorithms such as mp3, jpeg
⁴) or finance (e.g. studying stock prices, bond prices behaviours⁵). This mapping is beneficial for studying Music signals as well, as the main frequency content can be retrieved and analysed, for example, to create a genres classifiers or app like Shazam (e.g. check my post ). However, it is sometimes interesting and helpful to understand the frequency evolution in time and amplitude, in order to find specific noises or to equalise frequencies in a recording session, or to create neural network algorithms to convert a speech signal to text (e.g. DeepPavlov).
In practice, STFT divides a time singal into short segments of equal length ( window_length
) and then the Fourier transform of each segment is computed. The resulting segment-frequency content can be plotted against time and it is called spectrogram.
Practically, the STFT can be summarised in these steps:
- Take an input signal ( e.g. mp3 file)
- Multiply the signal by a window function (e.g. Hamming function). This will help the Fourier transform to be computed at the extremes of each segment, in order to avoid possible discontinuities in the signal, which may block the Fourier Transform computation
- Slide with a window and a hop-size window along the singal/time and compute the Fourier transform
Fig. 1 helps to better understand what STFT does. An input signal with a defined amplitude, in decibel, and time, in seconds, is encapsulated in N windows of size windowSize. Every HopSize, the window define a signal-segment, which is Fourier transformed. The output frequency, in Hertz, can be plotted as a function of time.

The last thing to remember is the Nyquist frequency. If you look at the spectrogram, you will find out that the maximum plotted frequency is half of the sampling frequency of the signal, in order to avoid an issue known as aliasing in the Fourier Transform. This means that out of the N complex Fourier coefficients retrieved from a signal, with sampling frequency fs (e.g. an audio file have usually a sampling frequency of 44100 Hz) only half of these are useful, representing frequencies from 0 to fs/2
For more info I definitely recommend these very valuable and useful books:
- Physics of Oscillations and Waves: With use of Matlab and Python (Undergraduate Texts in Physics)
- An Introduction to Information Theory, Symbols, Signals and Noise (Dover Books on Mathematics)
- Fourier Transforms (Dover Books on Mathematics)
STFT in Cython: Let’s have fun
Plan of Action

Figure 2 shows a plan of action to implement STFT in Cython and C. The entire process can be divided in three chunks, which makes use respectively of Python, Cython and C:
- The Python interface deals with input/output and manipulation of the signal:
- The mp3/wav input file is opened with
scipy
orpydub
- Left and right channels are isolated and only
channel[:,0]
is analysed - Padding is perform on the signal, so the total number of elements is a power of 2, which improves the performance of the Fourier transform library
fftw
- The Cython interface translates the Pythonic inputs to memoryviews, which can then be easily passed as pointers to the C suite:
-
To have a concreate idea, fig.3 shows an example for creating a memoryview in Cython from an array of zeros,
np.zeros
of lengthn_elements
- The C interface performs the core STFT operations. In a nutshell:
-
Define the
fftw3
library domain with thefftw
elements and plan. The plan is necessary to perform the Fast Fourier Transform: - Create a Hamming window
- Perform the STFT for loop: multiply input chunk by Hamming window, compute the Fast Fourier transform on the output, store half of this result.
It is very important to highlight that the current structure makes sure the most computational intense operations are performed in the C code and all the info are then returned back to Python. This allows to lower the computational cost – as many algorithms are Python implemented only and not in C – and improve the overall Python code performance.
Python Script
Code: https://github.com/Steboss/music_retrieval/blob/master/stft/installer/tester.py
The Python script handles input audio files and prepares all the info for the STFT to be performed.
Firstly,pydub
can be used to open and read an mp3 file, while scipy
offers a built-in function to read wav extensions:
Then we can define all the "constants" in our code and call the STFT:
In fig. 6, the windowSize
and hopSize
are defined. As shown in fig.1 these two quantities can overlap. In general, more overlap gives more analysis points and smoother results across time, but the price to pay is a higher computational cost. Overall for this tutorial we can attain to the same size, but feel free to experiment as much as possible.
Cython code
Code: https://github.com/Steboss/music_retrieval/blob/master/stft/installer/stft.pyx
Cython codes usually have two extensions: pyx
and pxd
. The former is where the main code is written, while the second one if used for declarations. In this tutorial we are going to use just the pyx
extension, in order to get more familiar with Cython and have to deal with just one piece of code.
The first step is to import our Ccode in Cython – later we will see how the c code is structured:
C code is imported using cdef extern from YOUR_C_CODE:
declaration. Following, the name of the C function, that we want to use, has to be declared, along with all the types, exactly as if we were in a C code. Thus, the function stft
returns an array as a pointer, so double*
for stft
C-function. The arguments are the input audio channel double *wav_data
, the number of samples within the final STFT signal int samples
, the window and hop size int windowSize, int hop_Size
, the sampling frequency and the length of the audio channel int sample_freq, int length
. samples
can be computed as:
samples = int((length/windowSize/2))*((windowSize/2)+1))
The next step is to create and declare our main function., which will be called in our python script (whose name here is play
):
cpdef play(audData, rate, windowSize, hopSize):
cpdef
declares that the following function will contain both C and Python code and types. Further options are def
as per normal Python, in case we want to call directly a Python function and cdef
, which is used to declare a pure C function.
The second important concept is to pass all the arguments to the main stft
C function, as shown in fig.8
Basically, memoryviews can be easily passed to a C function and they will be treated as 1D arrays. First, to create a memoryview it is necessary to define the type and the size. Lines 2 and 6 of fig. 8 shows how to create 1D memory view: cdef double[:] name_of_memory_view = content
where [:]
defines a 1D memory view. Finally, memoryviews can be passed to a C function as &name_of_memory_view[0]
as done in line 8 for &audData_view[0]
and &magnitude[0]
. In the next tutorials I will show you in more detail how to deal with 2D memory views and pass them to C, as this has to deal with the memory-buffer. It is worth to notice that in line 8 &magnitude[0]
is passed to C as a vector of 0s. This allows to deal with an already initialized pointer in C and fill them with the STFT values.
C code
Code: https://github.com/Steboss/music_retrieval/blob/master/stft/c_code/stft.c
The first step is to import the necessary libraries:
fftw3
is the most important library, which allows to compute fast Fourier transform, as well as cosine transform and so on.
Secondly, we need to define some constant, such as pi value and the Hamming window function, to create Hamming window values. This can be easily done in this way:
At this point, we can start playing with fftw3
. As fig.11 shows, we need to create a fftw_complex
array for the input data stft_data
whose size is windowSize
; an output array fft_result
with the same dimensions of stft_data
to store the Fourier transform windowed signal and finally a storage
array to store all the transformed windows together, whose size is samples
. To allow fftw
to compute a Fourier transform on the windowed input signal we need a Fourier plan object which is create on line 17 in this way:
fftw_plan plan_forward;
plan_forward = fftw_plan_dft_1d(number_of_elements_in_output,
input_data_to_be_transformed,
output_transformed_data,
flag_for_forward_transform,
flag_for_inverse_transform);
The last step is to implement the real STFT algorithm:
Fig. 12 helps you to understand how the STFT is implemented. First of all, the stft_data
array is filled with the current window-sized signal amplitude values (line 10). Then this segment is Fourier transformed, calling the fftw_execute(plan_forward)
. Indeed, fftw_execute
will trigger the execution of the Fourier Transform plan, which was defined above, in order to transform the current input and save the result in the output array fft_result
. Finally, half of the Fourier samples are saved in the storage
array, which will be returned to Cython later on, and the chunkPosition
is updated by hop_size/2
so it is possible to proceed to the next window – do you remember Nyquist frequency?
At this point it is possible to fill the magnitude
array, which was a memoryview created in Cython, with the Fourier transformed signals from storage
The values in magnitude will be thus transferred from C to Cython. In Cython it is then possible to manipulate the magnitude
array to return a numpy array, that can be read in Python:
Prepare setup.py and install everything
This is the very last step before having fun in analysing STFT. To install the create Python package in Cython I usually divide my codes in two folder: c_code
, where stft.c
is stored and installer
, where I save pyx
and Python files. In the installer
folder I create setup.py
file, which is a Python file to install and compile Cython and C codes. The code is pretty straightforward:
In the code, fig. 15, we can see that the main elements to compile codes are Extension
and setup
. The former define the name of the current extension stft.stft
, the list of pyx
files [stft.pyx]
, the list of libraries and extra argument needed to compile the C code . In particular, in this context we need to specifiy fftw3
and m
as additional libraries , where one declares the usage of the Fast Fourier Transform library and the second the math library (when a C code is compiled usually m
is added as -lm
and fftw3
as-lfftw3
). The extra arguments make sure we have an O3
optimization process in compiling C, while std=C99
is to inform the compiler that we are using a standard version C99. Finally, setup
stores the package information, namely the name of the final module – so Python imports the current modules as import stft
– the version control, the extensions for C and any directive for Cython. As regards Cython directives I recommend you to give a look at the guide.
Once the code is ready we can run setup.py
as:
python setup.py build_ext --inplace
build_ext
is the command to build the current extension and --inplace
allows to install the current package in the working directory.
Funny example with STFT
Code:https://github.com/Steboss/music_retrieval/tree/master/stft/examples
A good example for studying STFT is Aphex Twin "Equation" music
If we analyse the spectrogram between 300 and 350 s:
start = 300*rate
end = 350*rate
channel1 = audData[:,0][start:end]
setting windowSize=4096
and hopSize=4096
a demoniac face will be uncovered within the music spectrogram:

I hope you like this first introduction to Cython 🙂 feel free to send me an email for questions or comments at: [email protected]
Bibliography
- https://onlinelibrary.wiley.com/doi/full/10.1002/jmri.1160
- https://www.sciencedirect.com/science/article/abs/pii/S0010482516302104
- https://www.scirp.org/html/4708.html
- https://www.roe.ac.uk/japwww/teaching/fourier/compression.html
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.307.8732&rep=rep1&type=pdf