Data Augmentation Techniques for Audio Data in Python

How to augment audio in waveform (time domain) and as spectrograms (frequency domain) with librosa, numpy, and PyTorch

Leonie Monigatti
Towards Data Science

Figure: Augmentation for audio data (image drawn by the author)

Deep Learning models are data-hungry. If you don’t have a sufficient amount of data, generating synthetic data from the available dataset can help improve the generalization capabilities of your Deep Learning model. While you might already be familiar with data augmentation techniques for images (e.g., flipping an image horizontally), data augmentation techniques for audio data are often lesser known.

This article will review popular data augmentation techniques for audio data. You can apply data augmentations for audio data both in the waveform (time domain) and in the spectrogram (frequency domain).
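As a rough illustration of the two families, here is a minimal numpy-only sketch of common augmentations: noise injection, time shifting, and gain in the waveform, and SpecAugment-style frequency/time masking on a toy spectrogram. The specific parameter values (noise level, shift amount, mask widths) are illustrative assumptions, not recommendations from the article; librosa additionally offers `librosa.effects.pitch_shift` and `librosa.effects.time_stretch` for pitch and tempo augmentation.

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 16000
# 1-second 440 Hz sine tone as a stand-in for a real recording
audio = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)

# --- Waveform (time-domain) augmentations ---
noisy = audio + 0.005 * rng.standard_normal(len(audio))  # noise injection
shifted = np.roll(audio, int(0.1 * sr))                  # shift by 100 ms
louder = 2.0 * audio                                     # gain (~ +6 dB)

# --- Spectrogram (frequency-domain) augmentation ---
# Toy magnitude spectrogram: 40 frames of 400 samples each
spec = np.abs(np.fft.rfft(audio.reshape(-1, 400), axis=1))

masked = spec.copy()
f0 = rng.integers(0, masked.shape[1] - 20)
masked[:, f0:f0 + 20] = 0.0   # frequency mask (zero out a band of bins)
t0 = rng.integers(0, masked.shape[0] - 5)
masked[t0:t0 + 5, :] = 0.0    # time mask (zero out a run of frames)
```

All of these operations preserve the label of the underlying clip, which is what makes them usable for augmenting a training set.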


Responses (3)


Suppose I am doing classification of animal sounds and I am interested in dog barks, but every dog-barking sound file also contains background noise such as wind, car horns, or speech. My question is: how can I suppress the background noise and isolate the dog bark for classification?

--

Great article, thanks for sharing! A quick note on volume augmentation: to increase amplitude in the time domain, the signal should be multiplied by a constant (e.g., 2*original_audio is approx. +6 dB of gain). Adding a constant to a signal, like 5…
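The commenter's point can be verified numerically: multiplying a waveform by a factor changes its loudness, with the gain in decibels given by 20·log10(factor), whereas adding a constant only introduces a DC offset. A small sketch (the signal here is arbitrary synthetic data):

```python
import numpy as np

audio = np.random.default_rng(1).standard_normal(16000)

# Multiplicative scaling changes volume; gain in dB = 20 * log10(factor).
louder = 2.0 * audio
gain_db = 20 * np.log10(2.0)  # approx. +6.02 dB, as the comment notes

# Adding a constant merely shifts the signal by a DC offset;
# the waveform's shape (and perceived loudness) is unchanged.
offset = audio + 5.0
```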

--

Could you elaborate a little on why volume augmentation doesn’t really affect the FFT? Is it because, if applied to the waveform, it would affect all frequencies equally?
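Yes: the FFT is a linear transform, so scaling the waveform by a constant scales every frequency bin by the same constant, leaving the relative spectral shape unchanged. This is easy to check with numpy (the input here is arbitrary synthetic data):

```python
import numpy as np

x = np.random.default_rng(2).standard_normal(1024)

X = np.fft.rfft(x)
X_scaled = np.fft.rfft(2.0 * x)

# Linearity of the FFT: every bin is scaled by the same factor.
same = np.allclose(X_scaled, 2.0 * X)  # True
```

So after per-frequency normalization (or when only the spectral shape matters for a classifier), a uniform gain change is largely invisible in the frequency domain.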

--