Data Augmentation Techniques for Audio Data in Python

How to augment audio in waveform (time domain) and as spectrograms (frequency domain) with librosa, numpy, and PyTorch

Leonie Monigatti
Towards Data Science

Figure: Augmentation for audio data (image drawn by the author)

Deep Learning models are data-hungry. If you don’t have a sufficient amount of data, generating synthetic data from the available dataset can help improve the generalization capabilities of your Deep Learning model. While you might already be familiar with data augmentation techniques for images (e.g., flipping an image horizontally), data augmentation techniques for audio data are often lesser known.

This article will review popular data augmentation techniques for audio data. You can apply data augmentations for audio data both in the waveform (time domain) and in the spectrogram (frequency domain).
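As a rough illustration of the two families, here is a minimal numpy-only sketch of common augmentations: noise injection, time shifting, and gain in the waveform, and SpecAugment-style frequency/time masking on a toy spectrogram. The specific parameter values (noise level, shift amount, mask widths) are illustrative assumptions, not recommendations from the article; librosa additionally offers `librosa.effects.pitch_shift` and `librosa.effects.time_stretch` for pitch and tempo augmentation.

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 16000
# 1-second 440 Hz sine tone as a stand-in for a real recording
audio = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)

# --- Waveform (time-domain) augmentations ---
noisy = audio + 0.005 * rng.standard_normal(len(audio))  # noise injection
shifted = np.roll(audio, int(0.1 * sr))                  # shift by 100 ms
louder = 2.0 * audio                                     # gain (~ +6 dB)

# --- Spectrogram (frequency-domain) augmentation ---
# Toy magnitude spectrogram: 40 frames of 400 samples each
spec = np.abs(np.fft.rfft(audio.reshape(-1, 400), axis=1))

masked = spec.copy()
f0 = rng.integers(0, masked.shape[1] - 20)
masked[:, f0:f0 + 20] = 0.0   # frequency mask (zero out a band of bins)
t0 = rng.integers(0, masked.shape[0] - 5)
masked[t0:t0 + 5, :] = 0.0    # time mask (zero out a run of frames)
```

All of these operations preserve the label of the underlying clip, which is what makes them usable for augmenting a training set.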


Responses (3)


Suppose I am doing classification of animal sounds and I am interested in dog barks, but every dog-barking sound file also contains background noise such as wind, car horns, or speech. My question is: how can I suppress the background noise and isolate the dog bark for classification?

--

Great article, thanks for sharing! A quick note on volume augmentation: to increase amplitude in the time domain, the signal should be multiplied by a constant (e.g., 2*original_audio is approx. +6 dB of gain). Adding a constant to a signal, like 5…
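The commenter's point can be verified numerically: multiplying a waveform by a factor changes its loudness, with the gain in decibels given by 20·log10(factor), whereas adding a constant only introduces a DC offset. A small sketch (the signal here is arbitrary synthetic data):

```python
import numpy as np

audio = np.random.default_rng(1).standard_normal(16000)

# Multiplicative scaling changes volume; gain in dB = 20 * log10(factor).
louder = 2.0 * audio
gain_db = 20 * np.log10(2.0)  # approx. +6.02 dB, as the comment notes

# Adding a constant merely shifts the signal by a DC offset;
# the waveform's shape (and perceived loudness) is unchanged.
offset = audio + 5.0
```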

--

Could you elaborate a little on why volume augmentation doesn’t really affect the FFT? Is it because, if applied to the waveform, it would affect all frequencies equally?
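Yes: the FFT is a linear transform, so scaling the waveform by a constant scales every frequency bin by the same constant, leaving the relative spectral shape unchanged. This is easy to check with numpy (the input here is arbitrary synthetic data):

```python
import numpy as np

x = np.random.default_rng(2).standard_normal(1024)

X = np.fft.rfft(x)
X_scaled = np.fft.rfft(2.0 * x)

# Linearity of the FFT: every bin is scaled by the same factor.
same = np.allclose(X_scaled, 2.0 * X)  # True
```

So after per-frequency normalization (or when only the spectral shape matters for a classifier), a uniform gain change is largely invisible in the frequency domain.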

--