Introducing “Lucid Sonic Dreams”: Sync GAN Art to Music with a Few Lines of Python Code!

Making Generative Audio-Visual Art Easy and Customizable

Mikael Alafriz
7 min read · Mar 13, 2021


Generative art has made significant strides in the past few years. One need only look at the countless artworks posted across the Internet (e.g. here and here) to realize that machines have arguably equaled humans in terms of artistic proficiency. The implications this has for artists, moving forward, are up for debate — but one positive thing we can all agree upon is that it opens the door to profoundly new visual and auditory experiences. Likewise, it makes the creation of art accessible even to the untrained.

Thus, enter Lucid Sonic Dreams: a Python package that syncs generative art to music in only a few lines of code!

Song: Raspberry by Saje. Model weights trained by Jeremy Torman.

Important Links

If you wish to skip straight to the point and get immersed with Lucid Sonic Dreams, here are all the links you’ll need:

Otherwise, if you’d like to further understand how the package works while also seeing a few demos in the process, read further!

How It Works

To maintain a spirit of accessibility, I’ll be doing away with most (but not all) of the technical jargon that comes with explaining generative models. Instead, I’ll be focusing on the details that matter most when understanding precisely how generated visuals are manipulated by music. After all, there are enough resources online that elaborate on the down-and-dirty details, for those who are interested.

Typically, generative artworks are produced by a class of Deep Learning frameworks called Generative Adversarial Networks (GANs). Lucid Sonic Dreams uses the StyleGAN2-ADA architecture by default — although that’s customizable, as you’ll see later. These models are trained on datasets of images that typically adhere to a certain “style”. After training, they can output a virtually infinite number of images that match the style of the images they were trained on. This repository by Justin Pinkney shows sample outputs from numerous pre-trained StyleGAN2 models, with styles ranging from Faces, to Art, to…My Little Pony?

Things become interesting when we dive into how these images are generated. The model is fed an input — in the case of StyleGAN2, a vector containing 512 numbers — that determines the output image. Small changes to the input vector yield correspondingly small changes in the output image. Now, the fun part: what if we take the sound waves from a piece of music, extract numerical values (e.g. amplitude) from them, and add those values to the input vector? Lucid Sonic Dreams does this for every frame of a video, producing art that pulses and morphs to the music it hears.

To give full credit where it’s due, this idea was inspired by the Deep Music Visualizer project by Matt Siegelman: a similar Python implementation that syncs music with images generated by BigGAN. A few other projects online attempt the same idea, but none (to my knowledge) come in Python package form, nor are they as customizable as Lucid Sonic Dreams.

Anyway, that was a basic overview of what goes on under the hood. More technical details will be discussed in the “Tuning Parameters” section below!

Using the Package

Lucid Sonic Dreams was specifically designed to be easy to use and incredibly flexible. All it takes is the usual pip install…

pip install lucidsonicdreams

…followed by a few lines of Python code:
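A minimal sketch of that code, assuming an audio file named chemical_love.mp3 in the working directory (the file name and chosen style are placeholders), looks like this:

from lucidsonicdreams import LucidSonicDream

# Point the package at a song, pick one of the bundled styles, and render
# a music-reactive video with all other settings left at their defaults.
L = LucidSonicDream(song='chemical_love.mp3', style='abstract photos')
L.hallucinate(file_name='chemical_love.mp4')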

And just like that, you’re done! Here’s a sample video generated with just this block of code; I suggest watching at least the first minute and a half to see the individual components build up.

Song: Chemical Love by Basically Saturday Night

Changing Styles

You can view available default styles by simply running:
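In sketch form:

from lucidsonicdreams import show_styles

# Print the names of all default styles that can be passed as the style parameter
show_styles()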

This prints a whole list of style names for you to choose from. These styles are lifted from the awesome repository by Justin Pinkney mentioned earlier. It’s also possible to pass your own StyleGAN weights, or use a different GAN architecture entirely — more details on this later.

Tuning Parameters

While the package is extremely easy to use with default settings, it actually comes with numerous parameters — over 30, in fact — that can be tuned! The Tutorial Notebook on Google Colab lists these parameters in full detail. For this article, however, we’ll go over only the basic components that the music manipulates, and the most essential parameters that control them.

With this package, the music controls three main visual components: Pulse, Motion, and Class. Pulse, quite literally, refers to how the visuals “pulse” to the music’s percussive elements. Mathematically, it is the result of temporary additions of the sound wave’s amplitude to the input vector (i.e. the vector returns to normal in the next video frame). Motion, meanwhile, refers to the speed at which the visuals morph. Mathematically, it is the result of cumulative additions of amplitude to the input vector (i.e. whatever is added stays there).
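To make the distinction concrete, here’s a rough conceptual sketch (not the package’s actual internals) of temporary versus cumulative additions, using a made-up per-frame amplitude signal:

import numpy as np

# Conceptual illustration only: a "pulse" offset is recomputed from scratch
# every frame, while a "motion" offset accumulates from frame to frame.
rng = np.random.default_rng(seed=0)
base_vector = rng.standard_normal(512)        # StyleGAN2-sized input vector
amplitudes = np.abs(rng.standard_normal(90))  # stand-in for per-frame audio amplitude

motion_offset = np.zeros(512)
frame_vectors = []
for amp in amplitudes:
    pulse_offset = amp * rng.standard_normal(512)    # temporary: gone next frame
    motion_offset += amp * rng.standard_normal(512)  # cumulative: additions persist
    frame_vectors.append(base_vector + pulse_offset + motion_offset)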

The “Class” component is an interesting one. It refers to the labels of objects in the generated images. In the style trained on WikiArt images, for example, there are 167 classes including Van Gogh, Da Vinci, Abstract Painting, and others. These are controlled by the pitch of the music — specifically, 12 musical pitches are mapped to 12 different classes. The individual amplitudes of these pitches influence numbers passed to a second input vector (a “class vector”), which determines the objects generated by the model. This idea was lifted from the Deep Music Visualizer project mentioned previously!
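As another rough sketch (again, not the package’s actual code), a chromagram gives you twelve pitch amplitudes per audio frame, which can be normalized into weights for twelve classes:

import librosa
import numpy as np

# Conceptual illustration: a chromagram has one row per musical pitch class,
# so every audio frame yields 12 amplitudes that can weight 12 GAN classes.
y, sr = librosa.load('song.mp3')                  # placeholder file name
chroma = librosa.feature.chroma_stft(y=y, sr=sr)  # shape: (12, n_audio_frames)
pitch_amplitudes = chroma[:, 100]                 # an arbitrary frame
class_weights = pitch_amplitudes / (pitch_amplitudes.sum() + 1e-8)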

To understand which parameters are most essential, let’s lay out the full video generation pipeline. First, input vectors are initialized and interpolated between. This serves as the video’s “base motion”. The parameter speed_fpm controls how fast this motion goes, where “FPM” stands for “Frames Per Minute” — essentially, the number of vectors initialized per minute. For each succeeding frame, the parameters pulse_react, motion_react, and class_react control how much each respective component is manipulated by audio.

After the model generates images from these vectors, the images are passed through a stack of effects that also react to the music. By default, the package comes with “contrast” and “flash” effects that sync with the audio’s percussive elements. Their intensity is controlled by the contrast_strength and flash_strength parameters. It’s also possible to create your own custom effects — more details on this later.

Here’s some sample code that tunes a few of these parameters. Notice that speed_fpm = 0, so there is no motion during silent parts of the song.
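In sketch form (the song file, style name, and exact values below are placeholders rather than the precise settings behind the video):

from lucidsonicdreams import LucidSonicDream

L = LucidSonicDream(song='pancake_feet.mp3', style='modern art')

# speed_fpm = 0 removes the base motion entirely, so the visuals move only
# when the audio does; the *_react and *_strength values set how strongly
# pulse, motion, contrast, and flash respond to the music.
L.hallucinate(file_name='pancake_feet.mp4',
              speed_fpm=0,
              pulse_react=0.8,
              motion_react=0.7,
              contrast_strength=0.5,
              flash_strength=0.7)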

Song: Pancake Feet by Tennyson

Using Your Own StyleGAN Weights

If you’ve trained your own StyleGAN, or you happen to come across model weights online, you can pass the file path to these weights as the style parameter value. I’m personally excited to see what generative artists with their own model weights end up producing with this package!

As an example, the (incredibly stunning) visuals you see in this article’s headline video are generated with a model trained by Jeremy Torman. This thread describes his training dataset, and includes a link to the weights file. Here’s the code I used to generate the video:
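In outline, the call looks like this (the weights file name and parameter values are illustrative; the actual .pkl file comes from the thread linked above):

from lucidsonicdreams import LucidSonicDream

# Passing a local .pkl of StyleGAN2 weights instead of a named default style
L = LucidSonicDream(song='raspberry.mp3', style='VisionaryArt.pkl')

L.hallucinate(file_name='raspberry.mp4',
              pulse_react=1.2,
              motion_react=0.7,
              contrast_strength=0.5,
              flash_strength=0.5)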

Using Isolated Audio Tracks

For my fellow musicians out there looking to use this package as a music visualizer, you’re in luck! The package allows you to upload isolated audio tracks to control Pulse, Motion, Class, Contrast, and Flash. This is perfect if you want any of these visual components to sync with specific instruments. You can also use these isolated tracks for custom effects, as we’ll see in the next section.

Here’s some code I used to visualize a music track I produced myself. I used an isolated percussion track to control Pulse, and an isolated “synth chords” track to control Class.
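In sketch form (file names are placeholders for the full mix and the isolated stems, and the class indices and values are illustrative):

from lucidsonicdreams import LucidSonicDream

# The main song provides the soundtrack and drives any component without its
# own track; the isolated stems drive specific components via the *_audio parameters.
L = LucidSonicDream(song='track_main.mp3',
                    pulse_audio='track_percussion.mp3',
                    class_audio='track_synth_chords.mp3',
                    style='wikiart')

L.hallucinate(file_name='track.mp4',
              pulse_react=0.25,
              motion_react=0,
              classes=[1, 5, 9, 16, 23, 27, 28, 30, 50, 68, 71, 89],
              class_pitch_react=0.2,
              contrast_strength=0.3)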

Music by Yours Truly

Creating Custom Effects

In addition to the built-in Contrast and Flash effects, Lucid Sonic Dreams also allows you to create your own reactive custom effects. To do so, simply create a function that takes in at least 3 parameters — array, which refers to the image the effect is applied to; strength, which determines how reactive it is to the music; and amplitude, which refers to the music’s volume at any given point in time. Afterwards, pass this custom function to an EffectsGenerator object. Here’s a quite experimental visualization that takes advantage of scikit-image’s swirl effect:
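In sketch form (the effect parameters and tuning values are illustrative):

import numpy as np
from skimage.transform import swirl
from lucidsonicdreams import LucidSonicDream, EffectsGenerator

def swirl_func(array, strength, amplitude):
    # Swirl each frame harder the louder the music is at that moment
    swirled = swirl(array, rotation=0, strength=100 * strength * amplitude, radius=650)
    return (swirled * 255).astype(np.uint8)  # swirl returns floats in [0, 1]

# Wrap the function so it reacts to the song's harmonic (non-percussive) elements
swirl_effect = EffectsGenerator(swirl_func,
                                audio='unfaith.mp3',
                                strength=0.2,
                                percussive=False)

L = LucidSonicDream(song='unfaith.mp3', style='textures')
L.hallucinate(file_name='unfaith.mp4',
              custom_effects=[swirl_effect])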

Song: Unfaith by Ekali

Using Other GAN Architectures

If you prefer to use some other GAN architecture, Lucid Sonic Dreams makes it possible for you to do so. Simply define a function that takes in a batch of noise vectors and a batch of class vectors (NumPy arrays) as input, and outputs a batch of Pillow images. In reality, this function doesn’t even need to use a GAN — it could be any function that transforms input vectors into images.

The following code replicates the aforementioned Deep Music Visualizer by generating images using the PyTorch implementation of BigGAN. It’s a much more complex block of code that uses some additional parameters. Note that this requires you to first run pip install pytorch_pretrained_biggan.
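In sketch form (this assumes a GPU is available; the ImageNet class indices and tuning values are illustrative):

import torch
from pytorch_pretrained_biggan import BigGAN, convert_to_images
from lucidsonicdreams import LucidSonicDream

# Load a pre-trained BigGAN and move it to the GPU
biggan = BigGAN.from_pretrained('biggan-deep-512')
biggan.to('cuda:0')

def biggan_func(noise_batch, class_batch):
    # Map a batch of noise vectors and class vectors to a batch of Pillow images
    noise_tensor = torch.from_numpy(noise_batch).float().cuda()
    class_tensor = torch.from_numpy(class_batch).float().cuda()
    with torch.no_grad():
        output_tensor = biggan(noise_tensor, class_tensor, truncation=1)
    return convert_to_images(output_tensor.cpu())

# input_shape is BigGAN's noise-vector length; num_possible_classes covers
# the 1,000 ImageNet classes the model was trained on.
L = LucidSonicDream(song='sea_of_voices.mp3',
                    style=biggan_func,
                    input_shape=128,
                    num_possible_classes=1000)

L.hallucinate(file_name='sea_of_voices.mp4',
              classes=[13, 14, 22, 24, 33, 84, 99, 100, 134, 143, 393, 394],
              class_smooth_seconds=4,
              motion_react=0.35)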

Song: Sea of Voices by Porter Robinson

And that’s all! To those who plan to try this package out, I’m looking forward to seeing the magic you make.


Data Scientist, Musician, and Nature Lover. Perpetually keen to meet new people and explore new places. Let’s connect: www.linkedin.com/in/mikael-alafriz