Alzy/Xochi-Thumbnailer

Xochi Thumbnailer

Xochi Thumbnailer is a Python-based tool that creates visually appealing and informative waveform images from audio files. It is based on the waveform drawing functionality found in popular DJ equipment such as Pioneer/AlphaTheta and Denon playback devices and software.

This project is part of the Xochi Performer performance tool. If you like it, consider donating 🍺 or purchasing a copy of Xochi Performer when it is released 🙇🏽‍♂️.

I hope you find this software useful and that, by providing a framework to iterate on, it helps other developers come up with new approaches to waveform drawing.

Usage

To generate a rainbow waveform image (default):

python xochi_thumbnailer.py <filename>.wav

Rainbow Waveform

To generate a three band waveform image:

python xochi_thumbnailer.py -m three-band <filename>.wav

Three Band Waveform

To generate a three band interpolated waveform image:

python xochi_thumbnailer.py -m three-band-interpolated -c <hex color> <filename>.wav

Three Band Interpolated Waveform (three example images)

To generate a thumbnail bin file:

python xochi_thumbnailer.py -m bin <filename>.wav

General usage

python xochi_thumbnailer.py


Usage: xochi_thumbnailer.py [options] input-filename

Options:
  --help                show this help message and exit
  -a OUTPUT_FILENAME_W, --waveout=OUTPUT_FILENAME_W
                        output waveform image (default input filename +
                        _w.png)
  -s OUTPUT_FILENAME_S, --specout=OUTPUT_FILENAME_S
                        output spectrogram data (default input filename +
                        _s.bin)
  -w IMAGE_WIDTH, --width=IMAGE_WIDTH
                        image width in pixels (default 2560)
  -h IMAGE_HEIGHT, --height=IMAGE_HEIGHT
                        image height in pixels (default 639)
  -m RENDER_MODE, --mode=RENDER_MODE
                        rendering mode: rainbow, three-band, three-band-
                        interpolated (default: rainbow)
  -p, --profile         run profiler and output profiling information

Features

  • Generate color-coded waveform images from audio files
  • Support for the WAV file format (other formats coming soon)
  • Save processed audio data in a compact binary format for quick reloading

Motivation

This project is a prototype for implementing multicolored waveforms for my performance tool, Xochi Performer (to be released).

I was underwhelmed by the lack of information available online for generating colored waveform thumbnails, so I took a deep dive into open source tools such as Mixxx, Freesound.org, and most importantly Beat-Link. These projects provided a ton of insight into how waveform drawing is done on popular audio platforms. I took what I learned from these projects and expanded on it with my own findings and investigations.

My research into the subject was fascinating and got me thinking a lot about how we represent and interpret audio data. I've built this project to be modular enough to iterate on and to add new renderers. I leave it here in the hope that it helps improve visualizations in audio software and that others come up with new styles.

Notes

In an attempt to keep buffer and file sizes as small as possible without compromising legibility, the following processing is applied:

  • All audio signals are summed and represented as mono.
  • Significant downsampling is applied to the signals.
  • Amplitudes are limited to int8 data range.
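
As a rough illustration of the first and last steps, here is a minimal sketch. The function names are hypothetical and not taken from the project, and averaging the channels (rather than a plain sum) is an assumption made here to avoid clipping.

```python
def to_mono(frames):
    """Collapse each multi-channel frame into one mono sample.

    Assumption: channels are averaged so the result stays in [-1.0, 1.0].
    """
    return [sum(frame) / len(frame) for frame in frames]


def to_int8(sample):
    """Map a float sample in [-1.0, 1.0] to a signed 8-bit value.

    The result is clamped to [-127, 127], the range this README gives
    for stored peaks.
    """
    scaled = int(round(sample * 127))
    return max(-127, min(127, scaled))


stereo = [(1.0, -1.0), (0.5, 0.5)]
mono = to_mono(stereo)
quantized = [to_int8(s) for s in mono]
```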

Under the hood

Audio Processing:

The audio file is read and processed in chunks (windows). Within these chunks, we downsample by only capturing the peak value. This ensures transients are retained as much as possible. Peak data captured via this method is all that is necessary to represent a waveform faithfully regardless of downsampling (within reason).
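
The windowed peak capture described above could look roughly like this. It is a sketch only: `peak_windows` and its parameters are illustrative names, not the project's actual API.

```python
def peak_windows(samples, window_size):
    """Return one (min, max) pair per window of `window_size` samples.

    Keeping both extremes preserves transients that an average would
    smooth away.
    """
    peaks = []
    for start in range(0, len(samples), window_size):
        window = samples[start:start + window_size]
        peaks.append((min(window), max(window)))
    return peaks


samples = [0.0, 0.1, -0.9, 0.2, 0.0, 0.0, 0.8, 0.1]
peaks = peak_windows(samples, 4)
```

In this example the first window averages to about -0.15, while its peak pair still records the -0.9 transient.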

The key to representing color data is to divide the spectrum into low, mid, and high frequency ranges. Unlike some other approaches, FFT is not necessary. Simple filters are all that is needed to capture energy data within these bands. From here, you can use peak values from the individual bands to generate colors (red: low, green: mid, blue: high) or draw the bands individually (such as with three band).
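
To show what "simple filters" can mean in practice, here is a sketch that splits a signal into three bands with one-pole low-pass filters. The filter type, the crossover frequencies (200 Hz and 2000 Hz), and the function names are all assumptions for illustration; the project may use different filters and cutoffs.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Filter `samples` with a one-pole low-pass at `cutoff_hz`."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out = []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def split_bands(samples, sample_rate, low_cut_hz=200.0, high_cut_hz=2000.0):
    """Split a signal into (low, mid, high) bands that sum back to the input.

    The cutoff defaults are hypothetical values chosen for this sketch.
    """
    lp_low = one_pole_lowpass(samples, low_cut_hz, sample_rate)
    lp_high = one_pole_lowpass(samples, high_cut_hz, sample_rate)
    low = lp_low
    mid = [h - l for h, l in zip(lp_high, lp_low)]
    high = [x - h for x, h in zip(samples, lp_high)]
    return low, mid, high


def peak(band):
    """Largest absolute value in a band."""
    return max(abs(v) for v in band)


rate = 44100
bass = [math.sin(2 * math.pi * 50 * n / rate) for n in range(4410)]
low, mid, high = split_bands(bass, rate)
```

For the 50 Hz test tone, `peak(low)` comes out far larger than `peak(high)`, which is the energy split the color mapping relies on.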

Peak values can also be averaged, scaled or processed further to render other color representations. These four peak values (full, low, mid, and high spectrum information) are very powerful resources for generating alternate waveform views.

Waveform Generation:

The processed data can be interpolated to create a smooth curve. Take special care to use peak data within the window you are going to draw; averages sacrifice transient precision, though they may still be useful.
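
One way to keep peak data per drawn window is to reduce the stored peak pairs to one pair per pixel column, keeping the extremes of each column. This is a sketch with a hypothetical function name. It repeats the nearest pair when there are fewer peaks than columns instead of interpolating a smooth curve, and it expects a non-empty `peaks` list.

```python
def peaks_for_columns(peaks, width):
    """Reduce (min, max) pairs to `width` columns, keeping each column's extremes."""
    columns = []
    count = len(peaks)
    for x in range(width):
        start = x * count // width
        # Always cover at least one source pair, even when upsampling.
        end = max(start + 1, (x + 1) * count // width)
        window = peaks[start:end]
        columns.append((min(p[0] for p in window), max(p[1] for p in window)))
    return columns


peaks = [(-1, 1), (-2, 2), (-3, 3), (-4, 9)]
columns = peaks_for_columns(peaks, 2)
```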

The spectrum bands are useful for generating colorful representations. In the rainbow method, for instance, their data is mapped 1:1 to the color channels (red for low frequencies, green for mids, blue for highs). This RGB value can then be converted to HLS and back to control luminance and saturation.
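
The RGB to HLS round trip can be sketched with Python's standard `colorsys` module. The function name and the optional `lightness` and `saturation` overrides are illustrative assumptions, not the project's actual parameters.

```python
import colorsys


def band_color(low, mid, high, lightness=None, saturation=None):
    """Map band peaks in [0, 1] to an 8-bit RGB tuple (red=low, green=mid, blue=high).

    If `lightness` or `saturation` is given, it replaces the value
    derived from the bands before converting back to RGB.
    """
    r, g, b = (max(0.0, min(1.0, v)) for v in (low, mid, high))
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    if lightness is not None:
        l = lightness
    if saturation is not None:
        s = saturation
    r, g, b = colorsys.hls_to_rgb(h, l, s)
    return tuple(int(round(c * 255)) for c in (r, g, b))


bass_heavy = band_color(1.0, 0.0, 0.0)
washed_out = band_color(1.0, 0.0, 0.0, lightness=1.0)
```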

Finally, something I found indispensable is scaling the waveform with methods other than linear scaling. In the three band method, for instance, I use power scaling to get fine control of transient smoothing. A scale factor is also used to further separate the band information. These power and scale parameters should be used in conjunction with audio length (as is done on Pioneer devices) to improve legibility.
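
As a sketch of the idea, power scaling can be applied to a normalized peak as follows. The function name and the default `power` and `scale` values are assumptions for illustration; the values the project uses are not stated here.

```python
def power_scale(peak, power=0.5, scale=1.0):
    """Apply sign-preserving power scaling to a peak in [-1, 1], then a linear scale.

    With power < 1 quiet material is lifted; with power > 1 it is pushed
    down so that transients stand out. The result is clamped to [-1, 1].
    """
    magnitude = min(1.0, abs(peak)) ** power
    scaled = min(1.0, magnitude * scale)
    return scaled if peak >= 0 else -scaled


quiet = power_scale(0.25, power=0.5)
```

Here a peak of 0.25 becomes 0.5 with `power=0.5`, while a peak of 1.0 is unchanged.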

Binary Data Format:

The XPKS (Xochi Peaks) binary file format stores multi-resolution peak data for waveform visualization. Here's its structure:

[Header] - 7 bytes total
- Magic bytes (4): "XPKS"
- Version (1): uint8
- Channels (1): uint8 (always 4: full, low, mid, high)
- Mip count (1): uint8 (1-3 mip levels)

[Mip Directory] - 20 bytes per entry
For each mip level:
- Peaks per second (4): uint32
- Offset (8): uint64 (position in file where mip data starts)
- Length (8): uint64 (number of peak pairs in this mip)

[Mip Data]
For each mip level:
  For each window:
    For each channel:
      - Min value (1): int8
      - Max value (1): int8

Standard mip levels are 400, 100, and 1 peaks per second, though shorter files might not include all three. Each min/max pair is stored as two signed 8-bit integers (-127 to 127) for compact storage.
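
The layout above can be exercised with Python's `struct` module. This is a sketch under stated assumptions: the byte order is not specified here, so little-endian is assumed; the length field is assumed to count windows (each holding one min/max pair per channel); and the version value 1 and the function names are hypothetical.

```python
import struct

# Assumption: little-endian, no padding.
HEADER = struct.Struct("<4sBBB")   # magic, version, channels, mip count (7 bytes)
MIP_ENTRY = struct.Struct("<IQQ")  # peaks per second, offset, length (20 bytes)
CHANNELS = 4                       # full, low, mid, high


def write_xpks(mips, version=1):
    """Pack `mips`, a list of (peaks_per_second, windows), into XPKS bytes.

    Each window is a list of four (min, max) int8 pairs.
    """
    offset = HEADER.size + MIP_ENTRY.size * len(mips)
    directory = bytearray()
    data = bytearray()
    for pps, windows in mips:
        directory += MIP_ENTRY.pack(pps, offset, len(windows))
        for window in windows:
            for lo, hi in window:
                data += struct.pack("<bb", lo, hi)
        offset += len(windows) * CHANNELS * 2
    header = HEADER.pack(b"XPKS", version, CHANNELS, len(mips))
    return header + bytes(directory) + bytes(data)


def read_xpks(blob):
    """Parse XPKS bytes back into (version, mips)."""
    magic, version, channels, mip_count = HEADER.unpack_from(blob, 0)
    if magic != b"XPKS":
        raise ValueError("not an XPKS file")
    mips = []
    for i in range(mip_count):
        entry_pos = HEADER.size + i * MIP_ENTRY.size
        pps, offset, length = MIP_ENTRY.unpack_from(blob, entry_pos)
        windows = []
        for w in range(length):
            base = offset + w * channels * 2
            values = struct.unpack_from("<%db" % (channels * 2), blob, base)
            windows.append([(values[c * 2], values[c * 2 + 1]) for c in range(channels)])
        mips.append((pps, windows))
    return version, mips


window = [(-127, 127), (-10, 20), (-5, 5), (0, 1)]
blob = write_xpks([(400, [window, window]), (100, [window])])
version, mips = read_xpks(blob)
```

With two mip levels holding three windows in total, the file is 7 + 2 × 20 + 3 × 8 = 71 bytes.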

References

This project was inspired by and references the following sources:

  • Beat-Link by Deep Symmetry, LLC: Provided insights into Pioneer CDJ waveform analysis and packet representation.
  • Freesound.org: Provided the initial approach for interpreting audio data and colorizing. (No longer used, but critical to my understanding of audio representations.)
  • Mixxx: Where my investigation started. The source code was a bit difficult to understand, so I used it to get an overview of the approach. Their implementation allows for selecting a color scheme, which is very interesting and something I'd like to explore in a future release.

A very special thank you to the open-source community and the authors of the libraries and research papers that made this project possible.
