Xochi Thumbnailer is a Python-based tool that creates visually appealing and informative waveform images from audio files. It is based on the waveform drawing functionality found in popular DJ equipment such as Pioneer/AlphaTheta and Denon playback devices and software.
This project is part of the Xochi Performer performance tool. If you like it, consider donating 🍺 or purchasing a copy of Xochi Performer when it is released 🙇🏽‍♂️.
I hope you find this software useful and that, by providing a framework to iterate on, it helps other developers come up with new approaches to waveform drawing.
python xochi_thumbnailer.py <filename>.wav
python xochi_thumbnailer.py -m three-band <filename>.wav
python xochi_thumbnailer.py -m three-band-interpolated -c <hex color> <filename>.wav
python xochi_thumbnailer.py -m bin <filename>.wav
python xochi_thumbnailer.py
Usage: xochi_thumbnailer.py [options] input-filename
Options:
  --help                show this help message and exit
  -a OUTPUT_FILENAME_W, --waveout=OUTPUT_FILENAME_W
                        output waveform image (default input filename +
                        _w.png)
  -s OUTPUT_FILENAME_S, --specout=OUTPUT_FILENAME_S
                        output spectrogram data (default input filename +
                        _s.bin)
  -w IMAGE_WIDTH, --width=IMAGE_WIDTH
                        image width in pixels (default 2560)
  -h IMAGE_HEIGHT, --height=IMAGE_HEIGHT
                        image height in pixels (default 639)
  -m RENDER_MODE, --mode=RENDER_MODE
                        rendering mode: rainbow, three-band, three-band-
                        interpolated (default: rainbow)
  -p, --profile         run profiler and output profiling information
- Generate color-coded waveform images from audio files
- Support for the WAV file format (other formats coming soon)
- Save processed audio data in a compact binary format for quick reloading
This project is a prototype for implementing multicolored waveforms for my performance tool, Xochi Performer (to be released).
I was underwhelmed by the lack of information available online for generating colored waveform thumbnails, so I took a deep dive into open source tools such as Mixxx, Freesound.org, and most importantly Beat-Link. These projects provided a ton of insight into how waveform drawing is done on popular audio platforms. I took what I learned from these projects and expanded on it with my own findings and investigations.
My research into the subject was fascinating and got me thinking a lot about how we represent and interpret audio data. I've built this project to be modular, so it is easy to iterate on and to add new renderers. I'm leaving it here in the hope that it helps improve visualizations in audio software and that others come up with new styles.
In an attempt to keep buffer and file sizes as small as possible without compromising legibility, the following processing is applied:
- All audio signals are summed and represented as mono.
- Significant downsampling is applied to the signals.
- Amplitudes are limited to the int8 range.
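The mono and int8 steps can be sketched as below. This is a minimal illustration, not the tool's code: `to_mono` and `to_int8` are hypothetical names, summing the channels and dividing by the channel count (i.e. averaging) is an assumed way of keeping the mono signal in range, and the clamp to -127..127 mirrors the range stated for the XPKS format later in this document.

```python
def to_mono(frames):
    """Sum the channels of each frame and divide by the channel count.

    `frames` is a sequence of per-sample tuples holding one float per
    channel in the range -1.0..1.0 (assumed input layout).
    """
    return [sum(frame) / len(frame) for frame in frames]


def to_int8(sample):
    """Quantize a float sample in -1.0..1.0 to the symmetric range -127..127."""
    return max(-127, min(127, round(sample * 127)))


stereo = [(1.0, -1.0), (0.5, 0.5), (-0.25, -0.75)]
mono = to_mono(stereo)
quantized = [to_int8(s) for s in mono]
```

Keeping the range symmetric means a full-scale positive and a full-scale negative sample map to bars of the same height.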
The audio file is read and processed in chunks (windows). Within each chunk, we downsample by capturing only the peak value, which retains transients as much as possible. Peak data captured this way is all that is needed to represent a waveform faithfully, regardless of downsampling (within reason).
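A minimal sketch of windowed peak capture, assuming the window length is derived from a target peaks-per-second rate (for example 44100 // 400 = 110 samples per window). Each window keeps its minimum and maximum, matching the min/max pairs stored in the XPKS format; `window_peaks` and `window_size_for` are hypothetical helpers.

```python
def window_size_for(sample_rate, peaks_per_second):
    """Number of samples per window for a given peaks-per-second rate."""
    return max(1, sample_rate // peaks_per_second)


def window_peaks(samples, window_size):
    """Return one (min, max) pair per window of `window_size` samples.

    The last window may be shorter than `window_size`.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    pairs = []
    for start in range(0, len(samples), window_size):
        chunk = samples[start:start + window_size]
        pairs.append((min(chunk), max(chunk)))
    return pairs


# A single-sample spike in an otherwise silent window survives intact,
# whereas averaging the same window would flatten it to 1.
spike = [0] * 99 + [100]
spike_pairs = window_peaks(spike, 100)
```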
The key to representing color data is to divide the spectrum into low, mid, and high frequency ranges. Unlike some other approaches, this does not require an FFT: simple filters are enough to capture the energy within each band. From there, you can use the peak values of the individual bands to generate colors (red: low, green: mid, blue: high) or draw the bands individually (as in the three-band mode).
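One way to get the three bands without an FFT is sketched below. This is an assumption-laden illustration rather than the tool's filter design: it uses two first-order (one-pole) low-pass filters with hypothetical crossovers at 200 Hz and 2 kHz, and derives mid and high by subtraction so the three bands always sum back to the input. First-order slopes are gentle (6 dB/octave), so neighboring bands overlap considerably.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """First-order IIR low-pass filter: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out = []
    y = 0.0
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


def split_bands(samples, sample_rate, low_cut=200.0, high_cut=2000.0):
    """Split a mono signal into (low, mid, high) bands.

    low  = low-pass at low_cut
    mid  = low-pass at high_cut minus low-pass at low_cut
    high = input minus low-pass at high_cut
    so low + mid + high reconstructs the input (up to float rounding).
    """
    lp_low = one_pole_lowpass(samples, low_cut, sample_rate)
    lp_high = one_pole_lowpass(samples, high_cut, sample_rate)
    mid = [h - l for h, l in zip(lp_high, lp_low)]
    high = [x - h for x, h in zip(samples, lp_high)]
    return lp_low, mid, high


def band_peaks(bands):
    """Peak absolute value of each band, e.g. over one window."""
    return tuple(max((abs(v) for v in band), default=0.0) for band in bands)
```

The per-band peaks can then be captured per window exactly like the full-spectrum peak and fed into a color mapping.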
Peak values can also be averaged, scaled or processed further to render other color representations. These four peak values (full, low, mid, and high spectrum information) are very powerful resources for generating alternate waveform views.
The processed data can be interpolated to create a smooth curve. Take special care to keep the peak data within each window you are going to draw; averages sacrifice transient precision, although they may still be useful.
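A sketch of fitting peak data to the image width while respecting that rule, using a hypothetical `peaks_to_columns` helper. When there are more windows than pixel columns, each column takes the maximum of the windows it covers so no transient is lost; when there are fewer, it interpolates linearly between neighboring peaks. Linear interpolation is an assumed, simple choice here; a smoother curve (e.g. cubic) could be substituted.

```python
def peaks_to_columns(peaks, width):
    """Resample a list of peak values to `width` pixel columns."""
    n = len(peaks)
    if width <= 0:
        return []
    if n == 0:
        return [0.0] * width
    if n >= width:
        # Downsample: each column keeps the largest peak it covers.
        columns = []
        for x in range(width):
            start = x * n // width
            end = max(start + 1, (x + 1) * n // width)
            columns.append(max(peaks[start:end]))
        return columns
    if n == 1 or width == 1:
        return [float(peaks[0])] * width
    # Upsample: linear interpolation between neighboring peaks.
    columns = []
    for x in range(width):
        pos = x * (n - 1) / (width - 1)
        i = min(int(pos), n - 2)
        frac = pos - i
        columns.append(peaks[i] * (1.0 - frac) + peaks[i + 1] * frac)
    return columns
```

For example, squeezing `[0, 10, 0, 0]` into two columns keeps the spike as `[10, 0]`, whereas averaging pairs of windows would halve it to 5.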
The spectrum bands are useful for generating colorful representations. In the rainbow method, for instance, the band data is mapped 1:1 onto the color channels (red for low frequencies, green for mids, blue for highs). This RGB value can then be converted to HLS and back to control luminance and saturation.
Finally, something I found indispensable is scaling the waveform with curves other than linear. In the three-band method, for instance, I use power scaling to get fine control over transient smoothing, and a scale factor to further separate the band information. These power and scale parameters should be tuned in conjunction with the audio length (as is done on Pioneer devices) to improve legibility.
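A sketch of power scaling plus per-band scale factors. The function names, the default exponent of 0.7 and the `BAND_GAINS` values are all hypothetical placeholders, not the tool's settings. An exponent below 1 lifts quiet detail; an exponent above 1 compresses quiet passages so transients stand out.

```python
def scale_peak(peak, power=0.7, gain=1.0, max_value=127):
    """Map a peak magnitude (0..max_value) to a bar height in 0.0..1.0.

    The peak is normalized, raised to `power`, multiplied by `gain`
    and clamped to 1.0.
    """
    normalized = min(abs(peak), max_value) / max_value
    return min(1.0, gain * normalized ** power)


# Hypothetical per-band scale factors: drawing each band at a different
# scale keeps overlaid three-band layers visually separated.
BAND_GAINS = {"low": 1.0, "mid": 0.75, "high": 0.5}


def three_band_heights(low, mid, high, power=0.7):
    """Bar heights (0.0..1.0) for the three bands of one column."""
    peaks = {"low": low, "mid": mid, "high": high}
    return {band: scale_peak(p, power, BAND_GAINS[band]) for band, p in peaks.items()}
```

Following the text above, `power` and the gains could be varied with track length; how they should vary is left open here.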
The XPKS (Xochi Peaks) binary file format stores multi-resolution peak data for waveform visualization. Here's its structure:
[Header] - 7 bytes total
- Magic bytes (4): "XPKS"
- Version (1): uint8
- Channels (1): uint8 (always 4: full, low, mid, high)
- Mip count (1): uint8 (1-3 mip levels)
[Mip Directory] - 20 bytes per entry
For each mip level:
  - Peaks per second (4): uint32
  - Offset (8): uint64 (position in file where mip data starts)
  - Length (8): uint64 (number of peak pairs in this mip)
[Mip Data]
For each mip level:
  For each window:
    For each channel:
      - Min value (1): int8
      - Max value (1): int8
Standard mip levels are 400, 100, and 1 peaks per second, though shorter files might not include all three. Each min/max pair is stored as two signed 8-bit integers (-127 to 127) for compact storage.
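A minimal writer/reader sketch for this layout using Python's `struct` module. Two points are assumptions, because the description above does not settle them: the byte order is taken to be little-endian, and Length is taken to count windows (each holding one min/max pair per channel). If Length instead counts individual min/max pairs, divide it by the channel count when reading. `write_xpks` and `read_xpks` are hypothetical names, not the tool's API.

```python
import struct

MAGIC = b"XPKS"
HEADER = struct.Struct("<4sBBB")   # magic, version, channels, mip count: 7 bytes
MIP_ENTRY = struct.Struct("<IQQ")  # peaks/sec, offset, length: 20 bytes
CHANNELS = 4                       # full, low, mid, high


def write_xpks(mips, version=1):
    """Serialize mips to bytes.

    `mips` is a list of (peaks_per_second, windows); each window is a list
    of CHANNELS (min, max) int8 pairs.
    """
    directory, blobs = [], []
    offset = HEADER.size + MIP_ENTRY.size * len(mips)
    for pps, windows in mips:
        if any(len(window) != CHANNELS for window in windows):
            raise ValueError(f"each window needs {CHANNELS} (min, max) pairs")
        flat = [v for window in windows for pair in window for v in pair]
        blob = struct.pack(f"<{len(flat)}b", *flat)
        directory.append(MIP_ENTRY.pack(pps, offset, len(windows)))
        blobs.append(blob)
        offset += len(blob)
    header = HEADER.pack(MAGIC, version, CHANNELS, len(mips))
    return header + b"".join(directory) + b"".join(blobs)


def read_xpks(data):
    """Parse bytes produced by write_xpks; returns (version, mips)."""
    magic, version, channels, mip_count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not an XPKS file")
    mips = []
    for i in range(mip_count):
        pps, offset, length = MIP_ENTRY.unpack_from(
            data, HEADER.size + i * MIP_ENTRY.size)
        values = struct.unpack_from(f"<{length * channels * 2}b", data, offset)
        windows = [
            [(values[(w * channels + c) * 2], values[(w * channels + c) * 2 + 1])
             for c in range(channels)]
            for w in range(length)
        ]
        mips.append((pps, windows))
    return version, mips
```

The offsets in the directory point past the header and the whole directory, so a reader can seek straight to the mip level it wants to draw.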
This project was inspired by and references the following sources:
- Beat-Link by Deep Symmetry, LLC: Provided insights into Pioneer CDJ waveform analysis and packet representation.
- Freesound.org: Provided the initial approach for interpreting and colorizing audio data (no longer used, but critical to my understanding of audio representations).
- Mixxx: Where my investigation started. The source code was a bit difficult to follow, so I used it to get an overview of the approach. Their implementation allows selecting a color scheme, which is very interesting and something I'd like to explore in a future release.
A very special thank you to the open-source community and the authors of the libraries and research papers that made this project possible.