Golang Mel Spectrogram

Audio-to-spectrogram and spectrogram-to-audio conversion library and command-line tools for Go.

Features

Mel Spectrograms: Perceptually-motivated frequency representation for audio processing
Phase-Preserving Spectrograms: High-quality audio reconstruction without phase estimation
Bidirectional Conversion: Audio → Spectrogram → Audio
Multiple Formats: Support for WAV and FLAC input files
PNG Export: Save spectrograms as images for visualization or ML applications

Installation

go get github.com/neurlang/gomel

Command-Line Tools

tomel - Audio to Mel Spectrogram

Convert audio files to mel spectrogram images:

tomel <audio_file>

Example:

tomel glados-1609757458000_.wav
# Creates: glados-1609757458000_.wav.png

Supports .wav and .flac files. If no extension is provided, .wav is assumed.
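The naming rules above (a missing extension defaults to .wav, and the image name is the input name plus .png) can be sketched as follows. This is only an illustration of the documented behaviour: `resolveInput` and `outputImage` are hypothetical helpers, not functions exported by gomel, and the real tool may treat edge cases such as dots inside directory names differently.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveInput appends ".wav" when the path carries no extension.
// Hypothetical helper, not part of the gomel API.
func resolveInput(path string) string {
	if filepath.Ext(path) == "" {
		return path + ".wav"
	}
	return path
}

// outputImage appends ".png" to the full input name, matching the
// "file.wav -> file.wav.png" naming shown in the example above.
// Hypothetical helper, not part of the gomel API.
func outputImage(input string) string {
	return input + ".png"
}

func main() {
	for _, arg := range []string{"glados-1609757458000_", "audio.flac"} {
		in := resolveInput(arg)
		fmt.Printf("%s -> %s\n", in, outputImage(in))
	}
}
```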

towav - Mel Spectrogram to Audio

Reconstruct audio from mel spectrogram images using the Griffin-Lim algorithm:

towav <png_file> [sample_rate]

Example:

towav spectrogram.png 44100
# Creates: spectrogram.png.wav

Default sample rate: 44100 Hz
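As a rough sketch of how an optional sample-rate argument with a 44100 Hz default could be handled, consider the snippet below. `sampleRateFromArgs` is a hypothetical helper written for illustration; it is not taken from the towav source, and the validation it performs is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultSampleRate mirrors the documented default of 44100 Hz.
const defaultSampleRate = 44100

// sampleRateFromArgs reads the optional second argument of
// "<png_file> [sample_rate]". args excludes the program name.
// Hypothetical helper, not part of gomel.
func sampleRateFromArgs(args []string) (int, error) {
	if len(args) < 2 {
		return defaultSampleRate, nil
	}
	rate, err := strconv.Atoi(args[1])
	if err != nil {
		return 0, fmt.Errorf("invalid sample rate %q: %w", args[1], err)
	}
	if rate <= 0 {
		return 0, fmt.Errorf("sample rate must be positive, got %d", rate)
	}
	return rate, nil
}

func main() {
	rate, err := sampleRateFromArgs([]string{"spectrogram.png"})
	fmt.Println(rate, err)

	rate, err = sampleRateFromArgs([]string{"spectrogram.png", "48000"})
	fmt.Println(rate, err)
}
```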

tophase - Audio to Phase Spectrogram

Convert audio files to phase-preserving spectrogram images:

tophase <audio_file>

Example:

tophase audio.flac
# Creates: audio.flac.png

Phase spectrograms retain both magnitude and phase information, so audio can be reconstructed directly with near-lossless quality.
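To see why keeping the phase matters, the sketch below takes a single complex frequency coefficient, stores it as magnitude plus phase, and rebuilds it. With both values the coefficient comes back unchanged; with magnitude alone the phase has to be guessed (zero is assumed here), which is the gap Griffin-Lim tries to close iteratively. The coefficient value and helper names are made up for illustration and say nothing about how gomel encodes phase in the image.

```go
package main

import (
	"fmt"
	"math/cmplx"
)

// withPhase stores magnitude and phase, then rebuilds the coefficient.
func withPhase(c complex128) complex128 {
	mag, phase := cmplx.Polar(c)
	return cmplx.Rect(mag, phase)
}

// magnitudeOnly discards the phase and rebuilds with an assumed phase of 0.
func magnitudeOnly(c complex128) complex128 {
	return cmplx.Rect(cmplx.Abs(c), 0)
}

// closeTo reports whether a and b differ by at most tol.
func closeTo(a, b complex128, tol float64) bool {
	return cmplx.Abs(a-b) <= tol
}

func main() {
	c := complex(0.3, -0.4) // one hypothetical STFT coefficient

	fmt.Println(closeTo(withPhase(c), c, 1e-12))     // true
	fmt.Println(closeTo(magnitudeOnly(c), c, 1e-12)) // false
}
```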

fromphase - Phase Spectrogram to Audio

Reconstruct audio from phase-preserving spectrogram images:

fromphase <png_file> [sample_rate]

Example:

fromphase spectrogram.png 48000
# Creates: spectrogram.png.wav

Default sample rate: 44100 Hz

Library API

Mel Package

import "github.com/neurlang/gomel/mel"

// Create mel spectrogram processor
m := mel.NewMel()

// Configure parameters
m.NumMels = 192              // Number of mel bands
m.MelFmin = 0                // Minimum frequency (Hz)
m.MelFmax = 16000            // Maximum frequency (Hz)
m.Window = 1280              // Window size
m.Resolut = 4096             // FFT resolution
m.GriffinLimIterations = 2   // Iterations for audio reconstruction
m.YReverse = true            // Reverse Y-axis in image
m.VolumeBoost = 0.0          // Volume adjustment for reconstruction
m.SampleRate = 44100         // Output sample rate

// Convert audio file to mel spectrogram image
// (check err after each call; checks are omitted here for brevity)
err := m.ToMelWav("input.wav", "output.png")
err = m.ToMelFlac("input.flac", "output.png")

// Convert spectrogram image back to audio
err = m.ToWavPng("input.png", "output.wav")

// Low-level API: work with buffers
samples := mel.LoadWav("input.wav")
samples = mel.LoadFlac("input.flac")

melSpec, err := m.ToMel(samples)
audioSamples, err := m.FromMel(melSpec)

err = mel.SaveWav("output.wav", audioSamples, 44100)

// Get image buffer
imageData := m.Image(melSpec)
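For intuition about what NumMels, MelFmin and MelFmax control, the following sketch spaces band edges evenly on the mel scale between the two frequency limits. It uses the common HTK-style formula mel = 2595 * log10(1 + f/700); whether gomel uses this variant or another one is an assumption, and `hzToMel`, `melToHz` and `melBandEdges` are illustrative names rather than library functions.

```go
package main

import (
	"fmt"
	"math"
)

// hzToMel converts a frequency in Hz to mels (HTK-style formula, assumed).
func hzToMel(hz float64) float64 {
	return 2595 * math.Log10(1+hz/700)
}

// melToHz is the inverse of hzToMel.
func melToHz(mel float64) float64 {
	return 700 * (math.Pow(10, mel/2595) - 1)
}

// melBandEdges returns numMels+2 frequencies in Hz, evenly spaced on the
// mel scale between fmin and fmax. Band i spans edges[i] to edges[i+2].
func melBandEdges(numMels int, fmin, fmax float64) []float64 {
	lo, hi := hzToMel(fmin), hzToMel(fmax)
	edges := make([]float64, numMels+2)
	for i := range edges {
		m := lo + (hi-lo)*float64(i)/float64(numMels+1)
		edges[i] = melToHz(m)
	}
	return edges
}

func main() {
	// Same values as the configuration above: 192 bands from 0 to 16000 Hz.
	edges := melBandEdges(192, 0, 16000)
	n := len(edges)
	fmt.Printf("lowest edge spacing:  %.1f Hz\n", edges[1]-edges[0])
	fmt.Printf("highest edge spacing: %.1f Hz\n", edges[n-1]-edges[n-2])
}
```

The spacing grows with frequency, which is why a mel spectrogram spends more of its rows on low frequencies than a linear one does.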

Phase Package

import "github.com/neurlang/gomel/phase"

// Create phase spectrogram processor
p := phase.NewPhase()

// Configure parameters
p.NumFreqs = 768             // Number of frequency bins
p.Window = 1280              // Window size
p.Resolut = 4096             // FFT resolution
p.YReverse = false           // Reverse Y-axis in image
p.VolumeBoost = 4.0          // Volume adjustment for reconstruction
p.SampleRate = 44100         // Output sample rate

// Convert audio file to phase spectrogram image
// (check err after each call; checks are omitted here for brevity)
err := p.ToPhaseWav("input.wav", "output.png")
err = p.ToPhaseFlac("input.flac", "output.png")

// Convert spectrogram image back to audio
err = p.ToWavPng("input.png", "output.wav")

// Low-level API: work with buffers
samples := phase.LoadWav("input.wav")
samples = phase.LoadFlac("input.flac")

// Load with sample rate
samples, sampleRate, err := phase.LoadWavSampleRate("input.wav")
samples, sampleRate, err = phase.LoadFlacSampleRate("input.flac")

phaseSpec, err := p.ToPhase(samples)
audioSamples, err := p.FromPhase(phaseSpec)

err = phase.SaveWav("output.wav", audioSamples, 44100)

// Get image buffer
imageData := p.Image(phaseSpec)
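The Window and Resolut values translate into time and frequency resolution once a sample rate is fixed. The arithmetic below assumes that Window is a length in samples and that Resolut is the FFT size, which is how the comments above read but is not confirmed against the source; the helper names are illustrative only.

```go
package main

import "fmt"

// windowMillis returns the duration of an analysis window in milliseconds.
func windowMillis(window, sampleRate int) float64 {
	return 1000 * float64(window) / float64(sampleRate)
}

// binSpacingHz returns the distance between adjacent FFT bins in Hz.
func binSpacingHz(fftSize, sampleRate int) float64 {
	return float64(sampleRate) / float64(fftSize)
}

func main() {
	// Defaults from the configuration above: Window 1280, Resolut 4096, 44100 Hz.
	fmt.Printf("window:      %.2f ms\n", windowMillis(1280, 44100)) // window:      29.02 ms
	fmt.Printf("bin spacing: %.2f Hz\n", binSpacingHz(4096, 44100)) // bin spacing: 10.77 Hz
}
```

A longer window improves frequency detail at the cost of time detail, so these two numbers are a quick way to sanity-check a configuration before processing a large batch.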

Mel vs Phase Spectrograms

Feature	Mel Spectrogram	Phase Spectrogram
Frequency Scale	Perceptual (mel scale)	Linear
Phase Information	Lost	Preserved
Reconstruction	Griffin-Lim (iterative)	Direct (no iteration)
Audio Quality	Good for speech/music	Excellent, near-lossless
Use Case	ML/AI, speech recognition	High-fidelity audio processing
File Size	Smaller	Larger

Requirements

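Go with module support (the repository ships a go.mod file, so Go 1.11 or newer; the exact minimum is set by the go directive in go.mod)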

Dependencies

github.com/faiface/beep - Audio I/O
github.com/r9y9/gossp - STFT implementation
github.com/mjibson/go-dsp - FFT operations
github.com/mewkiz/flac - FLAC decoding
github.com/x448/float16 - Float16 support

License

See LICENSE file for details.
