Audio Analyzer for LLM with Advanced Visualizations

A powerful audio analysis tool that extracts musical features from audio files and generates natural language descriptions suitable for use with Large Language Models (LLMs). The application provides comprehensive analysis of key, tempo, mood, instrumentation, and other musical characteristics, visualized through an intuitive PyQt5 interface. NEW: Now includes stunning 90s-style geometric mandala and kaleidoscope video visualizations!

🎵 Features

Core Audio Analysis

Comprehensive Audio Analysis: Extract key, BPM, loudness, dissonance, mood, and instruments from audio files
LLM-Ready Descriptions: Generate detailed natural language descriptions of audio characteristics for use with AI models
Advanced Visualizations: Display audio spectrum, mel-band energy, and MFCC coefficients
Multi-Format Support: Process MP3, WAV, OGG, and FLAC audio files
Mood Detection: Identify emotional qualities based on musical features
Instrument Recognition: Detect probable instruments present in the audio

NEW: Video Visualizations

Radial Symmetry Mandala: Organic curved petals with flowing gradients that respond to your music
Sacred Geometry: Complex geometric patterns with mathematical precision and interlocking shapes
Kaleidoscope Effects: Flowing organic shapes with liquid-like movement
Audio-Reactive: Visuals respond frame by frame to bass, mid, and treble frequencies
Multiple Styles: Choose between mandala, sacred geometry, kaleidoscope, or mixed (auto-switching) modes
MP4 Export: Generate high-quality video files ready for social media or presentations
Customizable Settings: Adjust duration (5-60s), frame rate (15-30 FPS), visual style, and custom filenames

🖥️ Screenshots

Analysis Results

NEW: Video Visualizations

Example visualizations showing radial symmetry mandala, sacred geometry, and kaleidoscope effects responding to music

📹 Sample Visualizations

Radial Symmetry Mandala

Organic curved petals with flowing gradients responding to bass and mid frequencies

Sacred Geometry

Complex interlocking geometric patterns with mathematical precision

Kaleidoscope

Flowing organic shapes with liquid-like movement and symmetric mirroring

Mixed Mode Examples

🛠️ Technology Stack

This application leverages several powerful libraries:

Essentia: Advanced open-source library for audio analysis, developed by the Music Technology Group at Universitat Pompeu Fabra
PyQt5: Cross-platform GUI toolkit for creating desktop applications
OpenCV: Computer vision library for advanced video generation and image processing
NumPy: Fundamental package for scientific computing with Python
Matplotlib: Comprehensive library for creating static, animated, and interactive visualizations

🔧 Installation

Docker Installation (Recommended)

Docker provides the easiest and most consistent way to run the application across different platforms.

Windows with Docker

Prerequisites:
- Install Docker Desktop for Windows
- Install VcXsrv Windows X Server
Start VcXsrv:
- Launch XLaunch from Start menu
- Select "Multiple windows" and click Next
- Select "Start no client" and click Next
- Check "Disable access control" (important) and uncheck "Native opengl", then click Next
- Click Finish to start the X server

Build and Run:

```
# Build the Docker image
docker build -t audio-analyzer .

# Run the container with GUI support
docker run -it --rm -e DISPLAY=host.docker.internal:0.0 -e QT_QPA_PLATFORM=xcb -v "${PWD}:/app" audio-analyzer
```

Note for Command Prompt: ${PWD} is PowerShell syntax. In cmd.exe, use %cd% instead:

```
docker run -it --rm -e DISPLAY=host.docker.internal:0.0 -e QT_QPA_PLATFORM=xcb -v "%cd%:/app" audio-analyzer
```

Access Music Files: To analyze music files stored elsewhere on your system, add an additional volume mount:
```
docker run -it --rm -e DISPLAY=host.docker.internal:0.0 -e QT_QPA_PLATFORM=xcb -v "${PWD}:/app" -v "D:/music:/music" audio-analyzer
```
Then your music files will be available at /music in the container.

Ubuntu with Docker

Install Docker:

```
sudo apt-get update
sudo apt-get install docker.io
sudo systemctl start docker
sudo systemctl enable docker
sudo usermod -aG docker $USER
# Log out and back in for the group to take effect
```

Build and Run:

```
# Build the Docker image
docker build -t audio-analyzer .

# Run the container
docker run -it --rm -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v "$(pwd):/app" audio-analyzer
```

If X11 Display Issues:
```
xhost +local:docker
```

Ubuntu Native Installation

Install system dependencies:

```
sudo apt-get update
sudo apt-get install python3-pip python3-pyqt5 python3-numpy python3-matplotlib python3-opencv
```

Install Essentia:

```
sudo apt-get install libyaml-dev libfftw3-dev libavcodec-dev libavformat-dev libavutil-dev libavresample-dev
sudo pip3 install essentia
```

Clone the repository:

```
git clone https://github.com/username/audio-analyzer-llm.git
cd audio-analyzer-llm
```

Install Python dependencies:
```
pip install -r requirements.txt
```

Using the Makefile (Ubuntu)

A Makefile is provided for simplified installation and usage on Ubuntu:

```
# Set up the environment and install dependencies
make setup

# Run the application
make run

# Clean up virtual environment and cache files
make clean
```

🚀 Usage

Basic Workflow

1. Run the application:
```
python3 main.py
```
2. Load an audio file:
- Click "Browse" to select an audio file (MP3, WAV, OGG, FLAC)
- The video visualization panel is enabled immediately
3. Generate visualizations (NEW):
- Set duration (5-60 seconds)
- Choose frame rate (15-30 FPS)
- Select style: Mandala, Sacred Geometry, Kaleidoscope, or Mixed
- Enter a custom filename (optional)
- Click "Generate Visualization" to create an MP4 file (AVI is used as a fallback if no MP4 codec is available)
- The output is saved under your chosen filename or an auto-generated name
4. Analyze audio (optional):
- Click "Analyze" to process the file for detailed analysis
- View the analysis results and description
- Use the buttons to switch between different audio visualizations
- Click "Copy to Clipboard" to copy the generated description

NEW: Video Visualization Styles

Radial Symmetry Mandala

Organic curved petals with smooth flowing gradients
4 concentric layers creating depth and complexity
36-point smooth curves for natural organic appearance
Audio-reactive: Bass controls petal count, mid frequencies affect flow patterns
Breathing effects: Petals pulse and flow with the music
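The sketch below shows one way to build a petal ring whose petal count follows the bass level. It is an illustration, not the project's code. The bass-to-petal mapping (6 to 16 petals), the wobble and pulse amplitudes, and the reading of "36-point" as 36 samples per petal are all assumptions, and both function names are hypothetical.

```python
import math


def petal_count_from_bass(bass: float, min_petals: int = 6, max_petals: int = 16) -> int:
    """Map a normalized bass level (0..1) to a petal count (hypothetical mapping)."""
    bass = max(0.0, min(bass, 1.0))
    return round(min_petals + bass * (max_petals - min_petals))


def mandala_layer(cx, cy, radius, bass, mid, t, points_per_petal=36):
    """Return integer (x, y) vertices of one closed petal ring (hypothetical helper).

    r(theta) = radius * (0.6 + 0.4 * |cos(k * theta / 2)|) has exactly k lobes,
    because |cos(k * theta / 2)| repeats every 2 * pi / k radians.
    """
    k = petal_count_from_bass(bass)
    breathe = 1.0 + 0.05 * math.sin(2 * math.pi * t)  # slow "breathing" pulse over time t (seconds)
    n = k * points_per_petal
    vertices = []
    for i in range(n):
        theta = 2 * math.pi * i / n
        r = radius * (0.6 + 0.4 * abs(math.cos(k * theta / 2)))
        r *= 1.0 + 0.1 * mid * math.sin(3 * theta + t)  # mid-band "flow" wobble
        r *= breathe
        vertices.append((round(cx + r * math.cos(theta)), round(cy + r * math.sin(theta))))
    return vertices
```

To draw a layer with OpenCV, convert the points to an int32 NumPy array and pass it to `cv2.polylines` (or `cv2.fillPoly`). Stacking four such rings with different radii and phase offsets approximates the four concentric layers described above.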

Sacred Geometry

Complex mathematical patterns with interlocking geometric shapes
5 distinct layers: Outer ring, star polygons, inner polygons, Flower of Life center, connecting lines
Perfect symmetry: Hexagons, triangles, and sacred ratios
Audio-reactive: Different frequency bands control various geometric elements
Precision: Mathematical relationships create harmonious visual balance
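The geometry behind these layers comes down to points on circles. The sketch below computes regular polygons (hexagons, triangles), star polygons {n/m}, and the seven circle centers of the "Seed of Life", which is the core of the Flower of Life. The helper names are hypothetical, and the sketch does not claim to reproduce how `analyzer/visualizer.py` builds its layers.

```python
import math


def regular_polygon(cx, cy, r, sides, rotation=0.0):
    """Vertices of a regular polygon inscribed in a circle of radius r."""
    return [
        (cx + r * math.cos(rotation + 2 * math.pi * i / sides),
         cy + r * math.sin(rotation + 2 * math.pi * i / sides))
        for i in range(sides)
    ]


def star_polygon(cx, cy, r, n, m, rotation=0.0):
    """Vertices of the star polygon {n/m} in drawing order (assumes gcd(n, m) == 1)."""
    ring = regular_polygon(cx, cy, r, n, rotation)
    return [ring[(i * m) % n] for i in range(n)]  # visit every m-th vertex


def seed_of_life_centers(cx, cy, r):
    """Centers of the 7-circle Seed of Life: one central circle plus six circles
    of the same radius r whose centers lie on the central circle, 60 degrees apart."""
    return [(cx, cy)] + regular_polygon(cx, cy, r, 6)
```

Because a hexagon's side length equals its circumradius, each outer circle passes through the center and through its two neighbors' centers. That property produces the interlocking pattern.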

Kaleidoscope

Flowing organic shapes with liquid-like movement
64 flowing segments with multiple depth layers
Wave-based motion: Multiple sine/cosine functions create organic flow
Symmetric mirroring: True kaleidoscope effect with dynamic color cycling
Audio-reactive: All frequency ranges contribute to fluid motion
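A standard way to get true kaleidoscope mirroring is to fold every pixel's polar angle into a single wedge of width 2π/N and reflect every other wedge, so that neighboring wedges meet seamlessly. The sketch below applies that technique to one pixel. It assumes an even segment count (64 matches the text above), and the function name is hypothetical rather than taken from the project.

```python
import math


def kaleidoscope_source(x, y, cx, cy, segments=64):
    """Map pixel (x, y) to the point in the first wedge that it mirrors.

    Assumes `segments` is even so that the mirrored wedges line up at the seams.
    """
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    theta = math.atan2(dy, dx) % (2 * math.pi)      # angle in [0, 2*pi)
    wedge = 2 * math.pi / segments
    index = min(int(theta // wedge), segments - 1)  # guard against float round-up
    local = theta - index * wedge                   # angle inside this wedge
    if index % 2 == 1:
        local = wedge - local                       # reflect odd wedges
    return cx + r * math.cos(local), cy + r * math.sin(local)
```

For a whole frame, the same mapping would typically be computed once as NumPy coordinate grids and applied with `cv2.remap`. Rotating the source image over time then produces the flowing motion while the symmetry stays intact.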

Mixed Mode

Intelligent auto-switching between all three styles based on audio characteristics
Bass-heavy sections: Display radial symmetry mandalas
Mid-heavy sections: Show sacred geometry patterns
Treble-heavy sections: Flow with kaleidoscope effects
Seamless transitions: Smooth changes based on real-time audio analysis
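Mixed-mode switching can be reduced to picking the dominant band. The sketch below does that and adds a hysteresis margin, which is an assumption of this example and not stated in the project: the current style is kept unless another band wins by a clear margin, so near-ties do not cause flicker. The style keys and the function name are hypothetical.

```python
def choose_style(bass, mid, treble, current=None, margin=0.1):
    """Pick a visualization style from the dominant frequency band (hypothetical helper).

    bass -> mandala, mid -> sacred geometry, treble -> kaleidoscope. The current
    style is kept unless the new winner beats it by at least `margin`.
    """
    levels = {"mandala": bass, "sacred_geometry": mid, "kaleidoscope": treble}
    best = max(levels, key=levels.get)
    if current in levels and levels[best] - levels[current] < margin:
        return current  # hysteresis: ignore near-ties to avoid flickering
    return best
```

The margin trades responsiveness for stability. A larger value keeps each style on screen longer.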

Integration with LLMs

Copy the generated description from the application
Use it as input for ChatGPT, Claude, or other LLMs for music-informed creative content generation
Perfect for generating album descriptions, playlist narratives, or creative writing prompts
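Once the description is on the clipboard, it usually helps to wrap it in a prompt that states the task and clearly delimits the analysis. The template and function name below are illustrative assumptions, not part of the application.

```python
def build_llm_prompt(description, task="Write a three-sentence album blurb"):
    """Wrap a copied audio description in a task-focused prompt (hypothetical template)."""
    return (
        f"{task} for a track with the following audio analysis.\n\n"
        f"--- Audio analysis ---\n{description.strip()}\n--- End of analysis ---"
    )


print(build_llm_prompt("Key: A minor. Tempo: 92 BPM. Mood: melancholic."))
```

Swapping the `task` string turns the same analysis into a playlist narrative or a creative writing prompt.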

🧠 Technical Details

Analysis Pipeline

The audio analysis process follows these steps:

Loading: Audio is loaded using Essentia's MonoLoader at 44.1kHz
Feature Extraction:
- Spectral analysis using Essentia's Spectrum algorithm
- MFCCs (Mel-Frequency Cepstral Coefficients) for timbre analysis
- Mel-band energy distribution for frequency analysis
- HPCP (Harmonic Pitch Class Profile) for key detection
- Rhythm extraction for BPM and beat patterns
- Loudness and dissonance calculation
High-Level Feature Derivation:
- Mood detection based on key, tempo, and spectral features
- Instrument detection using spectral characteristics
Description Generation:
- Natural language synthesis of all extracted features
- Thematic suggestions based on detected mood
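To make the HPCP-to-key and feature-to-description steps concrete, here is a simplified sketch. It estimates the key by template matching: each of the 24 rotations of a 12-bin pitch-class profile is correlated against the Krumhansl-Kessler major and minor profiles. A toy rule then maps mode and tempo to a mood word. This is not necessarily what Essentia or this project does internally, and the mood thresholds are hypothetical. The sketch assumes bin 0 is C; check which pitch class bin 0 represents in your HPCP configuration and rotate the vector if needed.

```python
import math

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Krumhansl-Kessler key profiles; index 0 is the tonic.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / norm if norm else 0.0


def estimate_key(hpcp):
    """Return (tonic, scale, score) for a 12-bin pitch-class profile whose bin 0 is C."""
    hpcp = list(hpcp)
    best = ("C", "major", -2.0)  # any correlation (>= -1) beats this sentinel
    for tonic in range(12):
        rotated = hpcp[tonic:] + hpcp[:tonic]  # move the candidate tonic to index 0
        for scale, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            score = pearson(rotated, profile)
            if score > best[2]:
                best = (NOTES[tonic], scale, score)
    return best


def mood_label(scale, bpm):
    """Toy rule (hypothetical): mode suggests valence, tempo suggests energy."""
    if scale == "major":
        return "upbeat and joyful" if bpm >= 120 else "warm and relaxed"
    return "intense and driving" if bpm >= 120 else "melancholic and reflective"


def describe(tonic, scale, bpm):
    """One-sentence, LLM-ready summary built from the extracted features."""
    return (f"The track is in {tonic} {scale} at about {bpm:.0f} BPM "
            f"and feels {mood_label(scale, bpm)}.")
```

In the real pipeline, the HPCP vector would come from Essentia (averaged over frames) and the BPM from its rhythm extractor. The application's own descriptions also draw on loudness, dissonance, and spectral features that this sketch leaves out.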

NEW: Visualization Pipeline

Audio Processing: Frame-by-frame analysis using Essentia algorithms
Feature Mapping: Bass, mid, treble extracted from spectrum analysis
Geometric Generation: Mathematical functions create precise patterns
Audio Reactivity: Real-time mapping of audio features to visual parameters
Video Export: OpenCV-based MP4 generation with customizable quality
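The "bass, mid, treble extracted from spectrum analysis" step can be sketched as averaging the magnitude spectrum over three frequency ranges. The cutoffs used here (250 Hz and 4 kHz) and the use of mean magnitude are assumptions, because the README does not state the project's band edges. The sketch expects a one-sided spectrum of `frame_size // 2 + 1` bins, which is the shape a real FFT of one frame produces.

```python
def band_levels(spectrum, sample_rate=44100, bass_cutoff=250.0, treble_cutoff=4000.0):
    """Average magnitude in bass / mid / treble bands of a one-sided magnitude spectrum.

    Bin i of a frame of size N sits at i * sample_rate / N Hz, and the spectrum is
    assumed to have N // 2 + 1 bins. The cutoffs are illustrative defaults.
    """
    frame_size = 2 * (len(spectrum) - 1)
    bands = {"bass": [], "mid": [], "treble": []}
    for i, magnitude in enumerate(spectrum):
        freq = i * sample_rate / frame_size
        if freq < bass_cutoff:
            bands["bass"].append(magnitude)
        elif freq < treble_cutoff:
            bands["mid"].append(magnitude)
        else:
            bands["treble"].append(magnitude)
    return {name: (sum(vals) / len(vals) if vals else 0.0) for name, vals in bands.items()}
```

Values like these are small in absolute terms. That is why the scaling in `extract_frame_features()` (see Customization) multiplies them before clamping to 1.0.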

Visualization Types

Traditional Analysis Plots

Spectrum: Frequency domain representation of the audio
Mel Bands: Energy distribution across perceptually-weighted frequency bands
MFCC: Coefficients representing the short-term power spectrum

NEW: Video Visualizations

Frame Rate: 15-30 FPS for smooth motion
Resolution: 512x512 pixels (customizable)
Color Palettes: Neon and plasma themes optimized for visual impact
Audio Synchronization: Perfect sync between audio features and visual elements
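Keeping audio and video in sync comes down to simple arithmetic. Frame f of a video at F FPS corresponds to audio time f / F seconds, which at 44.1 kHz is sample f × 44100 / F. At 30 FPS each frame therefore advances 1470 samples (2940 at 15 FPS), and a 60 s clip at 30 FPS has 1800 frames. The sketch below turns that into an analysis window centered on each frame. The 2048-sample window size and the function names are assumptions.

```python
def frame_audio_window(frame_index, fps, sample_rate=44100, window=2048):
    """Return (start, end) sample indices of the analysis window for one video frame."""
    center = round(frame_index * sample_rate / fps)  # audio position of this frame, in samples
    start = max(0, center - window // 2)             # clamp at the start of the file
    return start, start + window


def total_frames(duration_s, fps):
    """Number of video frames needed for a clip of the given duration."""
    return int(round(duration_s * fps))
```

Deriving the window from the frame index, instead of accumulating per-frame offsets, avoids drift when the sample rate is not an exact multiple of the frame rate.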

📋 Requirements

Python 3.8+
Essentia 2.1+
PyQt5
OpenCV 4.5+
NumPy <2.0 (for Essentia compatibility)
Pillow 9.0+

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgements

The Essentia team for their incredible audio analysis library
Music Technology Group at Universitat Pompeu Fabra
OpenCV community for powerful computer vision tools

🔍 Troubleshooting

Docker Issues

Windows

If you see "Cannot connect to X server" errors, ensure VcXsrv is running with "Disable access control" checked
If Windows Defender Firewall prompts appear, allow access for VcXsrv
For path issues with volume mounts, use forward slashes (/) instead of backslashes (\)
Use QT_QPA_PLATFORM=xcb environment variable if GUI doesn't appear

Ubuntu

If you encounter "Cannot open display" errors, try running xhost +local:docker before starting the container
If you see permission issues, ensure your user is in the docker group: sudo usermod -aG docker $USER

NumPy Compatibility Issues

If you see NumPy version errors with Essentia:

```
pip install 'numpy<2.0'
```

Essentia requires NumPy 1.x for compatibility.

Visualization Issues

"Could not open video writer" error: The system automatically tries multiple codecs (mp4v, XVID, MJPG, X264) and file formats (MP4, AVI) for maximum compatibility
Poor audio reactivity: Try increasing audio amplitude or adjusting frequency band scaling in the code
Performance issues: Reduce frame rate or duration for faster processing
Large file sizes: Lower frame rate or use shorter durations; 1024px renders require more processing power
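The codec fallback described above might look roughly like the sketch below. It uses OpenCV's `cv2.VideoWriter`, `cv2.VideoWriter_fourcc`, and `isOpened()`. The pairing of each codec with a container and the try order are assumptions based on the list above, and the helper names are hypothetical.

```python
from pathlib import Path

# Codec / container pairs to try in order (assumed pairing; the project's list may differ).
CANDIDATES = [("mp4v", ".mp4"), ("XVID", ".avi"), ("MJPG", ".avi"), ("X264", ".mp4")]


def candidate_paths(output_path):
    """Yield (fourcc, path) pairs, swapping the file extension to match each codec."""
    base = Path(output_path)
    for fourcc, ext in CANDIDATES:
        yield fourcc, str(base.with_suffix(ext))


def open_video_writer(output_path, fps, size):
    """Return (writer, path) for the first codec OpenCV can open; size is (width, height)."""
    import cv2  # imported lazily so candidate_paths() works without OpenCV installed

    for fourcc, path in candidate_paths(output_path):
        writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*fourcc), fps, size)
        if writer.isOpened():
            return writer, path
        writer.release()
    raise RuntimeError(f"Could not open video writer for {output_path}")
```

If every candidate fails, the OpenCV build in use probably lacks video encoding support, which is another case where the Docker environment can help.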

Essentia Installation Issues

Essentia may fail to build on some systems. The Docker approach avoids these issues by using a pre-configured environment
If building Essentia from source, refer to the official installation guide
The info message about missing SVM classifier models is harmless and doesn't affect functionality

🎨 Customization

Adding New Visualization Styles

The visualization system is designed to be extensible. To add new styles:

Create a new generation method in analyzer/visualizer.py (e.g., generate_your_style_frame())
Add the style to the dropdown in analyzer/ui/panels.py
Update the style selection logic in create_visualization_video()
Consider audio reactivity: map bass, mid, treble, and mel bands to visual parameters
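The README describes adding a style by extending the dropdown and the selection logic in `create_visualization_video()`. An alternative design, shown below as a hypothetical sketch rather than the project's current structure, registers each frame generator under a style name so that selection becomes a dictionary lookup. `STYLE_REGISTRY`, `register_style`, and `render_frame` are illustrative names only, and the placeholder generator returns parameters instead of an image.

```python
from typing import Callable, Dict

FrameGenerator = Callable[[dict, float], object]
STYLE_REGISTRY: Dict[str, FrameGenerator] = {}  # hypothetical style -> generator map


def register_style(name):
    """Decorator that adds a frame generator to the registry under `name`."""
    def decorator(fn):
        STYLE_REGISTRY[name] = fn
        return fn
    return decorator


@register_style("your_style")
def generate_your_style_frame(features, t):
    # Placeholder: a real generator would draw and return an image array.
    return {"style": "your_style", "radius": 100 * (1 + features.get("bass", 0.0)), "t": t}


def render_frame(style, features, t):
    """Dispatch to the registered generator for `style`."""
    try:
        generator = STYLE_REGISTRY[style]
    except KeyError:
        raise ValueError(f"Unknown style {style!r}; known: {sorted(STYLE_REGISTRY)}") from None
    return generator(features, t)
```

With a registry, the dropdown entries could also be populated from `STYLE_REGISTRY` keys, so a new style needs changes in only one place.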

Modifying Color Palettes

Edit the color_palettes dictionary in VisualizationGenerator.__init__() to add new color schemes:

```python
self.color_palettes = {
    'your_palette': [(255, 0, 128), (0, 255, 255), (255, 200, 0)],  # Example custom (R, G, B) colors
    'neon': [...],    # Existing palettes (contents elided)
    'plasma': [...]
}
```
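Smooth "dynamic color cycling" through a palette like the one above can be done by interpolating linearly between neighboring colors and wrapping around at the end. The function below is a hypothetical helper, not part of `VisualizationGenerator`. Note that OpenCV drawing functions expect BGR order by default. The README does not say whether the project converts its (R, G, B) tuples, so swap the channels if colors come out wrong.

```python
def palette_color(palette, t):
    """Color at cyclic position t along a looping palette, linearly interpolated.

    t may be any float: 0.0 is the first color and 1.0 wraps back to it.
    """
    n = len(palette)
    pos = (t % 1.0) * n
    i = int(pos) % n
    frac = pos - int(pos)
    a, b = palette[i], palette[(i + 1) % n]  # neighboring colors, wrapping at the end
    return tuple(round(ca + (cb - ca) * frac) for ca, cb in zip(a, b))
```

Advancing `t` by a small step per frame, optionally scaled by an audio feature such as treble, cycles the hue in time with the music.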

Adjusting Audio Reactivity

Modify the scaling factors in extract_frame_features() to change how responsive the visuals are:

```python
bass = min(features['bass'] * 500, 1.0)  # Increase multiplier for more sensitivity
mid = min(features['mid'] * 300, 1.0)    # Adjust these values as needed
treble = min(features['treble'] * 200, 1.0)
```
