LightDiffusion-Next is the fastest AI-powered image generation WebUI, combining speed, precision, and flexibility in one cohesive tool.

As a refactored and improved version of the original LightDiffusion repository, this project enhances usability, maintainability, and functionality while introducing a host of new features to streamline your creative workflows.
LightDiffusion was originally meant to be written in Rust, but due to the limited support for Rust in the AI ecosystem it was built in Python, with the goal of being the simplest and fastest AI image generation tool.
The first version of LightDiffusion weighed in at only about 3,000 lines of code and used nothing but PyTorch. Over time the project grew more complex, and the need for a refactor became evident. This is where LightDiffusion-Next comes in, with a more modular and maintainable codebase and a plethora of new features and optimizations.
Learn more in the official documentation.
LightDiffusion-Next offers a powerful suite of tools to cater to creators at every level. At its core, it supports Text-to-Image (Txt2Img) and Image-to-Image (Img2Img) generation, offering a variety of upscale methods and samplers to make it easier to create stunning images with minimal effort.
Advanced users can take advantage of features like attention syntax, Hires-Fix or ADetailer. These tools provide better quality and flexibility for generating complex and high-resolution outputs.
LightDiffusion-Next is fine-tuned for performance. Features such as Xformers acceleration, BFloat16 precision support, WaveSpeed dynamic caching, Multi-scale diffusion, and Stable-Fast model compilation (which offers up to a 70% speed boost) ensure smooth and efficient operation, even on demanding workloads.
Here's what makes LightDiffusion-Next stand out:

- Speed and Efficiency: Enjoy industry-leading performance with built-in Xformers, PyTorch, WaveSpeed and Stable-Fast optimizations, Multi-scale diffusion, DeepCache, the AYS (Align Your Steps) scheduler, and automatic prompt caching, achieving 30% to 200% faster speeds than other AI image generation backends for SD1.5 and Flux.
- Automatic Detailing: Effortlessly enhance faces and body details with AI-driven tools based on the Impact Pack.
- State Preservation: Save and resume your progress with saved states, ensuring seamless transitions between sessions.
- Integration-Ready: Collaborate and create directly in Discord with Boubou, or preview images dynamically with the optional TAESD preview mode.
- Image Previewing: Get a real-time preview of your generated images with TAESD, allowing for user-friendly and interactive workflows.
- Image Upscaling: Enhance your images with advanced upscaling options like UltimateSDUpscaling, ensuring high-quality results every time.
- Prompt Refinement: Use the optional Ollama-powered prompt enhancer (defaults to `qwen3:0.6b`) to refine your prompts and generate more accurate and detailed outputs.
- LoRA and Textual Inversion Embeddings: Leverage LoRA and textual inversion embeddings for highly customized and nuanced results, adding a new dimension to your creative process.
- Low-End Device Support: Run LightDiffusion-Next on low-end devices with as little as 2GB of VRAM or even no GPU, ensuring accessibility for all users.
- CFG++: Samplers modified to use CFG++ deliver better-quality results than traditional CFG.
- Newelle Extension: LightDiffusion-Next is also available as a backend for the Newelle LightDiffusion extension, letting you generate images inline during conversations with LLMs.
LightDiffusion-Next dominates in performance:
| Tool | Speed (it/s) |
|---|---|
| LightDiffusion with Stable-Fast | 2.8 |
| LightDiffusion | 1.9 |
| ComfyUI | 1.4 |
| SDForge | 1.3 |
| SDWebUI | 0.9 |
(All benchmarks are based on a 1024x1024 resolution with a batch size of 1 using BFloat16 precision without tweaking installations. Made with a 3060 mobile GPU using SD1.5.)
With its unmatched speed and efficiency, LightDiffusion-Next sets the benchmark for AI image generation tools.
Note
Platform Support: LightDiffusion-Next supports NVIDIA GPUs (CUDA), AMD GPUs (ROCm), and Apple Silicon (Metal/MPS). For AMD and Apple Silicon setup instructions, see the ROCm and Metal/MPS Support Guide.
Warning
Disclaimer: On Linux, the fastest way to get started is with the Docker setup below. Windows users often encounter an EOF build error when using Docker; if that happens, set up a local virtual environment instead and install SageAttention inside it.
Note
You will need to download the Flux VAE separately, since it lives in a gated repo on Hugging Face. Drop it in the /include/vae folder.
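One way to fetch it, assuming the VAE is the `ae.safetensors` file from the gated `black-forest-labs/FLUX.1-dev` repository (adjust the repo and filename if you use a different Flux variant), is with `huggingface-cli` after accepting the license on the model page:

```bash
# Authenticate with a Hugging Face token that has been granted access to the gated repo
huggingface-cli login

# Download the Flux autoencoder into the local include/vae folder
# (repo id and filename are assumptions, not confirmed by this project)
huggingface-cli download black-forest-labs/FLUX.1-dev ae.safetensors --local-dir include/vae
```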
- Download a release or clone this repository.
- Run `run.bat` in a terminal.
- The Streamlit UI will launch automatically at `http://localhost:8501`.
Alternative UIs:
- Streamlit UI (default): Modern, clean interface with better organization.
- Gradio UI: Run `python app.py` to use the original Gradio interface, mainly for Hugging Face Spaces GPU compatibility.
Run LightDiffusion-Next in a containerized environment with GPU acceleration:
Important
Confirm you have Docker Desktop configured with the NVIDIA Container Toolkit and at least 12-16GB of memory. Builds expect an NVIDIA GPU with compute capability 8.0 or higher and CUDA 12.0+ support for SageAttention/SpargeAttn.
Quick Start with Docker:
```bash
# Build and run with docker-compose (recommended - uses Streamlit by default)
docker-compose up --build

# Or build and run manually with Streamlit
docker build -t lightdiffusion-next .
docker run --gpus all -p 8501:8501 -e UI_FRAMEWORK=streamlit -v ./output:/app/output lightdiffusion-next

# To use Gradio instead:
docker run --gpus all -p 7860:7860 -e UI_FRAMEWORK=gradio -v ./output:/app/output lightdiffusion-next
```

Custom GPU Architecture (Optional):
```bash
# For faster builds, specify your GPU architecture (e.g., RTX 5060 = 12.0)
docker-compose build --build-arg TORCH_CUDA_ARCH_LIST="12.0"

# Default builds for: 8.0 (A100), 8.6 (RTX 30xx), 8.9 (RTX 40xx), 9.0 (H100), 12.0 (RTX 50xx)
```

Built-in Optimizations: The Docker image can build the following acceleration paths:
- SageAttention - 15% speedup with INT8 quantization (all supported GPUs)
- SpargeAttn - 40-60% speedup with sparse attention (compute 8.0-9.0 only)
- Stable-Fast - Optional UNet compilation for up to 70% faster SD1.5 inference
Control them through build arguments (defaults shown below):

```bash
docker-compose build \
  --build-arg TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0;12.0" \
  --build-arg INSTALL_STABLE_FAST=1 \
  --build-arg INSTALL_OLLAMA=0
```

Set `INSTALL_STABLE_FAST=1` to enable the compilation step for Stable-Fast, or `INSTALL_OLLAMA=1` to bake in the prompt enhancer runtime.
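If you are unsure which value to pass for `TORCH_CUDA_ARCH_LIST`, recent NVIDIA drivers let you query the compute capability directly (the `compute_cap` field is only available on reasonably new driver versions):

```bash
# Print the name and compute capability of each visible GPU, e.g. "8.6" for an RTX 3060
nvidia-smi --query-gpu=name,compute_cap --format=csv
```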
Note
RTX 50 series (compute 12.0) GPUs currently only support SageAttention.
Access the Web Interface:
- Streamlit UI (default): `http://localhost:8501`
- Gradio UI: `http://localhost:7860` (set `UI_FRAMEWORK=gradio` in `docker-compose.yml`)
Volume Mounts:
- `./output:/app/output` - Persist generated images
- `./checkpoints:/app/include/checkpoints` - Store model files
- `./loras:/app/include/loras` - Store LoRA files
- `./embeddings:/app/include/embeddings` - Store embeddings
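Putting it together, a manual `docker run` that wires up all four mounts (the host-side paths are just examples; point them at wherever you keep your models) might look like:

```bash
docker run --gpus all -p 8501:8501 \
  -e UI_FRAMEWORK=streamlit \
  -v ./output:/app/output \
  -v ./checkpoints:/app/include/checkpoints \
  -v ./loras:/app/include/loras \
  -v ./embeddings:/app/include/embeddings \
  lightdiffusion-next
```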
- Install from Source: Install dependencies via:

  ```bash
  pip install -r requirements.txt
  ```

  Add your SD1/1.5 safetensors model to the `checkpoints` directory, then launch the application.

- Stable-Fast Optimization: Follow this guide to enable Stable-Fast mode for optimal performance. In Docker environments, set `INSTALL_STABLE_FAST=1` to compile it during the image build or `INSTALL_STABLE_FAST=0` (default) to skip.

- SageAttention & SpargeAttn Acceleration: Boost inference speed by up to 60% with advanced attention backends.

  Prerequisites:

  - CUDA toolkit installed, with a version compatible with your PyTorch installation

  SageAttention (15% speedup, Windows compatible):

  ```bash
  cd SageAttention
  pip install -e . --no-build-isolation
  ```
  SpargeAttn (40-60% total speedup, requires WSL2/Linux):

  Caution
  SpargeAttn cannot be built with the default Windows linker. Use WSL2 or a native Linux environment and set the correct `TORCH_CUDA_ARCH_LIST` before installation.

  ```bash
  # On WSL2 or Linux only (Windows linker has path length limitations)
  cd SpargeAttn
  export TORCH_CUDA_ARCH_LIST="9.0"  # Or your GPU architecture (8.0, 8.6, 8.9, 9.0)
  pip install -e . --no-build-isolation
  ```

  Priority System: SpargeAttn > SageAttention > PyTorch SDPA
  - Both are automatically detected and used when available
  - Graceful fallback for unsupported head dimensions

  (You can sanity-check which backend is picked up with the sketch after this list.)

- Prompt Enhancer: Turn on the Ollama-backed enhancer to automatically restructure prompts. By default the app targets `qwen3:0.6b`:

  ```bash
  # Local install
  pip install ollama
  curl -fsSL https://ollama.com/install.sh | sh

  # Start the Ollama daemon (keep this terminal open)
  ollama serve

  # New terminal: pull the default prompt enhancer model
  ollama pull qwen3:0.6b
  export PROMPT_ENHANCER_MODEL=qwen3:0.6b
  ```

  In Docker builds, set `--build-arg INSTALL_OLLAMA=1` (or update `docker-compose.yml`) to install Ollama and pre-pull the model automatically. You can override the runtime model/prefix with the `PROMPT_ENHANCER_MODEL` and `PROMPT_ENHANCER_PREFIX` environment variables. See the Ollama guide for details.

- Discord Integration: Set up the Discord bot by following the Boubou installation guide.
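As a quick sanity check that the optional attention backends are visible to Python, you can probe for them before launching the UI. This is only a sketch: the module names `sageattention` and `spas_sage_attn` are assumptions about how the two packages install themselves, so adjust them if your build uses different names.

```bash
# Probe the optional attention backends (module names below are assumptions)
python - <<'EOF'
import importlib.util

for module, label in [("spas_sage_attn", "SpargeAttn"), ("sageattention", "SageAttention")]:
    found = importlib.util.find_spec(module) is not None
    print(f"{label}: {'available' if found else 'not installed'}")

print("Fallback: PyTorch SDPA (always available with a recent PyTorch build)")
EOF
```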
- This project distributes builds that depend on third-party open source components. For attribution details and the full license text, refer to `THIRD_PARTY_LICENSES.md`.

Enjoy exploring the powerful features of LightDiffusion-Next!
Tip
If this project helps you, please give it a star! It helps others discover it too.
