GitHub - vllm-project/vllm-omni: A framework for efficient model inference with omni-modality models

Easy, fast, and cheap omni-modality model serving for everyone

Latest News 🔥

[2026/01] We released 0.12.0rc1 - a major RC milestone focused on maturing the diffusion stack, strengthening OpenAI-compatible serving, expanding omni-model coverage, and improving stability across platforms (GPU/NPU/ROCm), please check our latest design.
[2025/11] vLLM community officially released vllm-project/vllm-omni in order to support omni-modality models serving.

About

vLLM was originally designed to support large language models for text-based autoregressive generation tasks. vLLM-Omni is a framework that extends its support for omni-modality model inference and serving:

Omni-modality: Text, image, video, and audio data processing
Non-autoregressive Architectures: extend the AR support of vLLM to Diffusion Transformers (DiT) and other parallel generation models
Heterogeneous outputs: from traditional text generation to multimodal outputs

vllm-omni

vLLM-Omni is fast with:

State-of-the-art AR support by leveraging efficient KV cache management from vLLM
Pipelined stage execution overlapping for high throughput performance
Fully disaggregation based on OmniConnector and dynamic resource allocation across stages

vLLM-Omni is flexible and easy to use with:

Heterogeneous pipeline abstraction to manage complex model workflows
Seamless integration with popular Hugging Face models
Tensor, pipeline, data and expert parallelism support for distributed inference
Streaming outputs
OpenAI-compatible API server

vLLM-Omni seamlessly supports most popular open-source models on HuggingFace, including:

Omni-modality models (e.g. Qwen-Omni)
Multi-modality generation models (e.g. Qwen-Image)

Getting Started

Visit our documentation to learn more.

Contributing

We welcome and value any contributions and collaborations. Please check out Contributing to vLLM-Omni for how to get involved.

Join the Community

Feel free to ask questions, provide feedbacks and discuss with fellow users of vLLM-Omni in #sig-omni slack channel at slack.vllm.ai or vLLM user forum at discuss.vllm.ai.

Star History

License

Apache License 2.0, as found in the LICENSE file.

Name		Name	Last commit message	Last commit date
Latest commit History 533 Commits
.buildkite		.buildkite
.github		.github
benchmarks		benchmarks
docker		docker
docs		docs
examples		examples
scripts		scripts
tests		tests
tools/pre_commit		tools/pre_commit
vllm_omni		vllm_omni
.gitattributes		.gitattributes
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.readthedocs.yml		.readthedocs.yml
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
collect_env.py		collect_env.py
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Easy, fast, and cheap omni-modality model serving for everyone

About

Getting Started

Contributing

Join the Community

Star History

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 79

Uh oh!

Languages

License

vllm-project/vllm-omni

Folders and files

Latest commit

History

Repository files navigation

Easy, fast, and cheap omni-modality model serving for everyone

About

Getting Started

Contributing

Join the Community

Star History

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 79

Uh oh!

Languages

Packages