Tunix (Tune-in-JAX) is a JAX-based library designed to streamline the post-training of Large Language Models. It provides efficient and scalable support for:
- SOTA training performance on TPUs
- Supervised Fine-Tuning
- Reinforcement Learning (RL)
- Agentic RL
Tunix leverages the power of JAX for accelerated computation and seamless integration with JAX-based modeling frameworks like Flax NNX, and it integrates with high-performance inference engines like vLLM and SGLang-JAX for rollout. For detailed documentation, please refer to the Tunix Website.
Current Status: V2 Release
Tunix is under active development. Our team is working on expanding its capabilities, usability, and performance. Stay tuned for upcoming updates and new features! See Talks and Announcements for the latest updates, talks, and blog posts.
Tunix serves as a state-of-the-art post-training library within the JAX training stack, positioned to leverage foundational tools such as Flax, Optax, and Orbax for efficient model refinement. It sits as an intermediate layer between these core utilities and optimized model implementations like MaxText and MaxDiffusion, streamlining tuning workflows on top of the XLA and JAX infrastructure. See Design Overview for more details on the architecture.
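To give a feel for the kind of workflow Tunix streamlines, here is a minimal, generic sketch of a supervised fine-tuning step written directly in JAX and Optax. It is not Tunix's API; `apply_fn`, the batch keys, and the hyperparameters are placeholders chosen for illustration only.

```python
# Minimal, generic SFT step in plain JAX + Optax (NOT Tunix's API).
# `apply_fn`, the batch keys, and the hyperparameters are placeholders.
from functools import partial

import jax
import jax.numpy as jnp
import optax

def sft_loss(params, apply_fn, batch):
    # batch["input_ids"]: [B, T] int tokens, batch["labels"]: [B, T] targets,
    # batch["loss_mask"]: [B, T] float, 1.0 only on completion tokens.
    logits = apply_fn(params, batch["input_ids"])            # [B, T, V]
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    token_ll = jnp.take_along_axis(
        log_probs, batch["labels"][..., None], axis=-1)[..., 0]
    mask = batch["loss_mask"]
    return -jnp.sum(token_ll * mask) / jnp.maximum(jnp.sum(mask), 1.0)

optimizer = optax.adamw(learning_rate=1e-5)

@partial(jax.jit, static_argnames="apply_fn")
def train_step(params, opt_state, batch, apply_fn):
    loss, grads = jax.value_and_grad(sft_loss)(params, apply_fn, batch)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state, loss
```

Tunix's trainers and RL learners take care of loops like this for you, layering distributed execution and checkpointing on top.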
- Supervised Fine-Tuning (SFT)
- Reinforcement Learning (RL):
  - PPO (Proximal Policy Optimization)
  - GRPO (Group Relative Policy Optimization); see the advantage sketch after this list
  - GSPO-Token (Token-level Group Sequence Policy Optimization)
  - DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization)
  - Dr. GRPO (GRPO Done Right)
- Agentic RL:
  - Multi-turn tool use
  - Asynchronous rollout for high-throughput trajectory collection
  - Trajectory batching and grouping
- Modularity:
  - Components are designed to be reusable and composable
  - Easy to customize and extend
- Performance & Efficiency:
  - Native vLLM and SGLang-JAX integration on TPU for performant rollout
  - Native MaxText model integration for high-performance kernels and model execution
  - Micro-batching support for efficient component-level execution; see the gradient-accumulation sketch after this list
- Stability:
  - Seamless multi-host distributed training with Pathways, scaling to thousands of devices
  - Checkpointing and fault tolerance
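To make the group-relative family above concrete, here is an illustrative sketch of how GRPO-style advantages are computed. It is not Tunix's internal implementation: each prompt gets a group of sampled completions, and every completion's reward is normalized against the other completions in its group.

```python
# Illustrative GRPO-style advantage computation (not Tunix's implementation).
import jax.numpy as jnp

def group_relative_advantages(rewards, num_groups, group_size, eps=1e-6):
    # rewards: flat array of shape [num_groups * group_size], one scalar
    # reward per sampled completion, laid out group by group.
    r = rewards.reshape(num_groups, group_size)
    mean = r.mean(axis=-1, keepdims=True)
    std = r.std(axis=-1, keepdims=True)
    # GRPO normalizes within the group; Dr. GRPO drops the std term.
    return ((r - mean) / (std + eps)).reshape(-1)
```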
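The micro-batching item above refers to splitting a large batch into smaller micro-batches that fit in memory and accumulating their gradients before a single optimizer update. The sketch below shows the general idea in plain JAX; it is not Tunix's implementation, and `loss_fn` and the batch structure are placeholders.

```python
# Gradient accumulation over micro-batches (illustrative, not Tunix's API).
import jax
import jax.numpy as jnp

def accumulated_grads(loss_fn, params, big_batch, num_micro_batches):
    # big_batch: pytree of arrays whose leading batch axis is divisible
    # by num_micro_batches; loss_fn(params, batch) returns a scalar loss.
    micro = jax.tree_util.tree_map(
        lambda x: x.reshape((num_micro_batches, -1) + x.shape[1:]), big_batch)

    def step(grad_acc, micro_batch):
        # Gradients for one micro-batch, added into the running sum.
        g = jax.grad(loss_fn)(params, micro_batch)
        return jax.tree_util.tree_map(jnp.add, grad_acc, g), None

    zeros = jax.tree_util.tree_map(jnp.zeros_like, params)
    grad_sum, _ = jax.lax.scan(step, zeros, micro)
    # Average so the update matches a single full-batch step.
    return jax.tree_util.tree_map(lambda g: g / num_micro_batches, grad_sum)
```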
Installation: Jump to Installation to install Tunix and run your first training job.
Examples: To get started, we have a number of detailed examples and tutorials. See Quick Start for a great set of starting examples and Examples and Guides for a comprehensive list of all the notebooks and examples we have.
Tunix supports a growing list of models including Gemma, Llama, and Qwen families. See Models for a full list and details on how to add new ones.
We welcome contributions! As Tunix is in early development, the contribution process is still being formalized. A rough draft of the contribution process is available here. In the meantime, you can make feature requests, report issues, and ask questions in our Tunix GitHub discussion forum.
GRL (Game Reinforcement Learning), developed by the Hao AI Lab at UCSD, is an open-source framework for post-training large language models through multi-turn RL on challenging games. In collaboration with Tunix, GRL integrates seamless TPU support, letting users quickly run scalable, reproducible RL experiments (such as PPO rollouts on Qwen2.5-0.5B-Instruct) on TPU v4 meshes with minimal setup. This partnership empowers the community to push LLM capabilities further, combining Tunix’s optimized TPU runtime with GRL’s flexible game RL pipeline for cutting-edge research and easy reproducibility.
@misc{tunix2025,
title={Tunix (Tune-in-JAX)},
author={Bao, Tianshu and Carpenter, Jeff and Chai, Lin and Gao, Haoyu and Jiang, Yangmu and Noghabi, Shadi and Sharma, Abheesht and Tan, Sizhi and Wang, Lance and Yan, Ann and Yu, Weiren and others},
year={2025},
howpublished={\url{https://github.com/google/tunix}},
}

Thank you to all our wonderful contributors!
