LatentMAS is a multi-agent reasoning framework that moves agent collaboration from token space into the model's latent space.
Instead of producing long textual reasoning traces, agents communicate by passing latent thoughts through their own working memory. LatentMAS has the following key features:
- Efficient multi-step reasoning with drastically fewer tokens
- Training-free latent-space alignment for stable generation
- A general technique compatible with any HF model, with optional vLLM backend support
Overall, LatentMAS achieves superior performance, lower token usage, and major wall-clock speedups for multi-agent systems.
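To make the core mechanism concrete, here is a minimal, hedged sketch of latent-space communication, assuming a standard Hugging Face causal LM. The function `latent_rollout` and its details are illustrative only, not this repository's actual API (see `methods/latent_mas.py` for the real implementation):

```python
# Illustrative sketch: agents "think" by recycling hidden states, not tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-14B"  # any HF causal LM works in principle
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

@torch.no_grad()
def latent_rollout(prompt_embeds, num_latent_steps=20):
    """Roll out `num_latent_steps` latent thoughts autoregressively:
    each step's last-layer hidden state is fed back as the next input
    embedding instead of decoding a token."""
    embeds, past, latents = prompt_embeds, None, []
    for _ in range(num_latent_steps):
        out = model(inputs_embeds=embeds, past_key_values=past,
                    output_hidden_states=True, use_cache=True)
        past = out.past_key_values
        last_hidden = out.hidden_states[-1][:, -1:, :]  # (batch, 1, hidden)
        latents.append(last_hidden)
        embeds = last_hidden  # the latent thought becomes the next input
    return torch.cat(latents, dim=1), past

# Agent 1 "thinks" silently in latent space...
ids = tok("What is 12 * 7 + 5?", return_tensors="pt").input_ids
latents, kv_cache = latent_rollout(model.get_input_embeddings()(ids))
# ...and a downstream agent consumes the latent thoughts via its working
# memory (e.g., KV cache) instead of reading a long textual trace.
```

Because no tokens are sampled during the rollout, collaboration cost scales with hidden-state passes rather than decoded text, which is where the token and wall-clock savings come from.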
- [2025-11-25] We have released our paper and code implementations for LatentMAS! Stay tuned for support for more model backbones and advanced features!
- [2025-11-25] We are featured as the 🤗 HuggingFace #1 Paper of the Day!
Three main tables from our paper spanning 9 tasks across math & science reasoning, commonsense reasoning, and code generation:
- Table 1: LatentMAS under the Sequential MAS setting
- Table 2: LatentMAS under the Hierarchical MAS setting
- Table 3: Main Results on Reasoning-Intensive Tasks
Overall, compared to standard Text-MAS or chain-of-thought baselines, LatentMAS reduces:
- token usage by ~50–80%
- wall-clock time by ~3×–7×
This repository provides all code for reproducing LatentMAS, TextMAS, and baseline single-agent experiments across GSM8K, AIME24/25, GPQA, ARC-Easy/Challenge, MBPP+, HumanEval+, and MedQA.
We recommend setting your HF cache directory to avoid repeated downloads:

```bash
export HF_HOME=/path/to/huggingface
export TRANSFORMERS_CACHE=$HF_HOME
export HF_DATASETS_CACHE=$HF_HOME
```

Models and datasets will automatically be downloaded into `$HF_HOME`.
```bash
conda create -n latentmas python=3.10 -y
conda activate latentmas
pip install -r requirements.txt
```

If you want vLLM support, also install:

```bash
pip install vllm
```

Clone the repository:

```bash
git clone https://github.com/YourRepo/LatentMAS.git
cd LatentMAS
```

Repository structure:

```
LatentMAS/
├── run.py              # Main entry for experiments
├── models.py           # Wrapper for HF + vLLM + latent realignment
├── methods/
│   ├── baseline.py     # Single-agent baseline
│   ├── text_mas.py     # Token-space multi-agent method
│   └── latent_mas.py   # Latent-space multi-agent (our method)
├── prompts.py          # Prompt constructors
├── data.py             # Dataset loaders
├── data/               # Provided data + figures (we give medqa.json as an example here)
├── utils.py            # Answer parsing / timeout / helpers
├── example_logs/       # Example logs from LatentMAS
└── requirements.txt
```
Run the single-agent baseline:

```bash
python run.py --method baseline --model_name Qwen/Qwen3-14B --task gsm8k --max_samples -1
```

Run TextMAS:

```bash
python run.py --method text_mas --model_name Qwen/Qwen3-14B --task gsm8k --prompt sequential --max_samples -1
```

Run LatentMAS:

```bash
python run.py --method latent_mas --model_name Qwen/Qwen3-14B --task gsm8k --prompt sequential --max_samples -1
```

Key arguments:

- `--latent_steps`: range [0, 80]. Tune for best performance; typically 20–40 works well.
- `--latent_space_realign`: enables latent-embedding alignment. We treat this as a hyperparameter; enable or disable depending on task/model:

```bash
python run.py --method latent_mas --model_name Qwen/Qwen3-14B --task gsm8k --prompt sequential --max_samples -1 --latent_space_realign
```
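For intuition about what `--latent_space_realign` does, below is a hedged sketch of one training-free way to pull latent states back onto the token-embedding manifold: a softmax-weighted mixture of token embeddings computed via the LM head. This illustrates the general idea and is not necessarily the paper's exact formulation:

```python
import torch

@torch.no_grad()
def realign_to_embedding_space(latent, model, temperature=1.0):
    """Map latent hidden states (batch, steps, hidden) onto the span of
    the token-embedding matrix, so downstream generation stays stable.
    `model` is assumed to be an HF causal LM with an `lm_head`."""
    logits = model.lm_head(latent)                       # (B, T, vocab)
    probs = torch.softmax(logits / temperature, dim=-1)  # soft token choice
    emb_matrix = model.get_input_embeddings().weight     # (vocab, hidden)
    return probs @ emb_matrix                            # (B, T, hidden)
```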
Two example LatentMAS logs are provided for reference:
- `example_logs/qwen3_14b_mbppplus_sequential.txt`
- `example_logs/qwen3_14b_humanevalplus_hierarchical.txt`
Please refer to additional experiment logs here. You can open them to view the full agent interaction traces and outputs.
LatentMAS supports vLLM for faster inference.
```bash
python run.py --method baseline --model_name Qwen/Qwen3-14B --task gsm8k --max_samples -1 --use_vllm
```

```bash
python run.py --method text_mas --model_name Qwen/Qwen3-14B --task gsm8k --prompt sequential --max_samples -1 --use_vllm
```

LatentMAS also supports a hybrid HF + vLLM pipeline for fast inference:
- vLLM handles final text generation (with prefix caching, tensor parallelism, etc.)
- A HuggingFace model handles latent-space rollout and hidden-state alignment
For this setup, we recommend using two GPUs:
- One GPU for vLLM (`--device`, e.g., `cuda:0`)
- One GPU for the auxiliary HF model (`--device2`, e.g., `cuda:1`)
```bash
CUDA_VISIBLE_DEVICES=0,1 python run.py --method latent_mas --model_name Qwen/Qwen3-14B --task gsm8k --prompt sequential --max_samples -1 \
    --use_vllm \
    --use_second_HF_model \
    --enable_prefix_caching \
    --device2 cuda:1
```
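Conceptually, the division of labor looks roughly like the sketch below. This is illustrative only (the actual wiring lives in `models.py` and `run.py`), and it assumes vLLM picks up its GPU from `CUDA_VISIBLE_DEVICES`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "Qwen/Qwen3-14B"
tok = AutoTokenizer.from_pretrained(model_name)

# The auxiliary HF model (on its own GPU) handles the latent rollout
# and hidden-state alignment, which vLLM does not expose.
hf_model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto").to("cuda:1")

# vLLM (on the other GPU) handles fast final text generation,
# with prefix caching enabled as in the command above.
llm = LLM(model=model_name, enable_prefix_caching=True)

# After the latent collaboration phase, a final text prompt is assembled
# (hypothetical placeholder here) and decoded by vLLM.
final_prompt = "..."  # assembled from the agents' latent working memory
outputs = llm.generate([final_prompt],
                       SamplingParams(max_tokens=512, temperature=0.0))
print(outputs[0].outputs[0].text)
```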
📌 Important Note:
vLLM does not officially support modifying the KV cache or prompting via latent embeddings. We patch part of the vLLM backend internals to implement our method. Minor numeric differences may arise compared to the official HF backend due to different decoding (generation) strategies. Please use the HF backend to reproduce the official published results.
💫 If you find LatentMAS helpful, please kindly give us a star ⭐️ and cite below. Thanks!
```bibtex
@article{zou2025latentmas,
  title={Latent Collaboration in Multi-Agent Systems},
  author={Zou, Jiaru and Yang, Xiyuan and Qiu, Ruizhong and Li, Gaotang and Tieu, Katherine and Lu, Pan and Shen, Ke and Tong, Hanghang and Choi, Yejin and He, Jingrui and Zou, James and Wang, Mengdi and Yang, Ling},
  journal={arXiv preprint arXiv:2511.20639},
  year={2025}
}
```
This code is partially based on the amazing work of vLLM.




