Ultra-Sparse Memory Network (UltraMem) is a sparse model similar to MoE, but with significantly lower memory access, which makes inference fast. However, UltraMem has only matched the performance of 2-expert MoE models, falling significantly short of state-of-the-art 8-expert configurations. We present UltraMemV2, a redesigned memory-layer architecture that closes this performance gap.
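For intuition, below is a minimal, simplified sketch of how a memory layer keeps memory access low: the query is scored against a large key table, but only the top-k values are actually read and mixed, so the vast majority of the value memory is never touched. This is an illustrative toy, not the implementation in this repository; all class and parameter names are placeholders.

import torch
import torch.nn as nn

class ToySparseMemoryLayer(nn.Module):
    """Toy sparse memory layer: score all keys, but read only the top-k values."""

    def __init__(self, d_model: int, num_slots: int, topk: int):
        super().__init__()
        self.query_proj = nn.Linear(d_model, d_model)
        self.keys = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
        self.values = nn.Embedding(num_slots, d_model)  # large value table, read sparsely
        self.topk = topk

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model)
        q = self.query_proj(x)                          # (batch, d_model)
        scores = q @ self.keys.t()                      # (batch, num_slots)
        weights, idx = scores.topk(self.topk, dim=-1)   # keep only k slots per token
        weights = torch.softmax(weights, dim=-1)
        vals = self.values(idx)                         # (batch, topk, d_model), sparse read
        return (weights.unsqueeze(-1) * vals).sum(dim=1)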
[Figures: overall structure of the three sparse layers; flow of Tucker-Decomposed Query-Key Retrieval (TDQKR); comprehensive performance comparison across different open-sourced models and benchmarks.]

Please follow OLMoE (https://github.com/allenai/OLMoE) to set up the environment.
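As a rough illustration of the TDQKR flow in the figure, the sketch below scores a query against row keys and column keys separately, then couples the two score vectors through a small learned core matrix before taking top-k over the resulting grid. This is an assumption-laden reading of the idea, not the exact formulation in the paper; the rank, shapes, and function name are hypothetical.

import torch

def tucker_style_grid_scores(q_row, q_col, k_row, k_col, core):
    # q_row, q_col: (rank, d)     per-rank query slices (hypothetical split)
    # k_row, k_col: (n, d)        row / column key tables
    # core:         (rank, rank)  small learned mixing matrix
    s_row = q_row @ k_row.t()        # (rank, n) row scores
    s_col = q_col @ k_col.t()        # (rank, n) column scores
    return s_row.t() @ core @ s_col  # (n, n) low-rank approximation of the full score grid

Top-k over this n-by-n grid then decides which of the n*n memory values are read, analogous to product-key retrieval but with the core matrix coupling the row and column scores.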
Key files
olmo/
├── models.py # main model
├── memory_plus_layer.py # Memory+ layer (by Meta)
├── ultramem_layer.py # UltraMem layer
└── ultramem_layer_v2.py # UltraMemV2 layer
Dataset
We use the same dataset as OLMoE. You can download it from here. Then use the following command to convert the dataset into the format that our model can use:
dolma tokens \
--documents ${PATH_TO_DOWNLOADED_DATA} \
--destination ${PATH_WHERE_TO_SAVE_TOKENIZED_DATA} \
--tokenizer.name_or_path 'allenai/gpt-neox-olmo-dolma-v1_5' \
--max_size '2_147_483_648' \
--seed 0 \
--tokenizer.eos_token_id 50279 \
--tokenizer.pad_token_id 1 \
--processes ${NUMBER_OF_CPU_CORES_TO_USE}
Training
You can run the training script below, which trains a 227M/1.2B UltraMemV2 model. Note that we only implement DDP with a fused kernel; it is easy to extend to Megatron with some additional synchronization and communication.
sh launch.sh ${CONFIG_PATH} \
--save_folder=${SAVE_DIR} \
--run_name=${run_name} \
--save_overwrite=true \
--mount_common_hdfs=true \
--fsdp.sharding_strategy=NO_SHARD \
--canceled_check_interval=9999999 \
--global_indices_file=${CODE_DIR}/global_indices.npy \
--load_path=${CUR_CKPT_PATH} \
--model.init_std=0.02282177322938192 \
--model.init_fn="full_megatron" \
--model.d_model=768 \
--model.n_layers=20 \
--model.n_heads=12 \
--model.n_kv_heads=12 \
--model.weight_tying=true \
--max_duration=5e11T \
--scheduler.t_warmup=1e10 \
--scheduler.t_max=5e11 \
--device_train_microbatch_size=3 \
--global_train_batch_size=768 \
--save_interval=1000 \
--eval_interval=1000 \
--save_num_checkpoints_to_keep=-1 \
--console_log_interval=10 \
\
--model.block_type='sequential' \
--model.mlp_hidden_size=4992 \
\
--optimizer.mem_value_lr_times=4.0 \
--optimizer.mem_value_lr_max_steps_rate=1.0 \
--model.mem_insert_way='full' \
--model.mem_knum=360 \
--model.mem_kdim=192 \
--model.mem_vdim=192 \
--model.mem_pre_vdim=192 \
--model.mem_knn=32 \
--model.mem_head=1 \
--model.mem_share_ratio=0.5 \
--model.mem_type='ultramem_v2' \
--model.mem_value_expand_time=1 \
\
--distributed_strategy=ddp \
--model.init_device='cuda' \
--optimizer.metrics_log_interval=50 \
--model.mem_log_interval=50 \
--save_interval_unsharded=1000
You can also port the code into your own project; it should be easy to run :)
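For example, a standalone forward pass might look like the sketch below, which instantiates the UltraMemV2 layer from olmo/ultramem_layer_v2.py on dummy activations. The class name, constructor arguments, and call signature here are assumptions for illustration; check the file for the actual interface.

import torch

# Hypothetical import and signature; see olmo/ultramem_layer_v2.py for the real interface.
from olmo.ultramem_layer_v2 import UltraMemV2Layer  # class name assumed, may differ

layer = UltraMemV2Layer(
    d_model=768,  # matches --model.d_model in the launch command above
    knum=360,     # matches --model.mem_knum
    kdim=192,     # matches --model.mem_kdim
    vdim=192,     # matches --model.mem_vdim
    knn=32,       # matches --model.mem_knn
)

x = torch.randn(4, 128, 768)  # (batch, seq_len, d_model) dummy activations
y = layer(x)                  # assumed to return a tensor of the same shape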
If you find this work helpful or use it in your research, please consider citing our papers:
@article{huang2025ultra,
  title={UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning},
  author={Huang, Zihao and Bao, Yu and Min, Qiyang and Chen, Siyan and Guo, Ran and Huang, Hongzhi and Zhu, Defa and Zeng, Yutao and Wu, Banggu and Zhou, Xun and Qiao, Siyuan},
  journal={arXiv preprint},
  year={2025}
}

@article{huang2024ultra,
  title={Ultra-Sparse Memory Network},
  author={Huang, Zihao and Min, Qiyang and Huang, Hongzhi and Zhu, Defa and Zeng, Yutao and Guo, Ran and Zhou, Xun},
  journal={ICLR 2025},
  year={2024}
}

Our open-sourced work is mainly based on OLMoE; thanks for their work.



