Multi-Agent Perceiver[^1][^2] Critic for Robotic Warehouse (RWARE) tasks, implemented in pymarlzooplus[^3].
See instructions for Docker here. Alternatively, create a conda environment with Python 3.11:

```shell
conda create -n map python=3.11
conda activate map
```

Install the project (see common installation issues for pymarlzooplus here):

```shell
pip install -e .
```

Launch a training run (modify the config args in the command, or edit them in the config files in the `config` directory):
```shell
python pymarlzooplus/main.py \
  --config=map_dec \
  --env-config=gymma \
  with \
  env_args.time_limit=500 \
  env_args.key="rware:rware-tiny-4ag-hard-v1" \
  env_args.seed=742
# env_args.key options:
# [rware:rware-small-4ag-hard-v1, rware:rware-tiny-4ag-hard-v1, rware:rware-tiny-2ag-hard-v1]
```

Evaluate a saved checkpoint:
```shell
python pymarlzooplus/main.py \
  --config=map_dec \
  --env-config=gymma \
  with \
  env_args.key="rware:rware-tiny-4ag-hard-v1" \
  env_args.time_limit=500 \
  checkpoint_path="pymarlzooplus/results/sacred/map_dec/rware:rware-tiny-4ag-hard-v1/1/models" \
  evaluate=True \
  load_step=100000 \
  test_nepisode=100
# load_step: load the model saved at this many timesteps (0 loads the latest checkpoint)
# checkpoint_path: directory to load the model from
```

To see a trained policy in action, run:
```shell
python pymarlzooplus/main.py \
  --config=map_dec \
  --env-config=gymma \
  with \
  env_args.key="rware:rware-tiny-4ag-hard-v1" \
  env_args.time_limit=500 \
  checkpoint_path="pymarlzooplus/results/sacred/map_dec/rware:rware-tiny-4ag-hard-v1/0/models" \
  load_step=0 \
  evaluate=True render=True render_sleep_time=0.4
# render_sleep_time: sleep time between rendered frames (only used when render=True)
# load_step: load the model saved at this many timesteps (0 loads the latest checkpoint)
# checkpoint_path: directory to load the model from
```

Results below are single-seed runs with environment `time_limit=500`.
| Task | tiny-2ag-hard | tiny-4ag-hard | small-4ag-hard |
|---|---|---|---|
| Mean Episodic Return | 17.07 ± 4.62 | 41.74 ± 5.00 | 20.06 ± 3.71 |
| Configuration | link | link | link |
This implementation is largely adapted from the following repos:

- perceiver-pytorch: Perceiver IO implementation
- pymarlzooplus: training/benchmarking
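The core idea borrowed from the Perceiver is a cross-attention bottleneck: a small set of learned latent vectors attends over a (possibly much larger) set of input tokens, producing a fixed-size representation regardless of input length. The sketch below illustrates that single step in pure Python; the names, dimensions, and absence of projections/multi-head structure are simplifications for illustration, not this repo's actual implementation.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(latents, inputs, dim):
    """Each latent vector (query) attends over all input tokens (keys/values).

    latents: n_latents vectors of length dim (learned, fixed count)
    inputs:  n_inputs vectors of length dim (variable count, e.g. agent observations)
    returns: n_latents vectors of length dim -- a fixed-size summary of the inputs
    """
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in latents:
        # Scaled dot-product scores between this latent and every input token
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in inputs]
        weights = softmax(scores)
        # Weighted sum of the input (value) vectors
        out.append([sum(w * v[d] for w, v in zip(weights, inputs))
                    for d in range(dim)])
    return out

# Example: 4 latents compress 32 input tokens into a fixed-size representation.
random.seed(0)
dim = 8
latents = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(4)]
inputs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(32)]
compressed = cross_attention(latents, inputs, dim)
print(len(compressed), len(compressed[0]))
```

Because the latent count is fixed, the cost of the subsequent self-attention stack is independent of the number of input tokens, which is what makes this architecture attractive when the joint observation grows with the number of agents.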
[^1]: Jaegle, A., Gimeno, F., Brock, A., Zisserman, A., Vinyals, O., & Carreira, J. (2021). Perceiver: General Perception with Iterative Attention. arXiv:2103.03206

[^2]: Jaegle, A., Borgeaud, S., Alayrac, J.-B., Doersch, C., Ionescu, C., Ding, D., Koppula, S., Zoran, D., Brock, A., Shelhamer, E., Hénaff, O., Botvinick, M. M., Zisserman, A., Vinyals, O., & Carreira, J. (2022). Perceiver IO: A General Architecture for Structured Inputs & Outputs. arXiv:2107.14795

[^3]: Papadopoulos, G., Kontogiannis, A., Papadopoulou, F., Poulianou, C., Koumentis, I., & Vouros, G. (2025). An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks. arXiv:2502.04773