This repository contains code released for the paper NoRML: No-Reward Meta Learning.
First, install all dependencies by running:
pip install -r norml/requirements.txt
The HalfCheetah environment requires MuJoCo, so make sure you have also followed the proper instructions for installing MuJoCo and mujoco-py.
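As a rough sketch (exact MuJoCo versions and paths vary by setup, so treat this as an assumption rather than part of the official instructions), the mujoco-py installation typically boils down to:

# Assumes the MuJoCo binaries are already unpacked under ~/.mujoco and any
# required license key is in place; see the mujoco-py documentation for details.
pip install mujoco-py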
Example checkpoints are stored in Google Cloud Storage and can be downloaded by running:
gsutil cp -r gs://gresearch/norml/example_checkpoints/ .
You can start training from scratch by running:
python -m norml.train_maml --config MOVE_POINT_ROTATE_MAML --logs maml_checkpoints
Here, --config should be one of the configs defined in config_maml.py. The config string has the form {ENV_NAME}_{ALG_NAME}, where ENV_NAME is one of MOVE_POINT_ROTATE, MOVE_POINT_ROTATE_SPARSE, CARTPOLE_SENSOR, or HALFCHEETAH_MOTOR, and ALG_NAME is one of DR, MAML, MAML_OFFSET, MAML_LAF, or NORML, as described in the paper.
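For example, to train NoRML on the cartpole environment (assuming the CARTPOLE_SENSOR_NORML combination is defined in config_maml.py), you would run:

python -m norml.train_maml --config CARTPOLE_SENSOR_NORML --logs norml_cartpole_checkpoints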
The MOVE_POINT_ROTATE configs are fast to train and can converge within minutes. Training MOVE_POINT_ROTATE_SPARSE and CARTPOLE_SENSOR can take up to a day. The HalfCheetah training was done with parallelized workers on a cloud server and can take a long time on a single machine.
We also provide a convenient script to evaluate the training performance:
python -m norml.eval_maml \
    --model_dir norml/example_checkpoints/move_point_rotate_sparse/norml/all_weights.ckpt-991 \
    --output_dir maml_eval_results \
    --render=True \
    --num_finetune_steps 1 \
    --test_task_index 0 \
    --eval_finetune=True
You should see state/action logs and, optionally, a rendered video in the maml_eval_results folder.
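The same script can also evaluate checkpoints you trained yourself. A minimal sketch, assuming your training run saved checkpoints named all_weights.ckpt-<step> under maml_checkpoints (following the naming of the example checkpoints above):

python -m norml.eval_maml \
    --model_dir maml_checkpoints/all_weights.ckpt-<step> \
    --output_dir my_eval_results \
    --num_finetune_steps 1 \
    --test_task_index 0 \
    --eval_finetune=True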
If you use this code in your research, please cite the following paper:
Yang, Y., Caluwaerts, K., Iscen, A., Tan, J., & Finn, C. (2019). NoRML: No-Reward Meta Learning. arXiv preprint arXiv:1903.01063.
@article{yang2019norml,
  title={NoRML: No-Reward Meta Learning},
  author={Yang, Yuxiang and Caluwaerts, Ken and Iscen, Atil and Tan, Jie and Finn, Chelsea},
  journal={arXiv preprint arXiv:1903.01063},
  year={2019}
}
Disclaimer: This is not an official Google product.