We tackle the video navigation problem: training an in-context policy that can find objects shown in a context video when deployed in a new scene.
After watching a 30-second egocentric video, an agent is expected to reason about how to reach the target object specified by a goal image. Please refer to our project page for videos of real-world deployment.

conda create -n nolo python=3.9
conda activate nolo
cd nolo
pip install -r docs/requirements.txt
Refer to RoboTHOR to install RoboTHOR, and to Habitat-sim to install the Habitat simulator.
SuperGlue: https://github.com/magicleap/SuperGluePretrainedNetwork. Place the downloaded checkpoints in scripts/superglue/weights.
GMFlow: https://github.com/haofeixu/gmflow. Place the downloaded checkpoints in scripts/gmflow/gmflow-pretrained.
Detic: https://github.com/facebookresearch/Detic. Place the whole repository in scripts/Detic.
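Before running any collection or training script, it can help to confirm that the three dependencies above ended up where the instructions place them. The following is a minimal sketch, not part of the repository: the helper name `find_missing_dependencies` is hypothetical, and it assumes the script is run from the repository root and that an empty directory counts as "not installed".

```python
from pathlib import Path

# Directories named in the instructions above, relative to the repository root.
EXPECTED_DIRS = {
    "SuperGlue": "scripts/superglue/weights",
    "GMFlow": "scripts/gmflow/gmflow-pretrained",
    "Detic": "scripts/Detic",
}


def find_missing_dependencies(repo_root="."):
    """Return the names of dependencies whose directory is missing or empty.

    Hypothetical helper: it only checks that each directory exists and
    contains at least one entry, not that the checkpoints are valid.
    """
    root = Path(repo_root)
    missing = []
    for name, rel_path in EXPECTED_DIRS.items():
        path = root / rel_path
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_dependencies()
    print("missing:", ", ".join(missing) if missing else "none")
```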
Refer to Habitat-lab to install the Matterport3D datasets. Change the path in scripts/collect_habitat_all.py to the location where the dataset is stored.
The domain can be either 'robothor' or 'habitat'.
python -m scripts.collect_$domain$_all
The generated offline datasets will have the following structure:
offline-dataset
├── robothor-dataset
│ ├── 900
│ │ ├── train
│ │ │ ├── FloorPlan1_1
│ │ │ ├── FloorPlan1_2
│ │ │ ├── ...
│ │ ├── val
│ │ │ ├── FloorPlan1_5
│ │ │ ├── FloorPlan2_5
│ │ │ ├── ...
├── mp3d-dataset
│ ├── 900
│ │ ├── train
│ │ │ ├── 1LXtFkjw3qL
│ │ │ ├── ...
│ │ ├── val
│ │ │ ├── 2azQ1b91cZZ
│ │ │ ├── ...
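To sanity-check a collected dataset against the layout above, the scene folders in each split can be enumerated. This is a minimal sketch under stated assumptions: `list_scenes` is a hypothetical helper, not a function from the repository, and it treats `900` simply as the directory name shown in the tree without assuming what the number means.

```python
from pathlib import Path


def list_scenes(dataset_root, domain_dir, size="900"):
    """Map each split ('train', 'val') to its sorted scene directory names.

    Hypothetical helper that mirrors the tree above:
    <dataset_root>/<domain_dir>/<size>/<split>/<scene>
    A split that does not exist on disk maps to an empty list.
    """
    base = Path(dataset_root) / domain_dir / size
    scenes = {}
    for split in ("train", "val"):
        split_dir = base / split
        if split_dir.is_dir():
            scenes[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
        else:
            scenes[split] = []
    return scenes


if __name__ == "__main__":
    for domain_dir in ("robothor-dataset", "mp3d-dataset"):
        for split, names in list_scenes("offline-dataset", domain_dir).items():
            print(f"{domain_dir}/{split}: {len(names)} scenes")
```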
Train a VN-Bert policy using BCQ in 'robothor' or 'habitat'.
python -m recbert_policy.train_vnbert --exp_name bcq_rank_0.5_9_SA --domain $domain$
- Evaluate the random policy in 'robothor' or 'habitat':
bash bash/eval_$domain$_random.sh
- Evaluate a baseline LMM policy ('gpt4o' or 'videollava') in 'robothor' or 'habitat':
bash bash/eval_$domain$_baseline.sh $baseline$
Note: provide an API key if you use gpt4o for evaluation.
- Evaluate the VN-Bert policy (NOLO) in 'robothor' or 'habitat'. Ablation variants and cross-domain evaluation are also supported.
bash bash/eval_habitat_policymode.sh "nolo-bert" $checkpoint_path$ "Q" "SA"
- Collect random RGB and action sequences:
python scripts/collect_maze.py
- Decode actions from the recorded video:
python scripts/label_maze.py
- Train a policy using BCQ in the real-world maze environment:
python -m recbert_policy.train_vnbert_real --exp_name maze
- Evaluate the trained policy:
python -m scripts.inference_maze_transformer
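The simulator commands earlier in this README use `$domain$` as a placeholder rather than a real shell variable, so it has to be substituted by hand. As an illustration only, the sketch below expands the placeholder for the collection, training, and random-policy evaluation steps. The function `simulator_commands` is hypothetical and not part of the repository; the command strings are copied from this README, and the default experiment name is simply the one used in the training example.

```python
DOMAINS = ("robothor", "habitat")


def simulator_commands(domain, exp_name="bcq_rank_0.5_9_SA"):
    """Return the collect, train, and random-eval commands for one domain.

    Hypothetical helper: it only builds the command strings listed in this
    README with the $domain$ placeholder filled in; it does not run them.
    """
    if domain not in DOMAINS:
        raise ValueError(f"domain must be one of {DOMAINS}, got {domain!r}")
    return [
        f"python -m scripts.collect_{domain}_all",
        f"python -m recbert_policy.train_vnbert --exp_name {exp_name} --domain {domain}",
        f"bash bash/eval_{domain}_random.sh",
    ]


if __name__ == "__main__":
    for command in simulator_commands("robothor"):
        print(command)
```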