|
| 1 | +.. _inference-steve: |
| 2 | + |
| 3 | +Tutorial: Inference with STEVE-1 |
| 4 | +--------------------------------- |
| 5 | + |
| 6 | +To run inference with STEVE-1, you first need to download the pretrained checkpoints. |
| 7 | +The example code is provided in ``minestudio/tutorials/inference/evaluate_steve/main.py``. |
| 8 | + |
| 9 | +.. dropdown:: Evaluating STEVE-1 |
| 10 | + |
| 11 | +    .. code-block:: python |
| 12 | + |
| 13 | +        from minestudio.simulator.callbacks import MinecraftCallback |
| 14 | +        from minestudio.models import SteveOnePolicy |
| 15 | +        from minestudio.simulator import MinecraftSim |
| 16 | +        from minestudio.simulator.callbacks import SpeedTestCallback, load_callbacks_from_config |
| 17 | +        from minestudio.inference import EpisodePipeline, MineGenerator, InfoBaseFilter |
| 18 | +        from minestudio.benchmark import prepare_task_configs |
| 19 | + |
| 20 | +        import ray |
| 21 | +        from functools import partial |
| 22 | +        from rich import print |
| 23 | + |
| 24 | +        class CommandCallback(MinecraftCallback): |
| 25 | +            """ |
| 26 | +            SteveOnePolicy expects every observation to contain a "condition" entry; this callback adds it. |
| 27 | +            """ |
| 28 | +            def __init__(self, command, cond_scale=4.0): |
| 29 | +                self.command = command |
| 30 | +                self.cond_scale = cond_scale |
| 31 | + |
| 32 | +            def after_reset(self, sim, obs, info): |
| 33 | +                self.timestep = 0 |
| 34 | +                obs["condition"] = { |
| 35 | +                    "cond_scale": self.cond_scale, |
| 36 | +                    "text": self.command |
| 37 | +                } |
| 38 | +                return obs, info |
| 39 | + |
| 40 | +            def after_step(self, sim, obs, reward, terminated, truncated, info): |
| 41 | +                obs["condition"] = { |
| 42 | +                    "cond_scale": self.cond_scale, |
| 43 | +                    "text": self.command |
| 44 | +                } |
| 45 | +                return obs, reward, terminated, truncated, info |
| 46 | + |
| 47 | + |
| 48 | +        if __name__ == '__main__': |
| 49 | +            ray.init() |
| 50 | +            task_configs = prepare_task_configs("simple") |
| 51 | +            config_file = task_configs["collect_wood"] |
| 52 | +            # You can also try survive_plant, build_pillar, ...; make sure the config file contains a `reference_video` field |
| 53 | +            print(config_file) |
| 54 | + |
| 55 | +            env_generator = partial( |
| 56 | +                MinecraftSim, |
| 57 | +                obs_size=(224, 224), |
| 58 | +                preferred_spawn_biome="forest", |
| 59 | +                callbacks=[ |
| 60 | +                    SpeedTestCallback(50), |
| 61 | +                    CommandCallback("mine log", cond_scale=4.0),  # Add a command callback for SteveOnePolicy |
| 62 | +                ] + load_callbacks_from_config(config_file) |
| 63 | +            ) |
| 64 | + |
| 65 | +            agent_generator = lambda: SteveOnePolicy.from_pretrained("CraftJarvis/MineStudio_STEVE-1.official") |
| 66 | + |
| 67 | +            worker_kwargs = dict( |
| 68 | +                env_generator=env_generator, |
| 69 | +                agent_generator=agent_generator, |
| 70 | +                num_max_steps=600, |
| 71 | +                num_episodes=1, |
| 72 | +                tmpdir="./output", |
| 73 | +                image_media="h264", |
| 74 | +            ) |
| 75 | + |
| 76 | +            pipeline = EpisodePipeline( |
| 77 | +                episode_generator=MineGenerator( |
| 78 | +                    num_workers=1, |
| 79 | +                    num_gpus=0.25, |
| 80 | +                    max_restarts=3, |
| 81 | +                    **worker_kwargs, |
| 82 | +                ), |
| 83 | +                episode_filter=InfoBaseFilter( |
| 84 | +                    key="mine_block", |
| 85 | +                    regex=".*log.*", |
| 86 | +                    num=1, |
| 87 | +                ), |
| 88 | +            ) |
| 89 | +            summary = pipeline.run() |
| 90 | +            print(summary) |
| 91 | + |
| 92 | +Since STEVE-1 is a text-conditioned policy, we need to provide textual commands to guide the agent's behavior. |
| 93 | +Supported tasks and their configs are listed in ``minestudio/benchmark/task_configs``; the benchmarking tutorial explains them in detail. |
| 94 | + |
| 95 | +To pass text commands to STEVE-1, we implement a ``CommandCallback`` for the environment. |
| 96 | +The ``CommandCallback`` adds a ``condition`` field to the observation that contains: |
| 97 | + - ``cond_scale``: A scaling factor for the conditioning (default: 4.0) |
| 98 | + - ``text``: The textual command describing the desired behavior |
| 99 | + |
| 100 | +After each reset and each step, the callback writes the text command into the ``condition`` field of the observation, where the policy reads it to guide the agent's actions. |
| 101 | +Because the same command is attached to every observation, the agent receives consistent guidance throughout the episode. |
| 102 | + |
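As a rough illustration of what the callback does to each observation, the sketch below uses a plain dictionary in place of a real simulator observation. The helper name ``attach_condition`` and the ``"image"`` key are hypothetical and not part of MineStudio; only the layout of the ``condition`` entry is taken from the example above.

```python
def attach_condition(obs, command, cond_scale=4.0):
    """Return a copy of obs with the text condition attached (hypothetical helper)."""
    conditioned = dict(obs)
    conditioned["condition"] = {"cond_scale": cond_scale, "text": command}
    return conditioned


# Stand-in observations for one reset followed by two steps.
observations = [{"image": f"frame_{t}"} for t in range(3)]
conditioned = [attach_condition(obs, "mine log") for obs in observations]

print(conditioned[0]["condition"])  # {'cond_scale': 4.0, 'text': 'mine log'}
```

Every observation in the list carries the same ``condition`` entry, which mirrors how ``after_reset`` and ``after_step`` both write the command.
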
| 103 | +For the inference pipeline, we need to specify: |
| 104 | + - the task config, callbacks, and text command for the ``env_generator``. |
| 105 | + - the pretrained checkpoint for the ``agent_generator``. |
| 106 | + - the maximum rollout steps, number of episodes, and output path in ``worker_kwargs``. |
| 107 | + - the number of GPUs and workers for ``MineGenerator``. |
| 108 | + - an ``episode_filter`` that selects episodes by matching a key and a regex pattern against the episode ``info``. |
| 109 | + |
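Note that ``env_generator`` and ``agent_generator`` are passed as callables rather than as ready-made objects, presumably so that each worker can build its own environment and policy. The sketch below shows the ``functools.partial`` pattern in isolation; ``FakeSim`` is a hypothetical stand-in class and not part of MineStudio.

```python
from functools import partial


class FakeSim:
    """Hypothetical stand-in for an environment class; not part of MineStudio."""

    def __init__(self, obs_size, preferred_spawn_biome=None):
        self.obs_size = obs_size
        self.preferred_spawn_biome = preferred_spawn_biome


# partial() stores the arguments; nothing is constructed until the generator is called.
env_generator = partial(FakeSim, obs_size=(224, 224), preferred_spawn_biome="forest")

env_a = env_generator()
env_b = env_generator()
print(env_a is env_b)   # False: each call builds a fresh instance
print(env_a.obs_size)   # (224, 224)
```
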
| 110 | +In the example above, we evaluate STEVE-1 on the wood-collecting task with the command ``"mine log"``, running 1 episode of up to 600 steps. |
| 111 | +One worker is used, with 0.25 GPU allocated to it. |
| 112 | +The episode will be filtered based on the key ``mine_block`` and regex pattern ``.*log.*``. |
| 113 | + |
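To make the filter settings more concrete, here is a minimal sketch of regex matching against a ``mine_block`` entry. It assumes the entry maps block names to counts, which is a guess about the layout of ``info``; ``count_matching_events`` is a hypothetical helper, and the actual ``InfoBaseFilter`` logic may differ.

```python
import re


def count_matching_events(info, key, pattern):
    """Sum the counts under info[key] whose names match the regex (hypothetical helper)."""
    regex = re.compile(pattern)
    entries = info.get(key, {})
    return sum(count for name, count in entries.items() if regex.fullmatch(name))


# Assumed layout: block name -> number of blocks mined during the episode.
info = {"mine_block": {"oak_log": 2, "dirt": 5, "birch_log": 1}}

total = count_matching_events(info, key="mine_block", pattern=".*log.*")
print(total)       # 3
print(total >= 1)  # True: at least one log-like block was mined
```

With the pattern ``.*log.*``, both ``oak_log`` and ``birch_log`` match while ``dirt`` does not.
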
| 114 | +For text commands commonly used for different tasks, refer to the original STEVE-1 paper [1]_. |
| 115 | + |
| 116 | +The conditioning scale (``cond_scale``) controls how strongly the text command influences the agent's behavior: |
| 117 | + - Higher values (e.g., 6.0-8.0) make the agent follow commands more strictly |
| 118 | + - Lower values (e.g., 2.0-4.0) allow more exploration while still following the general command |
| 119 | + - The default value of 4.0 provides a good balance for most tasks |
| 120 | + |
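The STEVE-1 paper describes classifier-free guidance, where conditional and unconditional action logits are combined with a guidance weight. The sketch below assumes that ``cond_scale`` plays the role of that weight; this is an assumption about ``SteveOnePolicy``, so check the model source before relying on it. The function name and the toy logits are illustrative only.

```python
def guided_logits(cond_logits, uncond_logits, cond_scale):
    """Classifier-free guidance: (1 + s) * conditional - s * unconditional."""
    return [
        (1 + cond_scale) * c - cond_scale * u
        for c, u in zip(cond_logits, uncond_logits)
    ]


cond = [2.0, 1.0, 0.0]    # toy logits with the text command
uncond = [1.0, 1.0, 1.0]  # toy logits without any command

for scale in (0.0, 4.0, 8.0):
    print(scale, guided_logits(cond, uncond, scale))
# 0.0 [2.0, 1.0, 0.0]
# 4.0 [6.0, 1.0, -4.0]
# 8.0 [10.0, 1.0, -8.0]
```

Under this assumption, a larger scale widens the gap between actions favored and disfavored by the command, which matches the stricter command-following described above.
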
| 121 | +When the pipeline finishes, it prints a summary to the console with the number of episodes and the success rate. |
| 122 | +The output looks like the following: |
| 123 | + |
| 124 | +.. code-block:: python |
| 125 | + |
| 126 | +    ... |
| 127 | + |
| 128 | +    (Worker pid=922019) Episode 0 saved at output/episode_0.mp4 |
| 129 | +    (Worker pid=922019) Speed Test Status: |
| 130 | +    (Worker pid=922019) Average Time: 0.04 |
| 131 | +    (Worker pid=922019) Average FPS: 24.28 |
| 132 | +    (Worker pid=922019) Total Steps: 600 |
| 133 | +    {'num_yes': 1, 'num_episodes': 1, 'yes_rate': '100.00%'} |
| 134 | + |
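The summary fields can be read as a simple tally. The sketch below assumes that ``num_yes`` counts the episodes that passed the filter and that ``yes_rate`` is their share of all episodes; ``summarize`` is a hypothetical helper, not the pipeline's own code.

```python
def summarize(results):
    """Build a summary dict from per-episode pass/fail flags (hypothetical helper)."""
    num_yes = sum(results)
    num_episodes = len(results)
    rate = 100.0 * num_yes / num_episodes if num_episodes else 0.0
    return {
        "num_yes": num_yes,
        "num_episodes": num_episodes,
        "yes_rate": f"{rate:.2f}%",
    }


print(summarize([True]))
# {'num_yes': 1, 'num_episodes': 1, 'yes_rate': '100.00%'}
print(summarize([True, False, True, False]))
# {'num_yes': 2, 'num_episodes': 4, 'yes_rate': '50.00%'}
```
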
| 135 | +.. [1] Lifshitz S, Paster K, Chan H, et al. STEVE-1: A Generative Model for Text-to-Behavior in Minecraft. Advances in Neural Information Processing Systems, 2024, 36. |