@cdreetz cdreetz commented Jan 8, 2026

Description

New experimental remote_env and ts_env.
Corresponding prime_cli PR

RemoteEnv is the base of a new pattern that lets people create environments without writing Python. It extends SandboxEnv with two extra steps: after a rollout creates a sandbox, it downloads the contents of the environment's sandbox/ folder from the hub, then runs setup.sh. That script is expected to contain the install commands for any required dependencies, and its final line should start the server that the user wrote in their chosen language.

TypeScriptEnv extends RemoteEnv with the TypeScript-specific files a TypeScript user should already be familiar with, so they can build an environment by editing only index.ts and setup.sh.

The expected user flow is:

  1. prime env init my-ts-env --ts
  2. cd environments/my_ts_env/sandbox/
  3. edit setup.sh to install any dependencies
  4. edit src/index.ts to define whatever tools and reward functions the environment needs
  5. run prime env push so that everything, including sandbox/, is on the hub
  6. run uv run vf-eval my-ts-env, which creates sandboxes, downloads the sandbox files, and runs the sandbox setup.sh
  7. the rollout "orchestrator" then fetches the tools and rewards from the sandbox through discovery endpoints and uses them as it would native tools and reward functions

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Introduces a remote environment pattern and a TypeScript specialization for tool/reward discovery and execution.

  • Adds RemoteEnv that extends SandboxEnv to fetch environment tarballs from the Prime API, extract into upload_path, and run sandbox/setup.sh after sandbox startup (supports owner/name@version and optional API key)
  • Adds TypeScriptEnv that waits for a server (default :3000), discovers tools/rewards via GET /tools and GET /rewards, registers them as oai_tools and a Rubric, and routes tool calls (POST /tools/{name}) and reward scoring (POST /rewards/{name})
  • Introduces RemoteToolWrapper and RemoteRewardRubric to wrap remote endpoints as local tools and reward functions
  • Exposes RemoteEnv and TypeScriptEnv via verifiers/envs/experimental/remote_envs/__init__.py
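
The identifier format and endpoint layout in the summary above can be sketched in a few lines of Python. This is an illustration only, not the PR's code: `EnvRef`, `parse_env_ref`, `to_oai_tool`, and `tool_url` are hypothetical names, the spec keys (`name`, `description`, `parameters`) are an assumed shape for what GET /tools returns, and `localhost` is an assumed host for the in-sandbox server.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnvRef:
    owner: str
    name: str
    version: Optional[str]  # None when no "@version" suffix is given


def parse_env_ref(ref: str) -> EnvRef:
    """Split "owner/name" or "owner/name@version" into its parts."""
    path, sep, version = ref.partition("@")
    owner, slash, name = path.partition("/")
    if not slash or not owner or not name or "/" in name:
        raise ValueError(f"expected owner/name[@version], got {ref!r}")
    if sep and not version:
        raise ValueError(f"empty version in {ref!r}")
    return EnvRef(owner, name, version or None)


def to_oai_tool(spec: dict) -> dict:
    """Wrap a discovered tool spec in the OpenAI function-tool shape.

    The input keys ("name", "description", "parameters") are assumptions.
    """
    return {
        "type": "function",
        "function": {
            "name": spec["name"],
            "description": spec.get("description", ""),
            "parameters": spec.get(
                "parameters", {"type": "object", "properties": {}}
            ),
        },
    }


def tool_url(name: str, port: int = 3000) -> str:
    """URL for POST /tools/{name}; host and default port are assumptions."""
    return f"http://localhost:{port}/tools/{name}"
```

Keeping parsing and schema conversion as pure functions makes them easy to unit test without starting a sandbox.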

Written by Cursor Bugbot for commit 305e90a. This will update automatically on new commits.

reward_specs = await self._discover_rewards(sandbox_id)
self._register_rewards(reward_specs)

self._tools_discovered = True

Race condition causes duplicate tool registration

High Severity

The _tools_discovered check-then-act pattern is not concurrency-safe. When multiple rollouts run concurrently via asyncio.gather, multiple coroutines can pass the if not self._tools_discovered: check before any sets it to True. Each then calls _register_tools, which appends to self.tools and self.oai_tools, causing duplicate tool entries. Other similar patterns in this codebase use asyncio.Lock() to prevent this issue.
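
One way to close this race is a lock with a second check inside it. The sketch below is a stand-in class, not the PR's TypeScriptEnv: `DiscoveryOnce`, `ensure_discovered`, and `_discovery_lock` are hypothetical names, and the sleep simulates the remote discovery call.

```python
import asyncio


class DiscoveryOnce:
    """Stand-in for the env: registers tools at most once across rollouts."""

    def __init__(self) -> None:
        self.tools: list[str] = []
        self._tools_discovered = False
        self._discovery_lock = asyncio.Lock()

    async def _discover_tools(self) -> list[str]:
        await asyncio.sleep(0.01)  # simulates the remote round trip
        return ["search", "calculator"]

    async def ensure_discovered(self) -> None:
        if self._tools_discovered:  # fast path, no lock needed
            return
        async with self._discovery_lock:
            if self._tools_discovered:  # re-check: another coroutine may have won
                return
            self.tools.extend(await self._discover_tools())
            self._tools_discovered = True


async def run_rollouts(n: int) -> list[str]:
    """Run n concurrent 'rollouts' against one env and return its tools."""
    env = DiscoveryOnce()
    await asyncio.gather(*(env.ensure_discovered() for _ in range(n)))
    return env.tools
```

Without the lock, every coroutine that reaches the first check before the flag is set would extend `self.tools`. With it, the late arrivals wait, then return at the second check.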

Additional Locations (1)


raise RuntimeError(f"Reward {reward_name} failed: {result.stderr}")

data = json.loads(result.stdout)
return float(data["score"])

Remote rewards fail because sandbox destroyed before scoring

High Severity

The RemoteRewardRubric calculates rewards by executing curl commands inside the sandbox via _call_remote_reward. However, the framework's execution order runs @vf.cleanup (which destroys the sandbox) at the end of each rollout, and scoring happens AFTER all rollouts complete. By the time reward functions are invoked during score_group, the sandbox has already been deleted. The parent class SandboxEnv provides a post_rollout hook specifically for extracting reward data before sandbox destruction, but TypeScriptEnv doesn't override it to pre-calculate rewards.
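
A possible shape for the fix is to score while the sandbox is still alive and have the rubric read cached values later. The sketch below uses stand-ins throughout: `FakeSandbox`, the `remote_rewards` state key, and `cached_reward` are hypothetical, and the real `post_rollout` hook in SandboxEnv may take different arguments.

```python
import asyncio


class FakeSandbox:
    """Stand-in for a remote sandbox that stops answering once destroyed."""

    def __init__(self) -> None:
        self.alive = True

    async def call_reward(self, name: str, completion: str) -> float:
        if not self.alive:
            raise RuntimeError(f"Reward {name} failed: sandbox is gone")
        return float(len(completion))  # toy score for illustration


async def post_rollout(sandbox: FakeSandbox, state: dict, reward_names: list) -> None:
    """Score while the sandbox is alive and cache the results in state."""
    state["remote_rewards"] = {
        name: await sandbox.call_reward(name, state["completion"])
        for name in reward_names
    }


def cached_reward(name: str):
    """Build a reward function that only reads the cached score."""

    def reward(state: dict) -> float:
        return state["remote_rewards"][name]

    reward.__name__ = name
    return reward


async def demo() -> float:
    sandbox, state = FakeSandbox(), {"completion": "hello"}
    await post_rollout(sandbox, state, ["length"])
    sandbox.alive = False  # cleanup destroys the sandbox
    return cached_reward("length")(state)  # scoring no longer needs it
```

The trade-off is that every reward is computed for every rollout, even if scoring is later skipped.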

Additional Locations (1)


tar.extractall("{self.upload_path}")
os.remove("/tmp/env.tar.gz")
print("Download and extraction complete")
"""

Unescaped string interpolation can break download script

Low Severity

The package_url and upload_path values are directly interpolated into Python code using f-strings without escaping. If either value contains quote characters (particularly double quotes), the generated Python script will have invalid syntax and fail to execute. While the default upload_path is safe and API-provided URLs typically don't contain quotes, this could cause hard-to-debug failures in edge cases.
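
A small change that avoids this is to embed the values with `repr()` (the `!r` conversion in an f-string), which always produces a valid Python string literal. The sketch below is an assumption about the script's overall shape: only the `tar.extractall`, `os.remove`, and `print` lines appear in the diff, and `build_download_script` and the `urlretrieve` call are hypothetical.

```python
import textwrap


def build_download_script(package_url: str, upload_path: str) -> str:
    """Generate the download script with values embedded as Python literals."""
    return textwrap.dedent(
        f"""\
        import os, tarfile, urllib.request
        package_url = {package_url!r}
        upload_path = {upload_path!r}
        # How the tarball is fetched is an assumption; the diff does not show it.
        urllib.request.urlretrieve(package_url, "/tmp/env.tar.gz")
        with tarfile.open("/tmp/env.tar.gz", "r:gz") as tar:
            tar.extractall(upload_path)
        os.remove("/tmp/env.tar.gz")
        print("Download and extraction complete")
        """
    )
```

Because `repr()` escapes quotes, backslashes, and newlines, a value such as `/tmp/it's "here"` still yields a script that compiles.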


)

if result.exit_code != 0:
    raise RuntimeError(f"Failed to download environment: {result.stderr}")

Non-vf.Error exceptions cause sandbox resource leaks

High Severity

The new code raises RuntimeError, TimeoutError, and ValueError which are not vf.Error subclasses. The rollout method in MultiTurnEnv only catches vf.Error exceptions during setup_state. When these exceptions occur after the sandbox is created (by the parent SandboxEnv.setup_state), they escape the error handling and the _cleanup handlers including destroy_sandbox are never called. This leaves orphaned sandboxes that are never deleted. Other sandbox-based environments like PythonEnv define their errors as vf.SandboxError subclasses to ensure proper cleanup.
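
The effect is easy to reproduce with a toy model. In the sketch below, `Error` and `SandboxError` are local stand-ins for `vf.Error` and `vf.SandboxError`, `EnvDownloadError` is a hypothetical subclass, and `rollout` only mimics the catch behavior described above rather than the real MultiTurnEnv code.

```python
class Error(Exception):
    """Stand-in for vf.Error."""


class SandboxError(Error):
    """Stand-in for vf.SandboxError."""


class EnvDownloadError(SandboxError):
    """Hypothetical error for a failed environment download."""


def rollout(setup, cleanup_log: list) -> str:
    """Toy rollout that, as described above, only catches Error."""
    try:
        setup()
    except Error as exc:
        cleanup_log.append("destroy_sandbox")  # cleanup runs only on this path
        return f"handled: {exc}"
    return "ok"


def failing_setup() -> None:
    """Raises a framework-visible error, so cleanup is triggered."""
    raise EnvDownloadError("Failed to download environment: exit code 1")


def leaking_setup() -> None:
    """Raises a plain RuntimeError, which escapes the handler."""
    raise RuntimeError("Failed to download environment: exit code 1")
```

Calling `rollout(failing_setup, log)` records the cleanup, while `rollout(leaking_setup, log)` propagates the RuntimeError and leaves `log` empty, which corresponds to the orphaned sandbox.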

Additional Locations (2)

