New Remote Env / TS Env #703
reward_specs = await self._discover_rewards(sandbox_id)
self._register_rewards(reward_specs)

self._tools_discovered = True
Race condition causes duplicate tool registration
High Severity
The _tools_discovered check-then-act pattern is not concurrency-safe. When multiple rollouts run concurrently via asyncio.gather, multiple coroutines can pass the if not self._tools_discovered: check before any sets it to True. Each then calls _register_tools, which appends to self.tools and self.oai_tools, causing duplicate tool entries. Other similar patterns in this codebase use asyncio.Lock() to prevent this issue.
Additional Locations (1)
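A minimal sketch of the lock-guarded pattern the comment refers to. The class and method names here (DiscoveryGuard, ensure_tools, _discover_tools) are hypothetical stand-ins, not the PR's actual code; only _tools_discovered and the use of asyncio.Lock come from the comment above.

```python
import asyncio


class DiscoveryGuard:
    """Hypothetical stand-in for the environment's tool discovery state."""

    def __init__(self) -> None:
        self.tools: list[str] = []
        self._tools_discovered = False
        self._discovery_lock = asyncio.Lock()

    async def _discover_tools(self) -> list[str]:
        # Placeholder for the remote discovery call; the sleep forces a yield
        # so concurrent callers really interleave.
        await asyncio.sleep(0.01)
        return ["echo", "add"]

    async def ensure_tools(self) -> None:
        # Fast path: skip the lock once discovery has completed.
        if self._tools_discovered:
            return
        async with self._discovery_lock:
            # Re-check inside the lock: another coroutine may have finished
            # discovery while this one was waiting.
            if self._tools_discovered:
                return
            self.tools.extend(await self._discover_tools())
            self._tools_discovered = True


async def main() -> list[str]:
    guard = DiscoveryGuard()
    # Eight concurrent callers, as asyncio.gather would produce for rollouts.
    await asyncio.gather(*(guard.ensure_tools() for _ in range(8)))
    return guard.tools
```

Without the second check inside the lock, every waiting coroutine would still register the tools once it acquired the lock, so the lock alone would not be enough.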
raise RuntimeError(f"Reward {reward_name} failed: {result.stderr}")

data = json.loads(result.stdout)
return float(data["score"])
Remote rewards fail because sandbox destroyed before scoring
High Severity
The RemoteRewardRubric calculates rewards by executing curl commands inside the sandbox via _call_remote_reward. However, the framework's execution order runs @vf.cleanup (which destroys the sandbox) at the end of each rollout, and scoring happens AFTER all rollouts complete. By the time reward functions are invoked during score_group, the sandbox has already been deleted. The parent class SandboxEnv provides a post_rollout hook specifically for extracting reward data before sandbox destruction, but TypeScriptEnv doesn't override it to pre-calculate rewards.
Additional Locations (1)
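One way to address this, sketched under assumptions: compute the remote rewards while the sandbox is still alive and cache them in the rollout state, so that scoring only reads the cache. FakeSandbox, the "remote_rewards" state key, and the free-standing function signatures below are hypothetical; the real post_rollout hook in SandboxEnv may take different arguments.

```python
import asyncio


class FakeSandbox:
    """Hypothetical stand-in for a sandbox that can be destroyed."""

    def __init__(self) -> None:
        self.alive = True

    async def score(self, reward_name: str) -> float:
        # Stands in for the remote reward call made inside the sandbox.
        if not self.alive:
            raise RuntimeError("sandbox already destroyed")
        return 1.0


async def post_rollout(state: dict, sandbox: FakeSandbox, reward_names: list[str]) -> None:
    # Query every remote reward while the sandbox is still alive and cache
    # the results in the rollout state.
    state["remote_rewards"] = {name: await sandbox.score(name) for name in reward_names}


def cached_reward(state: dict, reward_name: str) -> float:
    # Called at scoring time: reads the cache and never touches the sandbox.
    return state["remote_rewards"][reward_name]


async def rollout() -> dict:
    state: dict = {}
    sandbox = FakeSandbox()
    await post_rollout(state, sandbox, ["correctness"])
    sandbox.alive = False  # cleanup destroys the sandbox after the hook ran
    return state
```

With this ordering, the reward function passed to the rubric would only need the state, which is still available after all rollouts have finished.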
tar.extractall("{self.upload_path}")
os.remove("/tmp/env.tar.gz")
print("Download and extraction complete")
"""
Unescaped string interpolation can break download script
Low Severity
The package_url and upload_path values are directly interpolated into Python code using f-strings without escaping. If either value contains quote characters (particularly double quotes), the generated Python script will have invalid syntax and fail to execute. While the default upload_path is safe and API-provided URLs typically don't contain quotes, this could cause hard-to-debug failures in edge cases.
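A small sketch of one possible fix: embed the values with the !r conversion so each is written as a valid Python string literal. The function name build_download_script and the use of urllib.request.urlretrieve are assumptions for illustration; only package_url, upload_path, and the tar extraction lines come from the diff.

```python
def build_download_script(package_url: str, upload_path: str) -> str:
    # !r embeds each value as a valid Python string literal, so quotes and
    # backslashes in the inputs cannot break the generated script's syntax.
    return f"""
import os, tarfile, urllib.request
urllib.request.urlretrieve({package_url!r}, "/tmp/env.tar.gz")
with tarfile.open("/tmp/env.tar.gz", "r:gz") as tar:
    tar.extractall({upload_path!r})
os.remove("/tmp/env.tar.gz")
print("Download and extraction complete")
"""
```

This only protects the Python layer. If the generated script is then passed through a shell command line, it would also need shell quoting, for example with shlex.quote.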
)

if result.exit_code != 0:
    raise RuntimeError(f"Failed to download environment: {result.stderr}")
Non-vf.Error exceptions cause sandbox resource leaks
High Severity
The new code raises RuntimeError, TimeoutError, and ValueError which are not vf.Error subclasses. The rollout method in MultiTurnEnv only catches vf.Error exceptions during setup_state. When these exceptions occur after the sandbox is created (by the parent SandboxEnv.setup_state), they escape the error handling and the _cleanup handlers including destroy_sandbox are never called. This leaves orphaned sandboxes that are never deleted. Other sandbox-based environments like PythonEnv define their errors as vf.SandboxError subclasses to ensure proper cleanup.
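A sketch of the suggested direction, with loud assumptions: SandboxError below is a local stand-in for vf.SandboxError (the real class lives in verifiers and is not imported here), and RemoteEnvDownloadError, check_download, and run_setup are hypothetical names. The point is only that an error deriving from the framework's base type reaches the handler that triggers cleanup.

```python
class SandboxError(Exception):
    """Local stand-in for vf.SandboxError; not the real verifiers class."""


class RemoteEnvDownloadError(SandboxError):
    """Hypothetical error for a failed environment download."""


def check_download(exit_code: int, stderr: str) -> None:
    # Raise a framework-derived error instead of a bare RuntimeError.
    if exit_code != 0:
        raise RemoteEnvDownloadError(f"Failed to download environment: {stderr}")


def run_setup(exit_code: int, stderr: str) -> str:
    # Mimics a rollout loop that only handles the framework's error type and
    # performs cleanup in that handler.
    try:
        check_download(exit_code, stderr)
    except SandboxError as exc:
        return f"cleaned up after: {exc}"
    return "ok"
```

A plain RuntimeError raised from check_download would bypass the except clause in run_setup, which is the leak the comment describes.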
Description
New experimental remote_env and ts_env.
Corresponding prime_cli PR
RemoteEnv is the base of a new pattern that lets people create environments without writing Python. It extends SandboxEnv with a couple of extra steps: after a rollout creates a sandbox, it downloads the contents of the environment's sandbox/ folder from the hub and runs setup.sh. That script is expected to contain the install commands for any required dependencies, and its final line should start the server the user wrote in their chosen language, inside the sandbox.
TypeScriptEnv is an extension of RemoteEnv that includes the TypeScript-specific files a TypeScript user should be familiar with, and lets them build an environment by editing index.ts and setup.sh.
The expected user flow is:
1. prime env init my-ts-env --ts
2. cd environments/my_ts_env/
3. Edit sandbox/setup.sh to download any dependencies
4. Edit src/index.ts to have whatever tools and reward functions the environment needs
5. prime env push so that everything, including sandbox/, is on the hub
6. uv run vf-eval my-ts-env then creates sandboxes, downloads the sandbox files, and runs the sandbox setup.sh
Type of Change
Testing
uv run pytest locally.
Checklist
Additional Notes
Note
Introduces a remote environment pattern and a TypeScript specialization for tool/reward discovery and execution.
- RemoteEnv that extends SandboxEnv to fetch environment tarballs from the Prime API, extract into upload_path, and run sandbox/setup.sh after sandbox startup (supports owner/name@version and optional API key)
- TypeScriptEnv that waits for a server (default :3000), discovers tools/rewards via GET /tools and GET /rewards, registers them as oai_tools and a Rubric, and routes tool calls (POST /tools/{name}) and reward scoring (POST /rewards/{name})
- RemoteToolWrapper and RemoteRewardRubric to wrap remote endpoints as local tools and reward functions
- RemoteEnv and TypeScriptEnv exported via verifiers/envs/experimental/remote_envs/__init__.py
Written by Cursor Bugbot for commit 305e90a. This will update automatically on new commits.
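To make the identifier format and routes in the summary concrete, here is a rough sketch. EnvRef, parse_env_ref, tool_path, and reward_path are hypothetical helpers written for illustration, not functions from the PR; only the owner/name@version shape and the /tools/{name} and /rewards/{name} routes come from the summary.

```python
from typing import NamedTuple, Optional


class EnvRef(NamedTuple):
    owner: str
    name: str
    version: Optional[str]


def parse_env_ref(ref: str) -> EnvRef:
    # Split off an optional "@version" suffix, then require "owner/name".
    slug, _, version = ref.partition("@")
    owner, sep, name = slug.partition("/")
    if not sep or not owner or not name:
        raise ValueError(f"expected owner/name[@version], got {ref!r}")
    return EnvRef(owner, name, version or None)


def tool_path(name: str) -> str:
    # Route used for tool calls (POST).
    return f"/tools/{name}"


def reward_path(name: str) -> str:
    # Route used for reward scoring (POST).
    return f"/rewards/{name}"
```

For example, parse_env_ref("owner/name@1.2.0") yields a version of "1.2.0", while parse_env_ref("owner/name") leaves the version as None.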