[feat] Add DaytonaRunner for code evaluators
#3258
base: frontend-feat/new-testsets-integration
Pull request overview
This PR implements and tests Daytona-based code evaluation functionality, transitioning from the legacy local sandbox to a new SDK-based approach. It includes improvements to code editor indentation handling for Python/code blocks and adds example evaluators for testing various dependencies and API endpoints.
Key Changes
- Replaced legacy `custom_code_run` with new `sdk_custom_code_run` that uses the SDK's workflow-based evaluator system
- Enhanced code editor to preserve exact indentation for Python/code (no transformations) while maintaining space-to-tab conversion for JSON/YAML
- Added example evaluators for testing OpenAI, NumPy, and Agenta API endpoints in Daytona environments
Reviewed changes
Copilot reviewed 20 out of 25 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| `api/oss/src/services/evaluators_service.py` | Implements new SDK-based custom code runner function that delegates to workflow system |
| `api/oss/src/resources/evaluators/evaluators.py` | Updates default code template with deprecation note for app_params |
| `sdk/agenta/sdk/workflows/runners/daytona.py` | Adds environment variables (OPENAI_API_KEY, AGENTA_HOST, AGENTA_CREDENTIALS) to sandbox |
| `sdk/agenta/sdk/workflows/runners/local.py` | Exposes built-in Python types (dict, list, str, etc.) to restricted environment |
| `sdk/agenta/sdk/decorators/running.py` | Adds fallback to request.credentials in credential resolution chain |
| `web/oss/src/components/Editor/plugins/code/utils/pasteUtils.ts` | Preserves exact indentation for Python/code, converts spaces to tabs for JSON/YAML |
| `web/oss/src/components/Editor/plugins/code/plugins/IndentationPlugin.tsx` | Uses 4 spaces for Python/code tab insertion, 2 spaces for JSON/YAML |
| `web/oss/src/components/Editor/plugins/code/plugins/AutoFormatAndValidateOnPastePlugin.tsx` | Skips indentation transformation for Python/code, maintains it for JSON/YAML |
| `examples/python/evaluators/openai/*.py` | Adds OpenAI SDK evaluators for testing API availability and exact match comparisons |
| `examples/python/evaluators/numpy/*.py` | Adds NumPy evaluators for testing library availability and character counting |
| `examples/python/evaluators/basic/*.py` | Adds basic evaluators using Python stdlib for string matching, length checks, JSON validation |
| `examples/python/evaluators/ag/*.py` | Adds Agenta API endpoint evaluators for health, secrets, and config endpoints |
| `examples/python/evaluators/*.md` | Provides comprehensive documentation (README, QUICKSTART, SUMMARY) for evaluators |
Pull request overview
Copilot reviewed 32 out of 37 changed files in this pull request and generated 8 comments.
Pull request overview
Copilot reviewed 182 out of 299 changed files in this pull request and generated 3 comments.
```python
runtime = runtime or "python"

# Select general snapshot
snapshot_id = os.getenv("DAYTONA_SNAPSHOT")
```
Copilot AI, Dec 25, 2025
The environment variable name changed from AGENTA_SERVICES_SANDBOX_SNAPSHOT_PYTHON to DAYTONA_SNAPSHOT, but this is inconsistent with the naming pattern used elsewhere (e.g., AGENTA_HOST, AGENTA_API_URL). Consider using AGENTA_DAYTONA_SNAPSHOT or documenting why the AGENTA_ prefix was dropped for this variable.
```python
def _run_file(daytona: Daytona, runtime: str, path: Path) -> None:
    code = path.read_text(encoding="utf-8")
    wrapped = _wrap_python(code) if runtime == "python" else _wrap_js(code)
```
Copilot AI, Dec 25, 2025
The sandbox creation doesn't specify a snapshot ID, but the _create_sandbox method in daytona.py requires DAYTONA_SNAPSHOT to be set. This will fail if the environment variable is not configured. Consider adding explicit snapshot configuration or error handling.
Suggested change:
```diff
-def _run_file(daytona: Daytona, runtime: str, path: Path) -> None:
-    code = path.read_text(encoding="utf-8")
-    wrapped = _wrap_python(code) if runtime == "python" else _wrap_js(code)
+def _require_daytona_snapshot() -> str:
+    """Ensure that DAYTONA_SNAPSHOT is configured before creating sandboxes."""
+    snapshot = os.getenv("DAYTONA_SNAPSHOT")
+    if not snapshot:
+        raise RuntimeError(
+            "DAYTONA_SNAPSHOT is required to create Daytona sandboxes. "
+            "Please set the environment variable to a valid snapshot ID."
+        )
+    return snapshot
+
+
+def _run_file(daytona: Daytona, runtime: str, path: Path) -> None:
+    code = path.read_text(encoding="utf-8")
+    wrapped = _wrap_python(code) if runtime == "python" else _wrap_js(code)
+    # Validate that the required snapshot configuration is present before creating a sandbox.
+    _require_daytona_snapshot()
```
```python
tracing_ctx = TracingContext.get()
tracing_ctx.credentials = credentials

with running_context_manager(RunningContext.get()):
    running_ctx = RunningContext.get()
    running_ctx.credentials = f"Secret {secret_token}"
ctx = RunningContext.get()
ctx.credentials = credentials

with tracing_context_manager(tracing_ctx):
```
Copilot AI, Dec 25, 2025
The context objects are retrieved and modified before being passed to context managers. This pattern could lead to issues if the contexts are modified elsewhere between get() and the context manager entry. Consider retrieving fresh contexts inside the managers or ensuring contexts are isolated.
```ts
const response = await axios.post(
    `${getAgentaApiUrl()}/testsets/revisions/${revisionId}/archive?project_id=${projectId}`,
)
```
Check failure: Code scanning / CodeQL
Server-side request forgery (Critical, test)
The URL of this request depends on a user-provided value.
Copilot Autofix AI, 8 days ago
General fix: Ensure that the user-controlled revisionId is validated/normalized on the client before being interpolated into the URL path. Reject values that are not in an expected safe format (e.g., a UUID or a restricted ID pattern), and avoid letting path traversal sequences or reserved URL meta-characters be passed through. If invalid, throw or refuse to make the request.
Best concrete fix in this code: In web/oss/src/services/testsets/api/index.ts, in archiveTestsetRevision, validate revisionId before constructing the URL. A minimal and safe approach is:
- Introduce a small local validator (e.g., `isSafeRevisionId`) in this file that enforces a strict pattern (e.g., only letters, digits, hyphen, underscore, and limited length).
- Call this validator at the top of `archiveTestsetRevision`. If the ID is invalid, throw an error instead of making the HTTP request.
- Use `encodeURIComponent` when interpolating `revisionId` into the URL path, to prevent any unexpected interpretation of characters.
This keeps the current API shape and behavior for valid IDs, while making it impossible for a malicious query parameter to inject dangerous characters or path segments into the URL used by axios.post. No changes are necessary to the calling code in useTestcaseActions other than benefiting from the safer implementation.
Concretely:
- In `web/oss/src/services/testsets/api/index.ts`, add a small helper function `isSafeRevisionId` near the `archiveTestsetRevision` function.
- In `archiveTestsetRevision`, before using `revisionId`, check `if (!isSafeRevisionId(revisionId)) throw new Error("Invalid revision ID")`.
- When building the URL, wrap `revisionId` with `encodeURIComponent(revisionId)`.
No imports are needed; we only use built-in RegExp and encodeURIComponent.
```diff
@@ -397,11 +397,23 @@
  * @param revisionId - The ID of the revision to archive
  * @returns The archived revision data
  */
+function isSafeRevisionId(revisionId: string): boolean {
+    // Allow only typical ID characters; adjust pattern if backend uses a stricter format (e.g., UUID)
+    // This prevents path traversal and other special characters from being used in the URL path segment.
+    return /^[A-Za-z0-9_-]{1,128}$/.test(revisionId)
+}
+
 export async function archiveTestsetRevision(revisionId: string) {
+    if (!isSafeRevisionId(revisionId)) {
+        throw new Error("Invalid revision ID")
+    }
+
     const {projectId} = getProjectValues()
 
+    const safeRevisionId = encodeURIComponent(revisionId)
+
     const response = await axios.post(
-        `${getAgentaApiUrl()}/testsets/revisions/${revisionId}/archive?project_id=${projectId}`,
+        `${getAgentaApiUrl()}/testsets/revisions/${safeRevisionId}/archive?project_id=${projectId}`,
     )
 
     return response.data
```
1562c7d to 59a6e6b
Pull request overview
Copilot reviewed 183 out of 299 changed files in this pull request and generated 6 comments.
```python
from openai import AsyncOpenAI

# COMMENTED OUT: autoevals dependency removed
# from autoevals.ragas import Faithfulness, ContextRelevancy
```
Copilot AI, Dec 25, 2025
Corrected spelling of 'Relevancy' to 'Relevance' in the comment.
Suggested change:
```diff
-# from autoevals.ragas import Faithfulness, ContextRelevancy
+# from autoevals.ragas import Faithfulness, ContextRelevancy  # Commented out due to autoevals removal, corrected spelling of 'Relevance'
```
```ts
// Get the actual language from the CodeBlock node, or default to "code"
const language = $isCodeBlockNode(parentBlock) ? parentBlock.getLanguage() : "code"
```
Copilot AI, Dec 25, 2025
The fallback to 'code' when parentBlock is not a CodeBlockNode may mask errors. Consider logging a warning or throwing an error if the parent is unexpectedly not a CodeBlockNode, as this likely indicates a programming error.
Suggested change:
```diff
-// Get the actual language from the CodeBlock node, or default to "code"
-const language = $isCodeBlockNode(parentBlock) ? parentBlock.getLanguage() : "code"
+// Get the actual language from the CodeBlock node, or default to "code".
+// If parentBlock is not a CodeBlockNode, log a warning as this likely indicates
+// a structural/editor bug, but still fall back to "code" to preserve behavior.
+let language: string
+if ($isCodeBlockNode(parentBlock)) {
+    language = parentBlock.getLanguage()
+} else {
+    log("Paste: Expected parentBlock to be a CodeBlockNode", {
+        selection,
+        anchorNode,
+        currentLine,
+        parentBlock,
+    })
+    language = "code"
+}
```
```python
# Local runner only supports Python
if runtime != "python":
    raise ValueError(
        f"LocalRunner only supports 'python' runtime, got: {runtime}"
```
Copilot AI, Dec 25, 2025
Removing RestrictedPython eliminates sandboxing protections. The local runner now executes arbitrary Python code without restrictions. This is a significant security regression if untrusted code can be executed. Ensure that the local runner is only used in trusted development environments and that production deployments use the Daytona runner.
```python
agenta_credentials = (
    RunningContext.get().credentials
    #
    or ""
```
Copilot AI, Dec 25, 2025
String slicing agenta_credentials[7:] assumes 'ApiKey ' prefix is exactly 7 characters. However, the check is for 'ApiKey ' (with space), which is also 7 characters, so this is correct. But if the prefix format changes (e.g., 'ApiKey ' with two spaces), this will fail silently. Consider using agenta_credentials.removeprefix('ApiKey ') for robustness.
Suggested change:
```diff
-or ""
+agenta_credentials.removeprefix("ApiKey ")
```
```ts
// Insert spaces instead of tab character
// Use 4 spaces for Python/code (PEP 8 standard)
// Use 2 spaces for JSON/YAML (typical formatting)
const spaces = language === "json" || language === "yaml" ? "  " : "    "
```
Copilot AI, Dec 25, 2025
Consider extracting these magic numbers (2 spaces for JSON/YAML, 4 spaces for code/Python/JavaScript/TypeScript) into named constants at the module level. This would make the indentation standards more visible and easier to modify consistently across the codebase.
```python
for runtime, folder in BASIC_DIRS.items():
    if not folder.exists():
        continue
    pattern = "*.py" if runtime == "python" else "*.js" if runtime == "javascript" else "*.ts"
```
Copilot AI, Dec 25, 2025
This nested ternary expression is difficult to read. Consider using a dictionary mapping or if-elif-else structure for better clarity.
Suggested change:
```diff
-pattern = "*.py" if runtime == "python" else "*.js" if runtime == "javascript" else "*.ts"
+if runtime == "python":
+    pattern = "*.py"
+elif runtime == "javascript":
+    pattern = "*.js"
+else:
+    pattern = "*.ts"
```
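The dictionary-mapping alternative mentioned in the comment could look roughly like the sketch below. `GLOB_BY_RUNTIME` and `glob_for_runtime` are hypothetical names that do not appear in the PR; only the runtime strings and glob patterns are taken from the reviewed line, and the `*.ts` default mirrors the final `else` of the original ternary.

```python
# Hypothetical names; the runtime strings and glob patterns come from the reviewed line.
GLOB_BY_RUNTIME = {
    "python": "*.py",
    "javascript": "*.js",
}


def glob_for_runtime(runtime: str) -> str:
    """Return the file glob for a runtime, defaulting to TypeScript like the original ternary."""
    return GLOB_BY_RUNTIME.get(runtime, "*.ts")
```

A mapping keeps the list of runtimes in one place, although the silent `*.ts` default is carried over from the original and may be worth replacing with an explicit error for unknown runtimes.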
Title change: daytona code evaluators → DaytonaRunner for code evaluators
Copilot reviewed 168 out of 310 changed files in this pull request and generated 2 comments.
```ts
} from "@/oss/state/testsetSelection"

/**
 * Testset Queries - Clean atom-based data fetching
```
Copilot AI, Dec 26, 2025
Corrected spelling of 'recieve' to 'receive' in comment.
```ts
    }
}, [parsed.fullValue])

const isPdf = mimeType === "application/pdf"
```
Copilot AI, Dec 26, 2025
The variable isPdf is declared but never used in the component. Consider removing it or using it in the conditional rendering logic if PDF-specific behavior is intended.
Pull request overview
Copilot reviewed 48 out of 55 changed files in this pull request and generated 20 comments.
```diff
 """
-Execute provided Python code safely using RestrictedPython.
+Execute provided Python code directly.
```
Copilot AI, Jan 2, 2026
The LocalRunner now executes code directly without any sandboxing or restrictions, but the docstring still references "safe execution". The comment on line 8 says "Local code runner using direct Python execution" which is accurate, but the run method docstring should be updated to reflect that this is NOT safe execution and should only be used in trusted environments.
```diff
 Execute the provided code safely.
-Uses the configured runner (local RestrictedPython or remote Daytona)
+Uses the configured runner (local or remote Daytona)
```
Copilot AI, Jan 2, 2026
The docstring for execute_code_safely still says the function executes code "safely", but with the LocalRunner now using direct exec() without restrictions, this is misleading. The function name and docstring should be updated to reflect that safety depends on the runner implementation, and LocalRunner is not actually safe.
```ts
// NO transformation for Python/code - keep indent exactly as-is
// Just add the indent as a plain text node (preserves spaces AND tabs)
if (indent.length > 0) {
    codeLine.append($createCodeHighlightNode(indent, "plain", false, null))
```
Copilot AI, Jan 2, 2026
The comment on line 248 says "NO transformation for Python/code - keep indent exactly as-is" which is accurate, but then the comment on line 249 says "Just add the indent as a plain text node (preserves spaces AND tabs)". These could be combined into a single, clearer comment explaining that for Python/JS/TS, indentation is preserved exactly as pasted (both spaces and tabs) by inserting it as a plain text node.
```python
runtime = runtime or "python"

# Select general snapshot
snapshot_id = os.getenv("DAYTONA_SNAPSHOT")
```
Copilot AI, Jan 2, 2026
The environment variable name has changed from AGENTA_SERVICES_SANDBOX_SNAPSHOT_PYTHON to DAYTONA_SNAPSHOT. This appears to be a breaking change that could affect existing deployments. Consider either maintaining backward compatibility by checking both variable names, or documenting this breaking change clearly in migration notes.
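One way to keep backward compatibility, as the comment suggests, is to read the new variable first and fall back to the legacy one. The following is a minimal sketch under that assumption; `resolve_snapshot_id` is a hypothetical helper, not part of the PR, and only the two variable names come from the review.

```python
import os
from typing import Mapping, Optional

NEW_VAR = "DAYTONA_SNAPSHOT"
LEGACY_VAR = "AGENTA_SERVICES_SANDBOX_SNAPSHOT_PYTHON"


def resolve_snapshot_id(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Prefer the new variable, fall back to the legacy one, return None if neither is set."""
    # Hypothetical helper: accepts a mapping so it can be exercised without touching os.environ.
    env = os.environ if env is None else env
    return env.get(NEW_VAR) or env.get(LEGACY_VAR) or None
```

Empty values are treated the same as unset ones here, so a blank `DAYTONA_SNAPSHOT` does not shadow a configured legacy variable.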
```python
if response_error:
    log.error(f"Sandbox execution error: {response_error}")
    raise RuntimeError(f"Sandbox execution failed: {response_error}")
if response_exit_code and response_exit_code != 0:
```
Copilot AI, Jan 2, 2026
The code checks if response_exit_code is truthy before checking if it's non-zero. However, if exit_code is 0 (success), the expression response_exit_code and response_exit_code != 0 would be False (correct). But if exit_code is None (when the attribute doesn't exist), this would also be False, potentially masking errors. Consider explicitly checking if response_exit_code is not None and response_exit_code != 0 to distinguish between "no exit code" and "exit code is 0".
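Note that `x and x != 0` and `x is not None and x != 0` evaluate to the same truth value for `None`, `0`, and any non-zero integer, so the explicit form mainly improves readability. Actually distinguishing "no exit code" from "exit code is 0" needs a separate branch. A minimal sketch, where `classify_exit_code` and its labels are hypothetical and not taken from the PR:

```python
from typing import Optional


def classify_exit_code(exit_code: Optional[int]) -> str:
    """Separate a missing exit code from success (0) and failure (non-zero)."""
    # Hypothetical helper and labels, shown only to illustrate the three cases.
    if exit_code is None:
        return "unknown"
    return "ok" if exit_code == 0 else "failed"
```

How the caller treats the `"unknown"` case (log, raise, or ignore) is a design decision the review comment leaves open.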
```python
            "code": "from typing import Dict, Union, Any\n\n\ndef evaluate(\n    app_params: Dict[str, str],  # deprecated; currently receives {}\n    inputs: Dict[str, str],\n    output: Union[str, Dict[str, Any]],\n    correct_answer: str,\n) -> float:\n    if output == correct_answer:\n        return 1.0\n    return 0.0\n",
        },
        "description": "Exact match evaluator implemented in Python.",
    },
    {
        "key": "javascript_default",
        "name": "Exact Match (JavaScript)",
        "values": {
            "requires_llm_api_keys": False,
            "runtime": "javascript",
            "correct_answer_key": "correct_answer",
            "code": 'function evaluate(appParams, inputs, output, correctAnswer) {\n void appParams\n void inputs\n\n const outputStr =\n typeof output === "string" ? output : JSON.stringify(output)\n\n return outputStr === String(correctAnswer) ? 1.0 : 0.0\n}\n',
        },
        "description": "Exact match evaluator implemented in JavaScript.",
    },
    {
        "key": "typescript_default",
        "name": "Exact Match (TypeScript)",
        "values": {
            "requires_llm_api_keys": False,
            "runtime": "typescript",
            "correct_answer_key": "correct_answer",
            "code": 'type OutputValue = string | Record<string, unknown>\n\nfunction evaluate(\n app_params: Record<string, string>,\n inputs: Record<string, string>,\n output: OutputValue,\n correct_answer: string\n): number {\n void app_params\n void inputs\n\n const outputStr =\n (typeof output === "string" ? output : JSON.stringify(output)) as string\n\n return outputStr === String(correct_answer) ? 1.0 : 0.0\n}\n',
        },
```
Copilot AI, Jan 2, 2026
The preset code values are stored as long single-line strings with embedded newlines (\n). This makes the code difficult to read and maintain in the resource file. Consider using multiline strings or loading these presets from separate files to improve readability and maintainability of the evaluator preset code.
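The multiline-string option could be sketched as follows, using `textwrap.dedent` so the preset reads like ordinary source. `PYTHON_EXACT_MATCH` and `preset` are hypothetical names, and the 4-space indentation inside the template is an assumption, since the single-line form in the review does not show the original whitespace.

```python
from textwrap import dedent

# Hypothetical constant name; the function body mirrors the Python preset under review.
PYTHON_EXACT_MATCH = dedent(
    '''\
    from typing import Dict, Union, Any


    def evaluate(
        app_params: Dict[str, str],  # deprecated; currently receives {}
        inputs: Dict[str, str],
        output: Union[str, Dict[str, Any]],
        correct_answer: str,
    ) -> float:
        if output == correct_answer:
            return 1.0
        return 0.0
    '''
)

# The preset entry then references the constant instead of an escaped one-liner.
preset = {"key": "python_default", "values": {"runtime": "python", "code": PYTHON_EXACT_MATCH}}
```

The `"python_default"` key is likewise assumed by analogy with `javascript_default` and `typescript_default`; loading the templates from separate files, the other option the comment raises, would avoid embedding code in the resource module altogether.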
```python
response = sandbox.process.code_run(wrapped_code)
response_stdout = response.result if hasattr(response, "result") else ""
```
Copilot AI, Jan 2, 2026
The response handling uses response.result as the stdout content on line 250, but the production code comment history shows that previously it was response.stdout. The test script on line 119 also uses resp.result. However, there's no clear documentation about the Daytona API version being used. Consider documenting which Daytona SDK version this code is compatible with to avoid confusion about the correct attribute names.
```diff
 if not snapshot_id:
     raise RuntimeError(
-        "AGENTA_SERVICES_SANDBOX_SNAPSHOT_PYTHON environment variable is required. "
-        "Set it to the Daytona sandbox ID or snapshot name you want to use."
+        f"No Daytona snapshot configured for runtime '{runtime}'. "
+        f"Set DAYTONA_SNAPSHOT environment variable."
     )
```
Copilot AI, Jan 2, 2026
The error message references runtime variable but uses a generic message format. When DAYTONA_SNAPSHOT is not set, the error message says "No Daytona snapshot configured for runtime '{runtime}'", but the snapshot selection logic doesn't actually vary by runtime - it uses the same DAYTONA_SNAPSHOT for all runtimes. This could be misleading. Consider clarifying the error message to reflect that a single snapshot is used for all runtimes.
```python
agenta_api_key = (
    agenta_credentials[7:]
    if agenta_credentials.startswith("ApiKey ")
    else ""
)
```
Copilot AI, Jan 2, 2026
The code extracts API key from credentials by checking if it starts with "ApiKey " and slicing from position 7, but if the credentials string is exactly "ApiKey " (with no actual key following), this would result in an empty string, which would still be added to env vars. Consider adding validation to ensure the extracted API key is non-empty.
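A sketch of the validation described here, combined with the `removeprefix` idea from the earlier comment on the same code. `extract_api_key` is a hypothetical helper, not part of the PR, and `str.removeprefix` requires Python 3.9 or later.

```python
from typing import Optional

API_KEY_PREFIX = "ApiKey "


def extract_api_key(credentials: str) -> Optional[str]:
    """Return the key that follows the 'ApiKey ' prefix, or None if it is absent or blank."""
    # Hypothetical helper: returning None lets the caller skip setting the env var entirely.
    if not credentials.startswith(API_KEY_PREFIX):
        return None
    key = credentials.removeprefix(API_KEY_PREFIX).strip()
    return key or None
```

With this shape, the caller can add the key to the sandbox environment only when the result is not `None`, which avoids exporting an empty value.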
```python
# Fallback: attempt to extract a JSON object containing "result"
for line in reversed(output_lines):
    if "result" not in line:
        continue
    start = line.find("{")
    end = line.rfind("}")
    if start == -1 or end == -1 or end <= start:
        continue
    try:
        result_obj = json.loads(line[start : end + 1])
```
Copilot AI, Jan 2, 2026
The fallback result parsing logic has a potential issue. The code finds the last occurrence of '}' with rfind("}"), but this could match a closing brace that isn't part of the result JSON object. For example, if the output contains nested JSON or code snippets, this could incorrectly identify a brace position. Consider using a more robust JSON extraction approach or validating that the extracted substring is actually valid JSON before attempting to parse it.
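One more robust extraction approach is to let the standard library's `json.JSONDecoder.raw_decode` find where each JSON value ends, instead of pairing the first `{` with the last `}`. The sketch below is an illustration under that assumption, not the PR's implementation; `extract_result_object` is a hypothetical name.

```python
import json
from typing import Optional


def extract_result_object(line: str) -> Optional[dict]:
    """Return the first JSON object in `line` that has a "result" key, else None."""
    decoder = json.JSONDecoder()
    pos = line.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at `pos` and ignores trailing text.
            obj, _end = decoder.raw_decode(line, pos)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "result" in obj:
            return obj
        # Not a usable object at this brace; try the next candidate.
        pos = line.find("{", pos + 1)
    return None
```

Because each candidate brace is tried in turn, trailing text such as a stray `}` after the object no longer breaks parsing. An object nested inside another object can also match, which may or may not be desirable for this fallback.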
No description provided.