Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9b4f45c79d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
any thoughts? @lorenss-m |
hud/datasets/runner.py
Outdated
| task_list = [t if isinstance(t, Task) else Task.from_v4(t) for t in tasks] | ||
|
|
||
| if not task_list: | ||
| raise ValueError("No tasks to run") |
There was a problem hiding this comment.
Duplicated task normalization logic in runner functions
Medium Severity
The task normalization logic in run_dataset_async (lines 159-175) is nearly identical to run_dataset (lines 76-94). Both functions normalize agent_type from string to AgentType enum and normalize tasks from various input types to list[Task]. This ~17 lines of duplicated code should be extracted into a shared helper function like _normalize_tasks().
579b4f0 to
0191b4d
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
|
|
||
| def _load_raw_tasks(source: str) -> tuple[list[dict[str, Any]], list[str]]: | ||
| path = Path(source) | ||
| if path.exists() and path.suffix.lower() in {".json", ".jsonl"}: |
There was a problem hiding this comment.
Case sensitivity mismatch between validation and loading
Low Severity
The _load_raw_tasks and _load_raw_from_file functions use case-insensitive extension matching via .suffix.lower(), while the existing load_tasks function in loader.py uses case-sensitive matching. This means a file like tasks.JSONL would pass validation but fail when actually loaded via load_tasks, because loader.py wouldn't recognize the uppercase extension and would incorrectly try to fetch it as a HuggingFace dataset.
Additional Locations (1)
| module = importlib.util.module_from_spec(spec) # type: ignore[arg-type] | ||
| assert spec and spec.loader | ||
| spec.loader.exec_module(module) | ||
| return module.validate_command |
There was a problem hiding this comment.
Test uses unnecessarily complex importlib module loading
Medium Severity
The _load_validate_command() function uses importlib.util.spec_from_file_location to manually load the module when a simple import would work: from hud.cli.validate import validate_command. This pattern is inconsistent with other tests in hud/cli/tests/ which use standard imports.


Summary
hud validateto check task files or HF datasets without running themNote
Low Risk
Adds a new CLI command and validation-only code paths; main risk is false positives/negatives in task parsing/validation rather than runtime behavior changes.
Overview
Adds a new
hud validatecommand that loads tasks from a local.json/.jsonlfile or a dataset slug and validates each entry without running an eval.Validation now checks v4-style tasks via
validate_v4_task(when detected) and always attempts PydanticTaskconstruction, aggregating and printing per-task errors before exiting non-zero on failure. Includes unit tests covering valid tasks, missing required fields, and non-dict entries in the tasks list.Written by Cursor Bugbot for commit 0191b4d. This will update automatically on new commits. Configure here.