feat: enforce strict code quality (v0.14.0)#469
Conversation
📝 Walkthrough

This PR implements a comprehensive code quality and type-safety initiative, bumping versions from 0.13.4 to 0.14.0 across manifests, adding mypy type-checking infrastructure with pre-commit and CI integration, expanding Ruff lint rules, modernizing timezone handling via UTC constants, improving exception handling patterns, and adding type annotations throughout the codebase.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ Passed checks (4 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
6bc1ca3 to 5b020d1
…king (v0.14.0)

Add comprehensive static analysis enforcement across all 134 source files:
- Expand Ruff rules: B, UP, SIM, RUF, C4, PIE, PT, ASYNC, S, T20
- Add mypy type checking with Pydantic plugin and CI type-check job
- Fix 72 ruff lint violations across 36 files
- Fix 267 mypy type errors across 39 files
- All 4293 tests pass with zero regressions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5b020d1 to b9c29c7
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
tests/test_benchmark_integration.py (1)
111-116: ⚠️ Potential issue | 🟡 Minor

Avoid broad `except Exception` in thread aggregation (BLE001). This blanket catch triggers Ruff BLE001 and can mask programming errors. Catch specific expected exceptions (e.g., network/load failures) or re-raise unexpected ones after recording the error.
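The narrowed pattern the comment asks for can be sketched like this; the `load` function, its exception types, and the URL list are hypothetical stand-ins rather than code from this PR. Expected failures are recorded, while anything unexpected propagates and fails the test with a traceback:

```python
import concurrent.futures

def load(url: str) -> str:
    # hypothetical loader: one host is unreachable, the rest succeed
    if "bad" in url:
        raise ConnectionError(f"cannot reach {url}")
    return f"ok: {url}"

results: list[str] = []
errors: list[str] = []
with concurrent.futures.ThreadPoolExecutor() as pool:
    futures = {pool.submit(load, u): u for u in ["a", "bad-host", "c"]}
    for fut in concurrent.futures.as_completed(futures):
        try:
            results.append(fut.result())
        except (ConnectionError, TimeoutError) as exc:
            # expected transient failures are recorded per host...
            errors.append(f"{futures[fut]}: {exc}")
        # ...anything else (say, a TypeError from a bug) propagates and fails loudly

print(sorted(results), errors)
```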
tests/test_eval_reliability.py (1)
148-154: ⚠️ Potential issue | 🟡 Minor

Stale docstring/comment and near-duplicate test after `TimeoutError` unification. Line 149's docstring still says "should catch asyncio.TimeoutError (not just TimeoutError)" and line 153's comment references "Python <3.11 compat", but the side effect now raises the built-in `TimeoutError`. Additionally, this test is now functionally identical to `test_builtin_timeout_error_still_caught` (lines 170–185) — both raise `TimeoutError()` and assert the same outcome. Consider either:
- Removing one of the two duplicate tests, or
- If you want to keep explicit coverage for both exception types, restoring `asyncio.TimeoutError()` in this test (it's still valid in 3.11+ as an alias) and updating the comment to clarify the intent.

Either way, the docstring and comment on lines 149/153 should be updated to reflect reality.
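For context on the alias, a tiny check (guarded for Python 3.11+, where the unification happened) shows why the two test variants exercise the same class:

```python
import asyncio
import sys

caught = None
if sys.version_info >= (3, 11):
    # since 3.11 the two spellings name one and the same class
    assert asyncio.TimeoutError is TimeoutError
    try:
        raise asyncio.TimeoutError("deadline exceeded")
    except TimeoutError:
        # a plain `except TimeoutError:` catches either way of raising it
        caught = True

print(caught)
```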
src/mcpbr/infrastructure/azure.py (1)
572-586: ⚠️ Potential issue | 🟡 Minor

`datetime.now()` without UTC is inconsistent with the PR's timezone migration. Line 585 uses `datetime.now().isoformat()`, which produces a naive local timestamp, while the rest of this PR consistently migrates to `datetime.now(UTC)`. This should be updated for consistency.

Proposed fix

```diff
-from datetime import datetime
+from datetime import UTC, datetime

 assert self.vm_name is not None
 assert self.vm_ip is not None
 run_state = RunState(
     vm_name=self.vm_name,
     vm_ip=self.vm_ip,
     resource_group=self.azure_config.resource_group,
     location=self.azure_config.location,
     ssh_key_path=str(self.ssh_key_path),
     config_path=str(self.config.config_path) if hasattr(self.config, "config_path") else "",
-    started_at=datetime.now().isoformat(),
+    started_at=datetime.now(UTC).isoformat(),
 )
```
🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Around line 137-138: Remove the redundant per-file-ignores entry
`"infrastructure/**/*.py" = ["S603", "S607"]` from pyproject.toml; instead keep
the existing `"tests/**/*.py"` and the `"src/mcpbr/infrastructure/**/*.py"`
entries and ensure the remaining JSON-like mapping remains valid after deleting
that line so the file parses correctly.
In `@src/mcpbr/env_expansion.py`:
- Around line 139-148: The exclusion `not key_lower.endswith("_key")` in the
conditional is too broad and skips sensitive names; update the check in
env_expansion.py around the `if (` that uses key_lower and value so that `_key`
names are treated safely — either add sensitive `_key` patterns (e.g.,
"secret_key", "private_key", "encryption_key", "signing_key") to the first
branch's keyword list checked with `any(keyword in key_lower ...)`, or replace
the blanket exclusion with an allowlist of known-benign `_key` suffixes (e.g.,
"cache_key", "primary_key") and keep all other `_key` endings subject to the
API-key heuristic, ensuring warnings.append still references path and value
length checks as before.
In `@src/mcpbr/log_formatter.py`:
- Line 509: The line that opens the log file contains an unnecessary noqa
suppression ("# noqa: SIM115")—remove that directive from the open call so the
line becomes a plain open(log_file_path, "w") call; locate the open(...)
invocation where the variable log_file is assigned and drop the trailing "#
noqa: SIM115" comment.
In `@src/mcpbr/profiler.py`:
- Around line 152-157: The broad "except Exception:" in the memory-sampling
block of the Profiler (around the except handling that sets
self.enable_memory_profiling) triggers Ruff BLE001; replace the broad catch with
specific exceptions (e.g., except (psutil.Error, OSError):) or, if the broad
catch is intentional for best-effort telemetry, annotate it with the linter
exemption comment "# noqa: BLE001" so Ruff stops flagging it; update the except
block around the memory profiling sampling in profiler.py (the handler that
currently has "# noqa: S110 -- best-effort memory sampling; non-critical
telemetry") accordingly.
In `@src/mcpbr/reproducibility.py`:
- Around line 114-119: The loop in get package collection uses unstable metadata
indexing (dist.metadata["Name"]/["Version"]) which can KeyError; change it to
use the stable Distribution properties (dist.name and dist.version) when
building the packages dict in the function that iterates over distributions(),
e.g., replace references to dist.metadata["Name"] and dist.metadata["Version"]
with dist.name and dist.version and optionally guard with or "" to preserve
current behavior; keep the surrounding exception handling as a last-resort
fallback.
In `@tests/test_benchmark_integration.py`:
- Around line 65-66: Replace the broad "except Exception as e" that writes to
result["error"] with a targeted handler that only catches expected errors from
benchmark creation/loading (e.g., ValueError, IOError, OSError or your
domain-specific exceptions) and records them into result["error"]; for anything
else, re-raise the exception so tests fail and provide a traceback. Concretely:
change the "except Exception as e" block to either "except (ValueError, IOError)
as e:" (adding any project-specific exception classes) or use "except Exception
as e: if isinstance(e, (ValueError, IOError, YourBenchmarkError)):
result['error'] = ... else: raise" to avoid swallowing unexpected failures while
preserving the existing error message formatting.
In `@tests/test_harness_notifications.py`:
- Around line 42-48: The test currently assigns a hardcoded literal directly to
config.slack_bot_token (triggering Ruff S105); change it to assign the token to
a non-literal variable (e.g., token_var = "xoxb-token"), set
config.slack_bot_token = token_var, and update the assertion to compare
result["slack_bot_token"] against that variable instead of a literal, leaving
the rest of the test (discord/slack_webhook assertions and use of
_build_notify_config) unchanged.
In `@tests/test_thinking_budget.py`:
- Around line 283-288: Replace the hard-coded "/tmp/test" and the broad
contextlib.suppress(Exception) in the patched test block with a pytest tmp_path
fixture and a narrow suppression (or none); specifically update the
patch/context manager that surrounds the call to harness.solve(task,
"/tmp/test", timeout=1) to call harness.solve(task, tmp_path / "test",
timeout=1) and change contextlib.suppress(Exception) to
contextlib.suppress(TimeoutError) (or remove the suppressor entirely), and apply
the same change to the other async test block that patches
mcpbr.harnesses._run_cli_command and shutil.which so both tests use tmp_path and
only suppress TimeoutError.
🧹 Nitpick comments (16)
src/mcpbr/analytics/regression_detector.py (1)
348-353: Nit: prefer truthy check over `len(...) > 0`. Since this PR enables the SIM rule set, `len(task_regressions) > 0` can be simplified to just `task_regressions`. An empty list is falsy in Python, so the semantics are identical.

♻️ Suggested simplification

```diff
 elif (
     cost_regression_detected
     or latency_regression_detected
     or token_regression_detected
-    or len(task_regressions) > 0
+    or task_regressions
 ):
```

src/mcpbr/analytics/correlation.py (1)
50-50: Prefer `strict=True` here — lengths are already validated. Since lines 39–42 guarantee `len(x) == len(y)`, using `strict=True` provides defense-in-depth at no cost, and better communicates the intent that these sequences must be the same length. `strict=False` (the default) silences B905 but opts out of the safety net.

Suggested change

```diff
-numerator = math.fsum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y, strict=False))
+numerator = math.fsum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y, strict=True))
```

tests/test_audit.py (1)
985-985: Consider `strict=True` here instead of `strict=False`. Line 983 already asserts both lists have length 2, so they're guaranteed equal at this point. Using `strict=True` would be the more defensive choice — it catches length mismatches at the `zip` call itself rather than silently truncating, which is especially valuable if the assertion on line 983 is ever relaxed or removed.

Suggested change

```diff
-for json_entry, csv_row in zip(json_data, csv_rows, strict=False):
+for json_entry, csv_row in zip(json_data, csv_rows, strict=True):
```

src/mcpbr/graceful_degradation.py (1)
111-120: Redundant entries in `_TRANSIENT_ERRORS`. `asyncio.TimeoutError` is an alias for `TimeoutError` since Python 3.11, `IOError` is an alias for `OSError` since Python 3.3, and `ConnectionResetError`/`ConnectionRefusedError`/`ConnectionAbortedError` are subclasses of `ConnectionError`. These overlap but are harmless — just something to clean up if you want a tighter tuple.

tests/test_graceful_degradation.py (1)
109-113: This test is now a duplicate of `test_timeout_is_transient`. After replacing `asyncio.TimeoutError()` with `TimeoutError()`, this test does exactly the same thing as `test_timeout_is_transient` (line 67). Since they're aliases in Python 3.11+, both tests instantiate `TimeoutError` and assert `TRANSIENT`. Consider removing this test or, if you want to keep explicit coverage for the alias, at least updating the name/docstring to reflect that it's intentionally testing the alias equivalence.

tests/test_docker_retry.py (1)
22-24: Ruff ARG001 false positive — `mock_docker_client` is a pytest fixture dependency. The static analysis hint flags `mock_docker_client` as unused, but it's required here as an implicit fixture dependency: it ensures `docker.from_env` is patched before `DockerEnvironmentManager()` is instantiated on line 24. This is idiomatic pytest and safe to ignore. If Ruff's ARG001 rule is enforced project-wide, you could suppress it with a leading underscore or a `# noqa: ARG001` comment, but that's entirely optional.

src/mcpbr/notifications.py (1)
493-494: Type annotation looks good. The explicit `url` variable with `str | None` type annotation correctly reflects the API response structure and improves type safety.

Optional: Consider try/except/else structure (Ruff TRY300). The static analyzer suggests moving the return statement to an `else` block for more explicit control flow. However, the current pattern is perfectly readable and commonly used. This is purely a style preference.

♻️ Optional refactor to satisfy TRY300

```diff
 try:
     response = requests.post(
         "https://api.github.com/gists",
         headers={
             "Authorization": f"token {github_token}",
             "Accept": "application/vnd.github.v3+json",
         },
         json={
             "description": description,
             "public": False,
             "files": {
                 "results.json": {"content": results_json},
             },
         },
         timeout=30,
     )
     response.raise_for_status()
-    url: str | None = response.json().get("html_url")
-    return url
 except Exception as e:
     logger.warning("Failed to create GitHub Gist: %s", e)
     return None
+else:
+    url: str | None = response.json().get("html_url")
+    return url
```

src/mcpbr/sampling.py (1)
136-140: Optional: Simplify defaultdict usage. Since `groups` is a `defaultdict(list)`, line 139 is redundant. The defaultdict automatically creates an empty list for missing keys, so you can remove the explicit assignment.

♻️ Proposed simplification

```diff
 groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
 for task in tasks:
     key = str(task.get(stratify_field, "_unknown_"))
-    groups[key] = groups.get(key, [])
     groups[key].append(task)
```

.pre-commit-config.yaml (1)
19-26: Consider scoping the hook trigger to `src/mcpbr/` files only. The hook runs mypy on `src/mcpbr/` but triggers on any Python file change (including tests) due to `types: [python]` without a `files` filter. Adding a `files` pattern would avoid unnecessary mypy runs when only test files change.

```diff
 - id: mypy
   name: mypy
   entry: uv run --extra dev mypy src/mcpbr/
   language: system
   pass_filenames: false
   types: [python]
+  files: ^src/mcpbr/
```

src/mcpbr/smoke_test.py (1)
164-165: Acceptable `type: ignore` for `asyncio.to_thread` callable typing. The `# type: ignore[arg-type]` is a reasonable suppression here since `asyncio.to_thread` has strict callable typing that doesn't align well with SDK method signatures passed with keyword arguments. Consider adding a brief explanation in the comment (e.g., `# type: ignore[arg-type]  # asyncio.to_thread callable signature mismatch`) for future readers.

src/mcpbr/benchmarks/mcptoolbench.py (1)
397-399: `enumerate` is unnecessary — `_i` is never used. The index variable `_i` from `enumerate` is unused. The `enumerate` wrapper can be dropped, and `strict=False` is already the default for `zip`.

♻️ Simplify the loop

```diff
-for _i, (agent_call, gt_call) in enumerate(
-    zip(agent_calls, ground_truth, strict=False)
-):
+for agent_call, gt_call in zip(agent_calls, ground_truth):
```

CHANGELOG.md (1)
10-32: Changelog entry looks good and accurately documents the v0.14.0 changes. Minor note: the `[0.13.0]` section (line 34) doesn't have a corresponding link reference at the bottom of the file. This appears to be pre-existing but worth fixing while you're editing the file.

src/mcpbr/config_validator.py (1)
492-492: `dict(error)` conversion is redundant but harmless. Pydantic's `e.errors()` already returns `list[dict[str, Any]]`, so `error` is already a dict. The `dict()` call is a no-op here but may satisfy mypy if the inferred type was more narrow. Not worth changing, just noting.

src/mcpbr/infrastructure/aws.py (1)
417-417: Replace `assert` statements with explicit runtime error checks for provider state validation. All six preconditions use `assert`, which are stripped with `python -O`, allowing `None` to flow into AWS CLI and paramiko calls. Use `RuntimeError` or similar explicit error handling instead for robustness.

🛠️ Example for one guard

```diff
-assert self.instance_id is not None
+if self.instance_id is None:
+    raise RuntimeError("Instance ID not set; call _create_instance() first")
```

Affected lines: 417, 476, 607, 771, 825, 884–885
src/mcpbr/infrastructure/gcp.py (2)
688-702: Naive `datetime.now()` — consider using UTC for consistency with the broader PR. The AI summary notes this PR normalizes timezone handling to UTC across multiple modules. Line 701 uses `datetime.now()`, which produces a naive (timezone-unaware) timestamp, inconsistent with that pattern.

Suggested fix

```diff
-from datetime import datetime
+from datetime import UTC, datetime
 ...
-started_at=datetime.now().isoformat(),
+started_at=datetime.now(UTC).isoformat(),
```
364-364: Assertions for mypy narrowing are acceptable but note they're no-ops under `python -O`. The assertions on `self.instance_ip`, `self.ssh_client`, etc. help mypy but are skipped if Python runs with optimizations enabled (`-O`/`-OO`). For `ssh_client`, `_ssh_exec` (line 399) already has a proper runtime guard; the direct `self.ssh_client.exec_command` calls on lines 749 and 802-803 bypass that guard. If this code could ever run under `-O`, those paths would hit an `AttributeError` on `None`. Given the codebase context (CLI tool, unlikely to run optimized), this is low risk.
Also applies to: 498-498, 748-748, 802-802, 861-861
```toml
"infrastructure/**/*.py" = ["S603", "S607"]
"src/mcpbr/infrastructure/**/*.py" = ["S603", "S607", "S108"]
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check if there's a top-level infrastructure/ directory
fd -t d '^infrastructure$' --max-depth 2
# Also list the directory structure to understand the layout
echo "=== Root level directories ==="
ls -la | grep '^d' | head -20
# Examine the relevant section of pyproject.toml
echo "=== Ruff per-file-ignores section ==="
sed -n '135,145p' pyproject.toml
```

Repository: greynewell/mcpbr
Length of output: 1729
🏁 Script executed:
#!/bin/bash
# Check for Python files in infrastructure directories
echo "=== Python files in docs/infrastructure ==="
find docs/infrastructure -name "*.py" -type f 2>/dev/null | head -20
echo "=== Python files in tests/infrastructure ==="
find tests/infrastructure -name "*.py" -type f 2>/dev/null | head -20
echo "=== All files in docs/infrastructure ==="
find docs/infrastructure -type f 2>/dev/null
echo "=== All files in tests/infrastructure ==="
find tests/infrastructure -type f 2>/dev/nullRepository: greynewell/mcpbr
Length of output: 1226
Remove redundant per-file-ignores pattern on Line 137.
"infrastructure/**/*.py" on line 137 matches files in tests/infrastructure/ (13 test files found), but these are already covered by line 136's "tests/**/*.py" pattern, which takes precedence in ruff's rule matching. This makes line 137 unreachable for its only Python files. Remove it to clarify the configuration.
Proposed fix

```diff
 [tool.ruff.lint.per-file-ignores]
 "tests/**/*.py" = ["S", "T20"]
-"infrastructure/**/*.py" = ["S603", "S607"]
 "src/mcpbr/infrastructure/**/*.py" = ["S603", "S607", "S108"]
 "scripts/**/*.py" = ["T20", "S"]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```toml
[tool.ruff.lint.per-file-ignores]
"tests/**/*.py" = ["S", "T20"]
"src/mcpbr/infrastructure/**/*.py" = ["S603", "S607", "S108"]
"scripts/**/*.py" = ["T20", "S"]
```
🤖 Prompt for AI Agents
In `@pyproject.toml` around lines 137 - 138, Remove the redundant per-file-ignores
entry `"infrastructure/**/*.py" = ["S603", "S607"]` from pyproject.toml; instead
keep the existing `"tests/**/*.py"` and the `"src/mcpbr/infrastructure/**/*.py"`
entries and ensure the remaining JSON-like mapping remains valid after deleting
that line so the file parses correctly.
```python
# Check for API keys (specific patterns with lower threshold,
# generic "key" suffix with higher threshold to avoid false positives)
if (
    any(keyword in key_lower for keyword in ["api_key", "api-key", "apikey"])
    and len(value) > 5
) or (key_lower.endswith("key") and not key_lower.endswith("_key") and len(value) > 10):
    warnings.append(
        f"Possible API key hardcoded at '{path}'. "
        f"Consider using environment variables: ${{API_KEY}}"
    )
```
The not key_lower.endswith("_key") exclusion is overly broad and misses genuinely sensitive keys.
The second branch excludes every key ending in _key to avoid false positives on benign names like "cache_key" or "primary_key". However, this also silently skips "secret_key", "private_key", "encryption_key", and "signing_key" — all of which are common names for hardcoded secrets.
Consider using an allowlist of known-benign _key suffixes instead of blanket exclusion, or add the sensitive _key patterns to the first branch's keyword list.
Suggested approach: expand the keyword list in the first branch

```diff
 if (
-    any(keyword in key_lower for keyword in ["api_key", "api-key", "apikey"])
+    any(keyword in key_lower for keyword in [
+        "api_key", "api-key", "apikey",
+        "secret_key", "private_key", "encryption_key", "signing_key",
+    ])
     and len(value) > 5
 ) or (key_lower.endswith("key") and not key_lower.endswith("_key") and len(value) > 10):
```

🤖 Prompt for AI Agents
) or (key_lower.endswith("key") and not key_lower.endswith("_key") and len(value) > 10):🤖 Prompt for AI Agents
In `@src/mcpbr/env_expansion.py` around lines 139 - 148, The exclusion `not
key_lower.endswith("_key")` in the conditional is too broad and skips sensitive
names; update the check in env_expansion.py around the `if (` that uses
key_lower and value so that `_key` names are treated safely — either add
sensitive `_key` patterns (e.g., "secret_key", "private_key", "encryption_key",
"signing_key") to the first branch's keyword list checked with `any(keyword in
key_lower ...)`, or replace the blanket exclusion with an allowlist of
known-benign `_key` suffixes (e.g., "cache_key", "primary_key") and keep all
other `_key` endings subject to the API-key heuristic, ensuring warnings.append
still references path and value length checks as before.
```diff
 if log_file_path:
     log_file_path.parent.mkdir(parents=True, exist_ok=True)
-    log_file = open(log_file_path, "w")
+    log_file = open(log_file_path, "w")  # noqa: SIM115 - caller closes the file handle
```
Remove unused noqa directive.
Ruff reports SIM115 is not enabled, so this suppression is unnecessary.
Fix

```diff
-log_file = open(log_file_path, "w")  # noqa: SIM115 - caller closes the file handle
+log_file = open(log_file_path, "w")  # caller closes the file handle
```

📝 Committable suggestion
+ log_file = open(log_file_path, "w") # caller closes the file handle📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
log_file = open(log_file_path, "w")  # caller closes the file handle
```
🧰 Tools
🪛 Ruff (0.15.0)
[warning] 509-509: Unused noqa directive (non-enabled: SIM115)
Remove unused noqa directive
(RUF100)
🤖 Prompt for AI Agents
In `@src/mcpbr/log_formatter.py` at line 509, The line that opens the log file
contains an unnecessary noqa suppression ("# noqa: SIM115")—remove that
directive from the open call so the line becomes a plain open(log_file_path,
"w") call; locate the open(...) invocation where the variable log_file is
assigned and drop the trailing "# noqa: SIM115" comment.
```diff
 except ImportError:
     # psutil not available, disable memory profiling
     self.enable_memory_profiling = False
-except Exception:
+except Exception:  # noqa: S110 -- best-effort memory sampling; non-critical telemetry
     # Failed to sample memory, skip silently
     pass
```
Ruff BLE001 still triggered by the broad Exception catch.
# noqa: S110 doesn’t silence BLE001. Either narrow the exception (e.g., psutil.Error, OSError) or add # noqa: BLE001 if the broad catch is intentional to keep lint green.
🛠️ Suggested narrow catch
```diff
-except Exception:  # noqa: S110 -- best-effort memory sampling; non-critical telemetry
+except (psutil.Error, OSError):  # noqa: S110 -- best-effort memory sampling; non-critical telemetry
     # Failed to sample memory, skip silently
     pass
```

As per coding guidelines: "Run linting checks with Ruff and ensure all issues are resolved before committing".
🧰 Tools
🪛 Ruff (0.15.0)
[warning] 155-155: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In `@src/mcpbr/profiler.py` around lines 152 - 157, The broad "except Exception:"
in the memory-sampling block of the Profiler (around the except handling that
sets self.enable_memory_profiling) triggers Ruff BLE001; replace the broad catch
with specific exceptions (e.g., except (psutil.Error, OSError):) or, if the
broad catch is intentional for best-effort telemetry, annotate it with the
linter exemption comment "# noqa: BLE001" so Ruff stops flagging it; update the
except block around the memory profiling sampling in profiler.py (the handler
that currently has "# noqa: S110 -- best-effort memory sampling; non-critical
telemetry") accordingly.
```diff
 for dist in distributions():
-    name = dist.metadata.get("Name", "")
-    version = dist.metadata.get("Version", "")
+    name = dist.metadata["Name"] or ""
+    version = dist.metadata["Version"] or ""
     if name:
         packages[name] = version
-except Exception:
+except Exception:  # noqa: S110 -- best-effort package collection; environment may lack importlib.metadata
```
🧩 Analysis chain
🌐 Web query:
importlib.metadata Distribution metadata "Name" "Version" keys guaranteed
💡 Result:
Yes—for a standards-compliant distribution metadata file, Name and Version are required core-metadata fields (along with Metadata-Version). The current PyPA Core Metadata spec (v2.5) explicitly lists Metadata-Version, Name, and Version as the required fields. [1]
However, importlib.metadata does not give you an absolute “guarantee” for arbitrary installed environments (metadata can be missing/corrupt). The stdlib docs describe importlib.metadata as operating on installed distributions whose metadata is defined by the Core Metadata specs, and its PackageMetadata behaves like a mapping where obj['Key'] is equivalent to obj.get('Key') (so a missing key can yield None rather than being guaranteed present). [2]
Practical takeaways:
- If you need the version reliably, prefer `importlib.metadata.version(dist_name)` / `Distribution.version` (it's the dedicated API). [2]
- If you read `dist.metadata`, treat `dist.metadata.get("Name")` / `.get("Version")` as "should exist per spec", but still defensively handle missing values. [1][2]
Sources: [1] PyPA Core Metadata specification (required fields) • [2] Python importlib.metadata docs (PackageMetadata behavior and scope)
🌐 Web query:
importlib.metadata.Distribution.metadata safe access pattern KeyError
💡 Result:
Distribution.metadata is a PackageMetadata (dict-like) object. To avoid KeyError across versions/backports, don’t rely on md['Key']—use get() / get_all() (and defaults) instead. In stdlib docs, subscripting is described as equivalent to get(), but the backport (importlib-metadata) has explicitly warned that missing-key behavior is changing toward raising KeyError. [1], [2]
```python
from importlib.metadata import distribution, PackageNotFoundError

try:
    md = distribution("some-dist").metadata
except PackageNotFoundError:
    md = None

if md is not None:
    name = md.get("Name")            # -> str | None
    license = md.get("License", "")  # -> default if missing
    # fields that can appear multiple times (e.g., Classifier, Project-URL)
    classifiers = md.get_all("Classifier") or []  # -> list[str]
```

If you truly want strict behavior, do:
```python
value = md.get("License")
if value is None:
    ...  # handle missing
```

Sources: Python importlib.metadata docs for PackageMetadata/subscripting. [1] Backport (importlib-metadata) changelog noting missing-key behavior moving toward KeyError. [2]
[1] (docs.python.org)
[2] (importlib-metadata.readthedocs.io)
Citations:
- 1: https://docs.python.org/3.15/library/importlib.metadata.html
- 2: https://importlib-metadata.readthedocs.io/en/stable/history.html?utm_source=openai
🌐 Web query:
Python importlib.metadata how to safely access distribution metadata
💡 Result:
To safely access installed distribution metadata with importlib.metadata, use the public API (version, metadata, distribution, requires, files) and defensively handle (a) “package not installed” and (b) “field not present”.
1) Handle “not installed” (PackageNotFoundError)
Most query functions raise importlib.metadata.PackageNotFoundError if the named distribution isn’t installed. [1]
```python
from importlib import metadata

try:
    v = metadata.version("wheel")
except metadata.PackageNotFoundError:
    v = None
```

2) Prefer metadata() / distribution().metadata and use dict-like access
metadata(name) returns a PackageMetadata object; treat it as a mapping and use .get() (or in) for optional fields. [2]
```python
from importlib import metadata

try:
    m = metadata.metadata("wheel")  # PackageMetadata
except metadata.PackageNotFoundError:
    m = None

summary = m.get("Summary") if m else None
license_ = m.get("License") if m else None
requires_python = m.get("Requires-Python") if m else None
```

If you need more than metadata (e.g., .version, .origin), use distribution():
```python
from importlib import metadata

try:
    dist = metadata.distribution("wheel")
except metadata.PackageNotFoundError:
    dist = None

version = dist.version if dist else None
requires_python = dist.metadata.get("Requires-Python") if dist else None
```

3) Don’t rely on the underlying message type / parsing quirks
The returned metadata object may be implemented using email.message types, but that’s not a stable API—stick to dictionary-like access methods/syntax. [3]
4) Treat files() as optional (None)
files(name) can return None if the distribution is found but file records are missing. [2]
```python
from importlib import metadata

try:
    files = metadata.files("some-dist")  # may be None
except metadata.PackageNotFoundError:
    files = None
```

5) If you want normalized JSON-compatible keys, use .json
PackageMetadata.json provides PEP 566-style JSON-compatible metadata (helpful for normalized key names like requires_python). [2]
```python
from importlib import metadata

m = metadata.metadata("wheel")
requires_python = m.json.get("requires_python")
```

Sources: Python importlib.metadata documentation (APIs, PackageNotFoundError, PackageMetadata, .json, files() returning None). [1][2][3]
Prefer dist.name and dist.version properties over metadata indexing.
Direct metadata indexing (dist.metadata["Name"] / ["Version"]) can raise KeyError across Python versions and backports. While the outer except Exception catches this, it masks the error and returns an empty packages dict. Instead, use the stable Distribution.name and Distribution.version properties, or defensively access metadata via .get():
💡 Suggested approaches

Option 1 (Preferred—uses stable properties):

```diff
-name = dist.metadata["Name"] or ""
-version = dist.metadata["Version"] or ""
+name = dist.name or ""
+version = dist.version or ""
```

Option 2 (If metadata access is required):

```diff
-name = dist.metadata["Name"] or ""
-version = dist.metadata["Version"] or ""
+name = dist.metadata.get("Name", "") or ""
+version = dist.metadata.get("Version", "") or ""
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
for dist in distributions():
    name = dist.name or ""
    version = dist.version or ""
    if name:
        packages[name] = version
except Exception:  # noqa: S110 -- best-effort package collection; environment may lack importlib.metadata
```
🧰 Tools
🪛 Ruff (0.15.0)
[warning] 119-119: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In `@src/mcpbr/reproducibility.py` around lines 114 - 119, The loop in get package
collection uses unstable metadata indexing (dist.metadata["Name"]/["Version"])
which can KeyError; change it to use the stable Distribution properties
(dist.name and dist.version) when building the packages dict in the function
that iterates over distributions(), e.g., replace references to
dist.metadata["Name"] and dist.metadata["Version"] with dist.name and
dist.version and optionally guard with or "" to preserve current behavior; keep
the surrounding exception handling as a last-resort fallback.
```python
except Exception as e:
    result["error"] = f"{type(e).__name__}: {str(e)[:300]}"
```
Avoid broad except Exception (BLE001).
Catching all exceptions here will trip Ruff’s BLE001 and can hide unexpected failures. Prefer catching expected exception types from benchmark creation/loading, or let unexpected exceptions bubble to fail the test with a stack trace.
🧰 Tools
🪛 Ruff (0.15.0)
[warning] 65-65: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In `@tests/test_benchmark_integration.py` around lines 65 - 66, Replace the broad
"except Exception as e" that writes to result["error"] with a targeted handler
that only catches expected errors from benchmark creation/loading (e.g.,
ValueError, IOError, OSError or your domain-specific exceptions) and records
them into result["error"]; for anything else, re-raise the exception so tests
fail and provide a traceback. Concretely: change the "except Exception as e"
block to either "except (ValueError, IOError) as e:" (adding any
project-specific exception classes) or use "except Exception as e: if
isinstance(e, (ValueError, IOError, YourBenchmarkError)): result['error'] = ...
else: raise" to avoid swallowing unexpected failures while preserving the
existing error message formatting.
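The narrow-catch-or-re-raise shape can be sketched as follows (`run_benchmark_step` and `failing_loader` are illustrative names, not the test's actual helpers; the expected exception tuple is an assumption):

```python
def run_benchmark_step(loader) -> dict[str, str]:
    """Record expected load failures; let programming errors propagate."""
    result: dict[str, str] = {}
    try:
        loader()
    except (ValueError, OSError) as e:
        # Only expected failure modes from benchmark creation/loading
        # are recorded; anything else fails the test with a traceback.
        result["error"] = f"{type(e).__name__}: {str(e)[:300]}"
    return result


def failing_loader() -> None:
    raise OSError("dataset unreachable")
```

`run_benchmark_step(failing_loader)` records the error, while a `TypeError` raised inside `loader` would propagate and surface in the pytest output.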
```diff
 config.slack_bot_token = "xoxb-token"
 config.slack_channel = "#evals"
 config.github_token = None
 result = _build_notify_config(config)
 assert result["slack_webhook"] == "https://hooks.slack.com/test"
 assert result["discord_webhook"] == "https://discord.com/api/webhooks/test"
-assert result["slack_bot_token"] == "xoxb-token"  # noqa: S105
+assert result["slack_bot_token"] == "xoxb-token"
```
Fix Ruff S105 by avoiding hardcoded literal assignment to slack_bot_token.
Removing noqa reintroduces S105 on Lines 42 and 48. Consider assigning from a non-literal variable and asserting against that variable to keep the test intent while satisfying lint.
✅ Proposed change
```diff
-    config.slack_bot_token = "xoxb-token"
+    token_value = "xoxb-token"
+    config.slack_bot_token = token_value
@@
-    assert result["slack_bot_token"] == "xoxb-token"
+    assert result["slack_bot_token"] == token_value
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
token_value = "xoxb-token"
config.slack_bot_token = token_value
config.slack_channel = "#evals"
config.github_token = None
result = _build_notify_config(config)
assert result["slack_webhook"] == "https://hooks.slack.com/test"
assert result["discord_webhook"] == "https://discord.com/api/webhooks/test"
assert result["slack_bot_token"] == token_value
```
🧰 Tools
🪛 Ruff (0.15.0)
[error] 42-42: Possible hardcoded password assigned to: "slack_bot_token"
(S105)
[error] 48-48: Possible hardcoded password assigned to: "slack_bot_token"
(S105)
🤖 Prompt for AI Agents
In `@tests/test_harness_notifications.py` around lines 42 - 48, The test currently
assigns a hardcoded literal directly to config.slack_bot_token (triggering Ruff
S105); change it to assign the token to a non-literal variable (e.g., token_var
= "xoxb-token"), set config.slack_bot_token = token_var, and update the
assertion to compare result["slack_bot_token"] against that variable instead of
a literal, leaving the rest of the test (discord/slack_webhook assertions and
use of _build_notify_config) unchanged.
```python
with (
    patch("mcpbr.harnesses._run_cli_command", side_effect=mock_run_cli),
    patch("mcpbr.harnesses.shutil.which", return_value="/usr/bin/claude"),
    contextlib.suppress(Exception),
):
    await harness.solve(task, "/tmp/test", timeout=1)
```
Avoid hard-coded /tmp paths and broad exception suppression in these tests.
Using /tmp/test trips Ruff S108 and is less portable. Also, suppressing Exception can hide unexpected failures; narrow it to TimeoutError (or remove it) while using tmp_path.
Suggested update (apply to both async tests)
```diff
@@
-    @pytest.mark.asyncio
-    async def test_local_mode_sets_max_thinking_tokens(self):
+    @pytest.mark.asyncio
+    async def test_local_mode_sets_max_thinking_tokens(self, tmp_path):
@@
-        with (
-            patch("mcpbr.harnesses._run_cli_command", side_effect=mock_run_cli),
-            patch("mcpbr.harnesses.shutil.which", return_value="/usr/bin/claude"),
-            contextlib.suppress(Exception),
-        ):
-            await harness.solve(task, "/tmp/test", timeout=1)
+        with (
+            patch("mcpbr.harnesses._run_cli_command", side_effect=mock_run_cli),
+            patch("mcpbr.harnesses.shutil.which", return_value="/usr/bin/claude"),
+            contextlib.suppress(TimeoutError),
+        ):
+            await harness.solve(task, str(tmp_path), timeout=1)
@@
-    @pytest.mark.asyncio
-    async def test_local_mode_no_env_when_thinking_budget_none(self):
+    @pytest.mark.asyncio
+    async def test_local_mode_no_env_when_thinking_budget_none(self, tmp_path):
@@
-        with (
-            patch("mcpbr.harnesses._run_cli_command", side_effect=mock_run_cli),
-            patch("mcpbr.harnesses.shutil.which", return_value="/usr/bin/claude"),
-            contextlib.suppress(Exception),
-        ):
-            await harness.solve(task, "/tmp/test", timeout=1)
+        with (
+            patch("mcpbr.harnesses._run_cli_command", side_effect=mock_run_cli),
+            patch("mcpbr.harnesses.shutil.which", return_value="/usr/bin/claude"),
+            contextlib.suppress(TimeoutError),
+        ):
+            await harness.solve(task, str(tmp_path), timeout=1)
```

As per coding guidelines, "Run linting checks with Ruff and ensure all issues are resolved before committing" and "tests/**/*.py: Place tests in tests/ directory, mirror source file structure (e.g., src/mcpbr/foo.py → tests/test_foo.py), use pytest fixtures for common setup, and mark integration tests with @pytest.mark.integration".
Also applies to: 316-321
🧰 Tools
🪛 Ruff (0.15.0)
[error] 288-288: Probable insecure usage of temporary file or directory: "/tmp/test"
(S108)
🤖 Prompt for AI Agents
In `@tests/test_thinking_budget.py` around lines 283 - 288, Replace the hard-coded
"/tmp/test" and the broad contextlib.suppress(Exception) in the patched test
block with a pytest tmp_path fixture and a narrow suppression (or none);
specifically update the patch/context manager that surrounds the call to
harness.solve(task, "/tmp/test", timeout=1) to call harness.solve(task, tmp_path
/ "test", timeout=1) and change contextlib.suppress(Exception) to
contextlib.suppress(TimeoutError) (or remove the suppressor entirely), and apply
the same change to the other async test block that patches
mcpbr.harnesses._run_cli_command and shutil.which so both tests use tmp_path and
only suppress TimeoutError.
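The suggested shape can also be sketched standalone (a minimal illustration: `fake_solve` stands in for the patched `harness.solve`, and `tempfile` stands in for pytest's `tmp_path` fixture):

```python
import asyncio
import contextlib
import tempfile
from pathlib import Path


async def fake_solve(task: str, workdir: str, timeout: int) -> None:
    """Stand-in for harness.solve that always times out."""
    raise TimeoutError(f"{task} in {workdir} exceeded {timeout}s")


async def run_case(tmp_path: Path) -> bool:
    # Suppress only the expected timeout; any other exception
    # propagates and fails the test with a full traceback.
    with contextlib.suppress(TimeoutError):
        await fake_solve("demo-task", str(tmp_path), timeout=1)
    return tmp_path.exists()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(asyncio.run(run_case(Path(d))))  # → True
```

Because the suppression is narrow, a bug that raises, say, `AttributeError` inside the solve path is no longer silently swallowed.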
Summary

- Add mypy type checking (`uv run mypy`) and CI type-check job

Test plan

- `uvx ruff check src/ tests/` — All checks passed
- `uvx ruff format --check src/ tests/` — 266 files already formatted
- `uv run mypy src/mcpbr/` — Success: no issues found in 134 source files
- `uv run pytest -m "not integration"` — 4293 passed, 0 failures

🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes: Version 0.14.0
New Features
Bug Fixes
Documentation