You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{"id": "cond-001", "name": "bug_fix_includes_validation", "prompt": "You are an AI coding agent given this prompt template output:\n\n---\nYou are working on Linear issue STA-142: Fix null pointer in frame lookup\n\n## Description\n\nWhen a frame is deleted and then looked up by ID, the sqlite adapter throws an unhandled null reference instead of returning undefined. Stack trace attached.\n\nLabels: bug\nPriority: High\n\n## Instructions\n\n1. Read the issue description carefully\n2. Implement the requested changes\n3. Write or update tests as needed\n4. Run lint and tests to verify\n5. Commit your changes with a descriptive message\n\nWork in the current directory. All changes will be on a dedicated branch.\n---\n\nDoes this prompt adequately guide an agent to fix a bug? Evaluate whether it includes: lint/test commands, commit format guidance, error handling expectations, and clear bug reproduction context. List what is present and what is missing.", "expected": {"mentions_lint_test": "prompt should include or guide agent to run lint and test commands", "commit_format": "prompt should specify commit message format like type(scope): message", "bug_context": "prompt should guide the agent to understand the bug before fixing", "validation_step": "prompt should include a verification/validation step after the fix"}, "weight": 1.5}
2
+
{"id": "cond-002", "name": "feature_issue_guides_implementation", "prompt": "You are an AI coding agent given this prompt template output:\n\n---\nYou are working on Linear issue STA-200: Add tag filtering to frame search\n\n## Description\n\nUsers should be able to filter frames by tags in the search API. Add a `tags` parameter (string[]) to the search method that filters results to only include frames with matching tags. Use FTS5 for the text search portion.\n\nLabels: feature, search\nPriority: Medium\n\n## Instructions\n\n1. Read the issue description carefully\n2. Implement the requested changes\n3. Write or update tests as needed\n4. Run lint and tests to verify\n5. Commit your changes with a descriptive message\n\nWork in the current directory. All changes will be on a dedicated branch.\n---\n\nEvaluate whether this prompt adequately guides a feature implementation. Check for: implementation guidance, test requirements, code quality expectations, commit conventions, and whether it tells the agent to check existing patterns before writing new code.", "expected": {"implementation_guidance": "prompt should guide the agent through implementation steps", "test_requirements": "prompt should specify writing tests for the new feature", "existing_patterns": "prompt should tell agent to follow existing code patterns", "code_quality": "prompt should mention lint, type safety, or code quality checks", "commit_conventions": "prompt should specify commit message format"}, "weight": 1.5}
3
+
{"id": "cond-003", "name": "retry_context_handling", "prompt": "You are an AI coding agent given this prompt template output for a retry attempt:\n\n---\nYou are working on Linear issue STA-305: Migrate config schema to v3\n\n## Description\n\nUpdate the config file schema from v2 to v3. Add the new 'integrations' top-level key and migrate existing linear settings under it.\n\nLabels: chore\nPriority: Medium\n\nThis is attempt 2. Check .stackmemory/conductor-context.md for context from prior attempts.\n\n## Instructions\n\n1. Read the issue description carefully\n2. Implement the requested changes\n3. Write or update tests as needed\n4. Run lint and tests to verify\n5. Commit your changes with a descriptive message\n\nWork in the current directory. All changes will be on a dedicated branch.\n---\n\nEvaluate whether this retry prompt adequately guides the agent on attempt 2. Check: does it tell the agent to read prior context first, does it explain what might have gone wrong, does it suggest a different approach, does it include enough context about what was already tried?", "expected": {"reads_prior_context": "prompt should instruct agent to read prior attempt context before starting", "avoids_repeating_mistakes": "prompt should guide agent to understand what failed previously", "different_approach": "prompt should suggest trying a different approach if prior attempt failed", "preserves_prior_work": "prompt should tell agent to check if partial work exists from prior attempt"}, "weight": 2.0}
4
+
{"id": "cond-004", "name": "commit_format_guidance", "prompt": "You are an AI coding agent given this prompt template output:\n\n---\nYou are working on Linear issue STA-180: Add rate limiting to webhook endpoint\n\n## Description\n\nAdd rate limiting (100 req/min per IP) to the webhook handler to prevent abuse.\n\nLabels: feature, security\nPriority: High\n\n## Instructions\n\n1. Read the issue description carefully\n2. Implement the requested changes\n3. Write or update tests as needed\n4. Run lint and tests to verify\n5. Commit your changes with a descriptive message\n\nWork in the current directory. All changes will be on a dedicated branch.\n---\n\nEvaluate whether this prompt gives adequate commit guidance. Check: does it specify the commit message format (e.g. feat(scope): message), does it mention including the Linear issue ID, does it tell the agent to make atomic commits, does it specify branch naming conventions?", "expected": {"commit_format_specified": "prompt should specify type(scope): message format", "includes_issue_id": "prompt should tell agent to reference the Linear issue ID in commits", "atomic_commits": "prompt should guide making focused, atomic commits", "branch_conventions": "prompt should mention branch naming or confirm branch is set up"}, "weight": 1.3}
5
+
{"id": "cond-005", "name": "no_description_handling", "prompt": "You are an AI coding agent given this prompt template output for an issue with no description:\n\n---\nYou are working on Linear issue STA-410: Fix typo in error message\n\nLabels: \nPriority: Low\n\n## Instructions\n\n1. Read the issue description carefully\n2. Implement the requested changes\n3. Write or update tests as needed\n4. Run lint and tests to verify\n5. Commit your changes with a descriptive message\n\nWork in the current directory. All changes will be on a dedicated branch.\n---\n\nEvaluate how well this prompt handles an issue with no description and no labels. Check: does the prompt degrade gracefully without a description, does it guide the agent to search the codebase for context, does it handle empty labels cleanly, does it still provide useful instructions despite minimal input?", "expected": {"graceful_without_description": "prompt should still be useful without a description section", "guides_codebase_search": "prompt should tell agent to search codebase for relevant context when description is missing", "handles_empty_labels": "empty labels should not produce awkward formatting", "minimal_input_useful": "prompt should provide enough structure even with minimal issue data"}, "weight": 1.8}
6
+
{"id": "cond-006", "name": "urgent_priority_handling", "prompt": "You are an AI coding agent given this prompt template output for an urgent issue:\n\n---\nYou are working on Linear issue STA-501: Production crash in webhook handler\n\n## Description\n\nThe webhook handler is crashing in production with an unhandled promise rejection when the Linear API returns a 502. This is blocking all issue syncing. Error: UnhandledPromiseRejection at webhookHandler:45\n\nLabels: bug, production\nPriority: Urgent\n\n## Instructions\n\n1. Read the issue description carefully\n2. Implement the requested changes\n3. Write or update tests as needed\n4. Run lint and tests to verify\n5. Commit your changes with a descriptive message\n\nWork in the current directory. All changes will be on a dedicated branch.\n---\n\nEvaluate whether this prompt handles urgent/production issues appropriately. Check: does it convey urgency, does it guide the agent to prioritize a fix over perfect code, does it suggest checking for similar issues, does it emphasize testing the error path specifically?", "expected": {"conveys_urgency": "prompt should differentiate urgent issues from normal priority", "fix_over_perfection": "prompt should guide agent to prioritize a working fix for urgent issues", "error_path_testing": "prompt should emphasize testing the specific error scenario", "production_awareness": "prompt should include guidance about production-impacting changes"}, "weight": 1.5}
7
+
{"id": "cond-007", "name": "template_variable_completeness", "prompt": "Review this conductor prompt template for completeness:\n\n---\nYou are working on Linear issue {{ISSUE_ID}}: {{TITLE}}\n\n## Description\n\n{{DESCRIPTION}}\n\nLabels: {{LABELS}}\nPriority: {{PRIORITY}}\n\n{{PRIOR_CONTEXT}}\n\n## Instructions\n\n1. Read the issue description carefully\n2. Implement the requested changes\n3. Write or update tests as needed\n4. Run lint and tests to verify\n5. Commit your changes with a descriptive message\n\nWork in the current directory. All changes will be on a dedicated branch.\n---\n\nEvaluate this template's completeness. Check: are all template variables used, does it include project-specific commands (npm run lint, npm run test:run), does it specify coding conventions, does it mention the commit format, does it handle the case where DESCRIPTION or LABELS might be empty?", "expected": {"all_variables_used": "template should use all available variables (ISSUE_ID, TITLE, DESCRIPTION, LABELS, PRIORITY, ATTEMPT, PRIOR_CONTEXT)", "project_commands": "template should include specific commands like npm run lint, npm run test:run, npm run build", "coding_conventions": "template should reference coding conventions or link to CLAUDE.md", "empty_variable_handling": "template should handle empty DESCRIPTION or LABELS gracefully"}, "weight": 1.3}
console.error(`Error: ${claudeMdPath} not found`);
@@ -183,6 +217,10 @@ async function generateMutation(content, strategy, state) {
183
217
add_constraints: `Add specific constraints and guardrails based on common failure modes. Be precise about what NOT to do.`,
184
218
185
219
simplify: `Simplify complex instructions. Break down multi-step rules. Use bullet points over paragraphs.`,
220
+
221
+
add_guardrails: `Add guardrails for common agent failure modes: forgetting to run tests, wrong commit format, not reading prior context on retries, not handling empty fields. Add explicit "DO NOT" rules where agents commonly go wrong.`,
222
+
223
+
improve_error_handling: `Improve how the prompt handles edge cases and errors: empty descriptions, missing labels, retry attempts, urgent priorities. Add conditional sections and fallback instructions for when data is incomplete.`,
186
224
};
187
225
188
226
constprompt=`You are optimizing a CLAUDE.md system prompt for an AI coding agent.
@@ -386,10 +424,10 @@ async function runEval(variantName) {
386
424
387
425
console.log(`Running evals on ${variantName}...`);
388
426
389
-
// Load eval tasks
390
-
constevalFiles=fs
391
-
.readdirSync(EVALS_DIR)
392
-
.filter((f)=>f.endsWith('.jsonl'));
427
+
// Load eval tasks (use profile-specific files if set, otherwise all .jsonl)
0 commit comments