Duplicate Issue Detection and labeler as Possible Duplicate using semantic analysis #71
aniket866 wants to merge 2 commits into AOSSIE-Org:main from
Warning: Ignoring CodeRabbit configuration file changes. For security, only the configuration from the base branch is applied for open-source repositories.
Walkthrough
Added duplicate detection configuration to the CodeRabbit settings file under issue_enrichment. The new configuration enables automatic duplicate detection with a 0.8 similarity threshold and specifies a comment action with duplicate labeling.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
🚥 Pre-merge checks: ✅ 2 passed
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.coderabbit.yaml:
- Line 36: The .coderabbit.yaml file sets similarity_threshold to 0.8 which
conflicts with the PR description that states 0.9; decide which value is
intended and make them consistent by either updating similarity_threshold in
.coderabbit.yaml to 0.9 or updating the PR description to 0.8, then mention the
change in the PR so reviewers know which source of truth (the
similarity_threshold key) was chosen.
- Line 36: The committed config sets similarity_threshold: 0.8 but the PR
description expects 0.9; update the configuration key similarity_threshold in
.coderabbit.yaml to match the PR description by changing its value to 0.9 so the
code and PR text are consistent and reviewers aren’t confused.
- Around line 34-38: The duplicate_detection block is invalid and will be
ignored; remove the entire duplicate_detection: { enabled, similarity_threshold,
action, label } block and instead configure duplicate handling via the supported
issue_enrichment schema (use issue_enrichment.labeling.labeling_instructions to
provide rules/instructions that generate a "duplicate" label or comments and
rely on the built-in duplicate detection), ensuring only the allowed keys
auto_enrich, planning, and labeling remain under issue_enrichment.
- Around line 34-38: Replace the invalid duplicate_detection block under
issue_enrichment with the supported auto_enrich configuration: remove the
duplicate_detection: ... similarity_threshold/action/label keys and add an
auto_enrich block with enabled: true (i.e., set
issue_enrichment.auto_enrich.enabled to true) so CodeRabbit will handle
duplicate detection; if further customization is needed, add
planning/auto_planning/labeling as separate keys under issue_enrichment.
duplicate_detection:
  enabled: true
  similarity_threshold: 0.8
  action: comment
  label: duplicate
duplicate_detection is not a recognized configuration key — this entire block will be silently ignored.
The authoritative CodeRabbit configuration schema (provided in this review context) defines issue_enrichment with "additionalProperties": false, allowing only three child keys: auto_enrich, planning, and labeling. The duplicate_detection key does not exist in the schema, so CodeRabbit will reject or silently ignore it.
CodeRabbit does analyze issues against existing issues in its knowledge base to detect potential duplicates. However, the enrichment format is standardized: customization is available for label categories and auto-planning, not through a duplicate_detection sub-block. There is no documented YAML key for tuning duplicate detection behavior; it is a built-in part of issue enrichment that can only be switched on or off together with the rest of enrichment, not parameterized separately in the config file.
The sub-fields similarity_threshold, action, and label have no schema backing and will produce no effect.
🗑️ Proposed fix: remove the unsupported block
- duplicate_detection:
-   enabled: true
-   similarity_threshold: 0.8
-   action: comment
-   label: duplicate
If the goal is to have issues auto-labeled as duplicate, use the supported issue_enrichment.labeling.labeling_instructions mechanism instead:
  labeling:
    auto_apply_labels: true
    labeling_instructions:
      ...
+     - label: duplicate
+       instructions: Apply when the issue appears to be a duplicate of an existing open or closed issue in the repository.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.coderabbit.yaml around lines 34 - 38, The duplicate_detection block is
invalid and will be ignored; remove the entire duplicate_detection: { enabled,
similarity_threshold, action, label } block and instead configure duplicate
handling via the supported issue_enrichment schema (use
issue_enrichment.labeling.labeling_instructions to provide rules/instructions
that generate a "duplicate" label or comments and rely on the built-in duplicate
detection), ensuring only the allowed keys auto_enrich, planning, and labeling
remain under issue_enrichment.
🧩 Analysis chain
🏁 Script executed:
# Find schema files related to .coderabbit.yaml
fd -t f -e json -e yaml | grep -i schema | head -20
Repository: AOSSIE-Org/Template-Repo
Length of output: 50
🏁 Script executed:
# Search for .coderabbit.yaml schema or validation rules
rg "issue_enrichment" --type yaml --type json -A 5 -B 5 | head -100
Repository: AOSSIE-Org/Template-Repo
Length of output: 708
🏁 Script executed:
# Search for duplicate_detection references in the codebase
rg "duplicate_detection" --type yaml --type json -B 3 -A 3
Repository: AOSSIE-Org/Template-Repo
Length of output: 383
🏁 Script executed:
# Look for any configuration schema documentation files
fd -t f \( -name "*schema*" -o -name "*config*" \) | grep -E "\.(json|yaml|md)$" | head -20
Repository: AOSSIE-Org/Template-Repo
Length of output: 298
🏁 Script executed:
# Search for schema files more broadly, including node_modules and any json files
find . -maxdepth 5 -type f -name "*.json" | grep -i "schema\|config" | head -30
Repository: AOSSIE-Org/Template-Repo
Length of output: 50
🏁 Script executed:
# Look for any references to issue_enrichment in schema/type definition files
rg "additionalProperties" --type json -B 5 -A 5 | head -100
Repository: AOSSIE-Org/Template-Repo
Length of output: 216
🏁 Script executed:
# Check if there's documentation or README about coderabbit configuration
find . -type f \( -name "README*" -o -name "CONTRIBUTING*" -o -name "*.md" \) -exec grep -l "coderabbit\|duplicate" {} \; | head -10
Repository: AOSSIE-Org/Template-Repo
Length of output: 85
🌐 Web query:
CodeRabbit duplicate_detection issue_enrichment feature official documentation
💡 Result:
CodeRabbit’s official documentation for duplicate_detection is under Issue Enrichment:
- Issue Enrichment (open beta): automatically comments on GitHub Issues with Duplicate Detection, similar issues, related PRs, suggested assignees, and smart labeling. Duplicate Detection analyzes a new/edited issue against existing issues (and the repo knowledge base) and lists “Possible Duplicates.” [1]
- Disable auto-enrichment in .coderabbit.yaml:
issue_enrichment:
  auto_enrich:
    enabled: false
(The configuration reference documents issue_enrichment.auto_enrich.enabled.) [1], [2]
Sources: [1] CodeRabbit Docs – Issue Enrichment; [2] CodeRabbit Docs – Configuration Reference (Issue enrichment section).
🌐 Web query:
CodeRabbit .coderabbit.yaml issue_enrichment duplicate_detection configuration example syntax
💡 Result:
CodeRabbit’s duplicate detection is part of Issue Enrichment and (as of the docs updated 2026-02-11) there is no separate duplicate_detection: block in .coderabbit.yaml; you enable/disable it via issue_enrichment.auto_enrich.enabled. [1], [2]
Minimal .coderabbit.yaml examples
Disable duplicate detection (and all automatic issue enrichment):
issue_enrichment:
  auto_enrich:
    enabled: false
[1], [2]
Enable automatic issue enrichment (duplicate detection included):
issue_enrichment:
  auto_enrich:
    enabled: true
(Uses the same key; enabling enrichment enables duplicate detection.) [1], [2]
Related (optional) issue-enrichment configuration you can set
Example: auto-apply labels + label instructions:
issue_enrichment:
  labeling:
    auto_apply_labels: true
    labeling_instructions:
      - label: bug
        instructions: "Issues reporting bugs, errors, crashes, regressions, etc."
      - label: enhancement
        instructions: "Feature requests and improvements."
[1], [2]
If you want schema validation, CodeRabbit also documents using a YAML schema header (example shown in their YAML configuration guide). [3]
Sources: Issue Enrichment docs (incl. disable snippet) [1], configuration reference (Issue enrichment section) [2], YAML configuration guide [3].
🌐 Web query:
CodeRabbit configuration schema JSON issue_enrichment properties allowed keys
💡 Result:
In the CodeRabbit configuration schema, issue_enrichment is an object whose allowed top-level keys are: auto_enrich, planning, auto_planning, and labeling. [1]
Allowed keys under each:
- issue_enrichment.auto_enrich: enabled [1][2]
- issue_enrichment.planning: enabled [1]
- issue_enrichment.auto_planning: enabled, labels [1]
- issue_enrichment.labeling: auto_apply_labels, labeling_instructions [1][2]
- issue_enrichment.labeling.labeling_instructions[] (array items): label, instructions [1][2]
Sources: [1] [2]
🏁 Script executed:
# Check the CONTRIBUTING.md file for coderabbit config information
cat ./CONTRIBUTING.md | head -100
Repository: AOSSIE-Org/Template-Repo
Length of output: 2871
Remove the invalid duplicate_detection block — CodeRabbit does not support this configuration syntax.
According to CodeRabbit's official documentation, duplicate_detection is not a valid key under issue_enrichment. The allowed properties are: auto_enrich, planning, auto_planning, and labeling. Duplicate detection is controlled exclusively through issue_enrichment.auto_enrich.enabled.
Replace lines 34-38 with:
auto_enrich:
  enabled: true
This enables all automatic issue enrichment features, including duplicate detection. If you need further customization (labeling, planning, etc.), add those as sibling keys of auto_enrich under issue_enrichment.
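Combining the two suggestions in this thread, a corrected issue_enrichment block might look like the sketch below. It assumes issue_enrichment is a top-level key in .coderabbit.yaml with no other children, and it uses only keys reported in the allowed-key list above (auto_enrich, labeling, auto_apply_labels, labeling_instructions). The duplicate label name and its instruction text are copied from the earlier labeling suggestion, not from CodeRabbit's docs. Check the result against the live configuration schema before committing.

```yaml
# Sketch only: assumes issue_enrichment sits at the top level of .coderabbit.yaml.
issue_enrichment:
  auto_enrich:
    enabled: true  # turns on issue enrichment, which includes built-in duplicate detection
  labeling:
    auto_apply_labels: true
    labeling_instructions:
      # Hypothetical label rule taken from the review suggestion above.
      - label: duplicate
        instructions: "Apply when the issue appears to be a duplicate of an existing open or closed issue in the repository."
```

There is deliberately no counterpart to similarity_threshold or action: comment here, since the documentation quoted in this thread lists no key for either; that also makes the 0.8 versus 0.9 discrepancy moot once the unsupported block is removed.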
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.coderabbit.yaml around lines 34 - 38, Replace the invalid
duplicate_detection block under issue_enrichment with the supported auto_enrich
configuration: remove the duplicate_detection: ...
similarity_threshold/action/label keys and add an auto_enrich block with
enabled: true (i.e., set issue_enrichment.auto_enrich.enabled to true) so
CodeRabbit will handle duplicate detection; if further customization is needed,
add planning/auto_planning/labeling as separate keys under issue_enrichment.
duplicate_detection:
  enabled: true
  similarity_threshold: 0.8
similarity_threshold value conflicts with the PR description.
The PR description specifies similarity_threshold: 0.9, but the committed value is 0.8. Decide on the intended threshold and update either the config or the PR description to keep them in sync.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.coderabbit.yaml at line 36, The .coderabbit.yaml file sets
similarity_threshold to 0.8 which conflicts with the PR description that states
0.9; decide which value is intended and make them consistent by either updating
similarity_threshold in .coderabbit.yaml to 0.9 or updating the PR description
to 0.8, then mention the change in the PR so reviewers know which source of
truth (the similarity_threshold key) was chosen.
similarity_threshold value (0.8) conflicts with the PR description (0.9).
The PR description specifies similarity_threshold: 0.9, but the committed value is 0.8. Regardless of the schema violation above, reconcile the two to avoid ambiguity for reviewers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.coderabbit.yaml at line 36, The committed config sets similarity_threshold:
0.8 but the PR description expects 0.9; update the configuration key
similarity_threshold in .coderabbit.yaml to match the PR description by changing
its value to 0.9 so the code and PR text are consistent and reviewers aren’t
confused.
Addressed Issues:
Issue #65
Screenshots/Recordings:
Checklist
We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact.
Summary by CodeRabbit