Duplicate Issue Detection and labeler as Possible Duplicate using semantic analysis #71

Closed
aniket866 wants to merge 2 commits into AOSSIE-Org:main from aniket866:patch-1

@aniket866 aniket866 commented Feb 18, 2026

Addressed Issues:

Issue #65

Screenshots/Recordings:

  • to be produced
  duplicate_detection:
    enabled: true
    similarity_threshold: 0.9
    action: comment

Checklist

  • My code follows the project's code style and conventions
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • I have read the Contributing Guidelines

⚠️ AI Notice - Important!

We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact.

Summary by CodeRabbit

  • New Features
    • Enabled automatic duplicate detection for issues. Duplicate issues will now be automatically identified, labeled, and receive an automated comment, improving project organization and reducing manual duplicate handling.

coderabbitai bot commented Feb 18, 2026

Warning

Ignoring CodeRabbit configuration file changes. For security, only the configuration from the base branch is applied for open source repositories.

Walkthrough

Added duplicate detection configuration to the CodeRabbit settings file under issue_enrichment. The new configuration enables automatic duplicate detection with a 0.8 similarity threshold and specifies comment action with duplicate labeling.

Changes

Cohort / File(s) | Summary
Configuration Update: .coderabbit.yaml | Added duplicate_detection configuration block under issue_enrichment with similarity_threshold set to 0.8, action set to comment, and label set to duplicate.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Suggested labels

Documentation

Poem

A rabbit configured with precision and care, 🐰
Duplicate detection floats through the air,
At 0.8 similarity, twins are found true,
Comments and labels bloom bright and new,
Less chaos, more order—hooray! ✨

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately describes the main change: adding duplicate issue detection with semantic analysis and labeling. It directly matches the core functionality introduced in the configuration.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Comment on lines +34 to +38
  duplicate_detection:
    enabled: true
    similarity_threshold: 0.8
    action: comment
    label: duplicate

⚠️ Potential issue | 🔴 Critical

duplicate_detection is not a recognized configuration key — this entire block will be silently ignored.

The authoritative CodeRabbit configuration schema (provided in this review context) defines issue_enrichment with "additionalProperties": false, allowing only three child keys: auto_enrich, planning, and labeling. The duplicate_detection key does not exist in the schema, so CodeRabbit will reject or silently ignore it.

CodeRabbit analyzes issues against existing issues in its knowledge base to detect potential duplicates — however, the enrichment format is standardized, and customization is available for label categories and auto-planning, not for a duplicate_detection sub-block. There is no documented YAML configuration key for controlling duplicate detection behavior; it appears to be a built-in automatic feature of issue enrichment, not something toggled or parameterized through the config file.

The sub-fields similarity_threshold, action, and label have no schema backing and will produce no effect.

🗑️ Proposed fix: remove the unsupported block
-  duplicate_detection:
-    enabled: true
-    similarity_threshold: 0.8
-    action: comment
-    label: duplicate
-

If the goal is to have issues auto-labeled as duplicate, use the supported issue_enrichment.labeling.labeling_instructions mechanism instead:

   labeling:
     auto_apply_labels: true
     labeling_instructions:
       ...
+      - label: duplicate
+        instructions: Apply when the issue appears to be a duplicate of an existing open or closed issue in the repository.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.coderabbit.yaml around lines 34 - 38, The duplicate_detection block is
invalid and will be ignored; remove the entire duplicate_detection: { enabled,
similarity_threshold, action, label } block and instead configure duplicate
handling via the supported issue_enrichment schema (use
issue_enrichment.labeling.labeling_instructions to provide rules/instructions
that generate a "duplicate" label or comments and rely on the built-in duplicate
detection), ensuring only the allowed keys auto_enrich, planning, and labeling
remain under issue_enrichment.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find schema files related to .coderabbit.yaml
fd -t f -e json -e yaml | grep -i schema | head -20

Repository: AOSSIE-Org/Template-Repo

Length of output: 50


🏁 Script executed:

# Search for .coderabbit.yaml schema or validation rules
rg "issue_enrichment" --type yaml --type json -A 5 -B 5 | head -100

Repository: AOSSIE-Org/Template-Repo

Length of output: 708


🏁 Script executed:

# Search for duplicate_detection references in the codebase
rg "duplicate_detection" --type yaml --type json -B 3 -A 3

Repository: AOSSIE-Org/Template-Repo

Length of output: 383


🏁 Script executed:

# Look for any configuration schema documentation files
fd -t f \( -name "*schema*" -o -name "*config*" \) | grep -E "\.(json|yaml|md)$" | head -20

Repository: AOSSIE-Org/Template-Repo

Length of output: 298


🏁 Script executed:

# Search for schema files more broadly, including node_modules and any json files
find . -maxdepth 5 -type f -name "*.json" | grep -i "schema\|config" | head -30

Repository: AOSSIE-Org/Template-Repo

Length of output: 50


🏁 Script executed:

# Look for any references to issue_enrichment in schema/type definition files
rg "additionalProperties" --type json -B 5 -A 5 | head -100

Repository: AOSSIE-Org/Template-Repo

Length of output: 216


🏁 Script executed:

# Check if there's documentation or README about coderabbit configuration
find . -type f \( -name "README*" -o -name "CONTRIBUTING*" -o -name "*.md" \) -exec grep -l "coderabbit\|duplicate" {} \; | head -10

Repository: AOSSIE-Org/Template-Repo

Length of output: 85


🌐 Web query:

CodeRabbit duplicate_detection issue_enrichment feature official documentation

💡 Result:

CodeRabbit’s official documentation for duplicate_detection is under Issue Enrichment:

  • Issue Enrichment (open beta): automatically comments on GitHub Issues with Duplicate Detection, similar issues, related PRs, suggested assignees, and smart labeling. Duplicate Detection analyzes a new/edited issue against existing issues (and the repo knowledge base) and lists “Possible Duplicates.” [1]
  • Disable auto-enrichment in .coderabbit.yaml:
    issue_enrichment:
      auto_enrich:
        enabled: false
    (The configuration reference documents issue_enrichment.auto_enrich.enabled.) [1], [2]

Sources: [1] CodeRabbit Docs – Issue Enrichment, [2] CodeRabbit Docs – Configuration Reference (Issue enrichment section).


🌐 Web query:

CodeRabbit .coderabbit.yaml issue_enrichment duplicate_detection configuration example syntax

💡 Result:

CodeRabbit’s duplicate detection is part of Issue Enrichment and (as of the docs updated 2026-02-11) there is no separate duplicate_detection: block in .coderabbit.yaml; you enable/disable it via issue_enrichment.auto_enrich.enabled. [1], [2]

Minimal .coderabbit.yaml examples

Disable duplicate detection (and all automatic issue enrichment):

issue_enrichment:
  auto_enrich:
    enabled: false

[1], [2]

Enable automatic issue enrichment (duplicate detection included):

issue_enrichment:
  auto_enrich:
    enabled: true

(Uses the same key; enabling enrichment enables duplicate detection.) [1], [2]

Related (optional) issue-enrichment configuration you can set

Example: auto-apply labels + label instructions:

issue_enrichment:
  labeling:
    auto_apply_labels: true
    labeling_instructions:
      - label: bug
        instructions: "Issues reporting bugs, errors, crashes, regressions, etc."
      - label: enhancement
        instructions: "Feature requests and improvements."

[1], [2]

If you want schema validation, CodeRabbit also documents using a YAML schema header (example shown in their YAML configuration guide). [3]

Sources: Issue Enrichment docs (incl. disable snippet) [1], configuration reference (Issue enrichment section) [2], YAML configuration guide [3].


🌐 Web query:

CodeRabbit configuration schema JSON issue_enrichment properties allowed keys

💡 Result:

In the CodeRabbit configuration schema, issue_enrichment is an object whose allowed top-level keys are: auto_enrich, planning, auto_planning, and labeling. [1]

Allowed keys under each:

  • issue_enrichment.auto_enrich: enabled [1][2]
  • issue_enrichment.planning: enabled [1]
  • issue_enrichment.auto_planning: enabled, labels [1]
  • issue_enrichment.labeling: auto_apply_labels, labeling_instructions [1][2]
    • issue_enrichment.labeling.labeling_instructions[] (array items): label, instructions [1][2]

Sources: [1] [2]
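
As a rough illustration of the allowed-keys list above, the following Python sketch checks an issue_enrichment mapping against those keys and reports anything unrecognized. The ALLOWED_KEYS table is copied from the query result above rather than from the real schema file, find_unknown_keys is a hypothetical helper, and the config is a plain dict standing in for the parsed YAML.

```python
# Allowed keys under issue_enrichment, as listed in the query result above.
# Assumption: this table mirrors that list only; it is not the official schema.
ALLOWED_KEYS = {
    "auto_enrich": {"enabled"},
    "planning": {"enabled"},
    "auto_planning": {"enabled", "labels"},
    "labeling": {"auto_apply_labels", "labeling_instructions"},
}


def find_unknown_keys(issue_enrichment):
    """Return dotted paths of keys not found in ALLOWED_KEYS (hypothetical helper)."""
    unknown = []
    for section, value in issue_enrichment.items():
        if section not in ALLOWED_KEYS:
            # The whole section is unrecognized; no need to inspect its children.
            unknown.append(f"issue_enrichment.{section}")
            continue
        if isinstance(value, dict):
            for key in value:
                if key not in ALLOWED_KEYS[section]:
                    unknown.append(f"issue_enrichment.{section}.{key}")
    return unknown


# The block proposed in this PR, written as the dict a YAML parser would produce.
pr_config = {
    "duplicate_detection": {
        "enabled": True,
        "similarity_threshold": 0.8,
        "action": "comment",
        "label": "duplicate",
    }
}

print(find_unknown_keys(pr_config))
```

Run against the block from this PR, the helper flags issue_enrichment.duplicate_detection as a whole, which matches the finding that the entire block has no schema backing.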


🏁 Script executed:

# Check the CONTRIBUTING.md file for coderabbit config information
cat ./CONTRIBUTING.md | head -100

Repository: AOSSIE-Org/Template-Repo

Length of output: 2871


Remove the invalid duplicate_detection block — CodeRabbit does not support this configuration syntax.

According to CodeRabbit's official documentation, duplicate_detection is not a valid key under issue_enrichment. The allowed properties are: auto_enrich, planning, auto_planning, and labeling. Duplicate detection is controlled exclusively through issue_enrichment.auto_enrich.enabled.

Replace lines 34-38 with:

  auto_enrich:
    enabled: true

This enables all automatic issue enrichment features, including duplicate detection. If you need further customization (labeling, planning, etc.), add those as separate top-level keys under issue_enrichment.
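
The replacement described above could be applied to an already-parsed config roughly as follows. This is a minimal sketch under stated assumptions: migrate_issue_enrichment is a hypothetical helper, the input is a plain dict standing in for the parsed .coderabbit.yaml, and nothing here calls a CodeRabbit API.

```python
import copy


def migrate_issue_enrichment(config):
    """Return a copy of config with duplicate_detection replaced by auto_enrich.

    Hypothetical helper: `config` is the dict a YAML parser would return
    for .coderabbit.yaml.
    """
    migrated = copy.deepcopy(config)
    enrichment = migrated.setdefault("issue_enrichment", {})
    # Drop the unsupported block; pop() with a default is a no-op if it is absent.
    enrichment.pop("duplicate_detection", None)
    # Per the review, duplicate detection follows auto_enrich.enabled.
    enrichment.setdefault("auto_enrich", {})["enabled"] = True
    return migrated


before = {
    "issue_enrichment": {
        "duplicate_detection": {"enabled": True, "similarity_threshold": 0.8},
        "labeling": {"auto_apply_labels": True},
    }
}
after = migrate_issue_enrichment(before)
print(after["issue_enrichment"])
```

The input is deep-copied so the original mapping stays untouched, and sibling keys such as labeling are carried over unchanged.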

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.coderabbit.yaml around lines 34 - 38, Replace the invalid
duplicate_detection block under issue_enrichment with the supported auto_enrich
configuration: remove the duplicate_detection: ...
similarity_threshold/action/label keys and add an auto_enrich block with
enabled: true (i.e., set issue_enrichment.auto_enrich.enabled to true) so
CodeRabbit will handle duplicate detection; if further customization is needed,
add planning/auto_planning/labeling as separate keys under issue_enrichment.


  duplicate_detection:
    enabled: true
    similarity_threshold: 0.8

⚠️ Potential issue | 🟡 Minor

similarity_threshold value conflicts with the PR description.

The PR description specifies similarity_threshold: 0.9, but the committed value is 0.8. Decide on the intended threshold and update either the config or the PR description to keep them in sync.
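
A mismatch like this one could be caught mechanically. The sketch below is illustrative only: extract_threshold is a hypothetical helper, and the two strings are abbreviated copies of the snippet in the PR description and the committed block, not files read from the repository.

```python
import re

# Matches e.g. "similarity_threshold: 0.9" and captures the number.
THRESHOLD_RE = re.compile(r"similarity_threshold:\s*([0-9]*\.?[0-9]+)")


def extract_threshold(text):
    """Return the first similarity_threshold value found in text, or None."""
    match = THRESHOLD_RE.search(text)
    return float(match.group(1)) if match else None


# Snippet as it appears in the PR description.
pr_description = """
duplicate_detection:
  enabled: true
  similarity_threshold: 0.9
  action: comment
"""

# Block as committed to .coderabbit.yaml.
committed_config = """
duplicate_detection:
  enabled: true
  similarity_threshold: 0.8
  action: comment
  label: duplicate
"""

described = extract_threshold(pr_description)
committed = extract_threshold(committed_config)
if described != committed:
    print(f"Mismatch: description says {described}, config says {committed}")
```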

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.coderabbit.yaml at line 36, The .coderabbit.yaml file sets
similarity_threshold to 0.8 which conflicts with the PR description that states
0.9; decide which value is intended and make them consistent by either updating
similarity_threshold in .coderabbit.yaml to 0.9 or updating the PR description
to 0.8, then mention the change in the PR so reviewers know which source of
truth (the similarity_threshold key) was chosen.

⚠️ Potential issue | 🟡 Minor

similarity_threshold value (0.8) conflicts with the PR description (0.9).

The PR description specifies similarity_threshold: 0.9, but the committed value is 0.8. Regardless of the schema violation above, reconcile the two to avoid ambiguity for reviewers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.coderabbit.yaml at line 36, The committed config sets similarity_threshold:
0.8 but the PR description expects 0.9; update the configuration key
similarity_threshold in .coderabbit.yaml to match the PR description by changing
its value to 0.9 so the code and PR text are consistent and reviewers aren’t
confused.

@aniket866 aniket866 closed this Feb 18, 2026