
Ignore copyright symbols inside URLs during copyright detection #4744

Open
dikshaa2909 wants to merge 4 commits into aboutcode-org:develop from dikshaa2909:develop

dikshaa2909 commented Feb 13, 2026

Fixes #4724

Summary

This PR fixes a false positive in which ScanCode reports a copyright statement when the copyright symbol (c) appears inside a URL.

URLs such as:
http://example.com/(c)/path

were incorrectly treated as copyright candidates, even though they are not copyright statements.


Problem

ScanCode’s copyright candidate detection logic treated (c) inside a URL as a valid copyright marker.
As a result, scanning a text file that contained a URL with (c) in its path produced spurious copyright detections.
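To illustrate the problem, here is a minimal sketch of a marker check that searches the whole line, URLs included. The `COPYRIGHT_MARKER` regex and the `naive_is_candidate` function are hypothetical simplifications, not ScanCode's actual candidate-detection code.

```python
import re

# Hypothetical, simplified copyright-marker pattern (not ScanCode's real one).
COPYRIGHT_MARKER = re.compile(r"\(c\)|©|\bcopyright\b", re.IGNORECASE)


def naive_is_candidate(line: str) -> bool:
    """Return True if a copyright marker appears anywhere in the line."""
    return bool(COPYRIGHT_MARKER.search(line))


# The (c) in the URL path is matched, so this line is a false positive.
print(naive_is_candidate("See http://example.com/(c)/path for details"))  # True
# A genuine copyright statement is matched as expected.
print(naive_is_candidate("Copyright (c) 2026 Example Corp"))  # True
```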


Solution

  • Updated the copyright candidate detection logic to ignore copyright markers appearing inside URLs
  • Added a regression test so that a (c) inside a URL is not reported as a copyright statement again
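One possible way to implement the change described above is to mask URLs before searching for copyright markers, so that a (c) inside a URL is ignored while markers elsewhere on the same line are still found. This is a sketch under stated assumptions: the `URL` pattern, the `COPYRIGHT_MARKER` regex, and the `is_candidate` function are hypothetical and do not reflect the code actually changed in this PR.

```python
import re

# Hypothetical, simplified copyright-marker pattern (not ScanCode's real one).
COPYRIGHT_MARKER = re.compile(r"\(c\)|©|\bcopyright\b", re.IGNORECASE)

# Assumed URL pattern: a scheme followed by any run of non-whitespace characters.
URL = re.compile(r"\b(?:https?|ftp)://\S+", re.IGNORECASE)


def is_candidate(line: str) -> bool:
    """Return True if a copyright marker appears outside of any URL."""
    # Replace each URL with a space so that markers inside URLs are ignored.
    without_urls = URL.sub(" ", line)
    return bool(COPYRIGHT_MARKER.search(without_urls))


print(is_candidate("See http://example.com/(c)/path for details"))  # False
print(is_candidate("Copyright (c) 2026 Example Corp"))  # True
```

Replacing each URL with a space rather than an empty string keeps the words on either side of it from being joined together.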

Tests

  • Added test_copyright_symbol_inside_url_is_ignored
  • All existing tests pass

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    (Tests also run locally)
  • Commits are in a uniquely-named feature branch and have no merge conflicts 📁
  • Updated documentation pages (not applicable)
  • Updated CHANGELOG.rst (not applicable)

Signed-off-by: dikshadeware@gmail.com

dikshaa2909 force-pushed the develop branch 2 times, most recently from 12b4a11 to 8f51b45 (February 13, 2026 20:49)
Signed-off-by: dikshaa2909 <dikshadeware@gmail.com>
Refactor copyright symbol detection to ignore (c) only in URL paths.

Signed-off-by: dikshaa2909 <dikshadeware@gmail.com>
Add test for copyright detection with URL

Signed-off-by: dikshaa2909 <dikshadeware@gmail.com>
Signed-off-by: dikshaa2909 <dikshadeware@gmail.com>

Development

Successfully merging this pull request may close these issues.

Copyright detection sees URLs containing copyright symbols as copyright statements
