Conversation

GYFX35 (Owner) commented Sep 6, 2025

This commit refactors the existing Facebook and scam analyzers into a single, generic social media analyzer.

The new social_media_analyzer supports the following platforms:

  • Facebook
  • Instagram
  • WhatsApp
  • TikTok
  • Tinder
  • Snapchat
  • WeChat

The fake profile detector and scam message analyzer have been generalized to be platform-aware. The user is now prompted to select a platform before performing an analysis.

The old facebook_analyzer and scam_detector directories have been removed.

Summary by Sourcery

Refactor existing Facebook and scam analyzers into a single generic social media analyzer supporting seven platforms, generalizing fake profile and scam message detection under one CLI and cleaning up deprecated code

New Features:

  • Introduce a unified social_media_analyzer CLI to select platform and analysis type (see the sketch after this summary)
  • Add fake_profile_detector with interactive checklist, platform-specific advice, and weighted indicators
  • Implement scam_detector with heuristic-based text and URL analysis, including platform-aware domain whitelisting

Enhancements:

  • Consolidate domain lists, keyword patterns, and scoring weights into a shared heuristics module

Chores:

  • Remove legacy facebook_analyzer and scam_detector directories and files
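
To make the unified CLI flow concrete, here is a minimal, hypothetical sketch of how platform and analysis selection might look; the prompt wording, menu handling, and helper names below are illustrative assumptions, not the actual social_media_analyzer/main.py code:

# Hypothetical sketch of the unified CLI selection flow (illustrative only).
SUPPORTED_PLATFORMS = [
    "facebook", "instagram", "whatsapp", "tiktok", "tinder", "snapchat", "wechat",
]

def choose(prompt, options):
    """Print numbered options and return the selected entry."""
    for i, option in enumerate(options, start=1):
        print(f"{i}. {option}")
    while True:
        raw = input(prompt).strip()
        if raw.isdigit() and 1 <= int(raw) <= len(options):
            return options[int(raw) - 1]
        print("Invalid choice, please try again.")

def main():
    platform = choose("Select a platform: ", SUPPORTED_PLATFORMS)
    analysis = choose("Select an analysis: ", ["fake profile check", "scam message check"])
    # The real package would dispatch to fake_profile_detector or scam_detector here;
    # this sketch only shows the selection flow.
    print(f"Would run the {analysis} for {platform}.")

if __name__ == "__main__":
    main()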

sourcery-ai bot commented Sep 6, 2025

Reviewer's Guide

Refactors and unifies the Facebook and scam analyzers into a single social_media_analyzer package that supports seven platforms by consolidating shared logic, introducing platform-aware heuristics, and providing a unified CLI interface.

Entity relationship diagram for platform-specific advice and legitimate domains

erDiagram
    PLATFORM_SPECIFIC_ADVICE {
        string platform
        list advice
    }
    LEGITIMATE_DOMAINS {
        string platform
        list domains
    }
    PLATFORM_SPECIFIC_ADVICE ||--|{ LEGITIMATE_DOMAINS : "platform"

Class diagram for the new social_media_analyzer package

classDiagram
    class fake_profile_detector {
        +analyze_profile_based_on_user_input(profile_url, platform)
        +guide_reverse_image_search(image_url=None)
        +print_platform_specific_advice(platform)
        PLATFORM_SPECIFIC_ADVICE : dict
        FAKE_PROFILE_INDICATORS : list
    }
    class scam_detector {
        +analyze_text_for_scams(text_content, platform=None)
        +is_url_suspicious(url, platform=None)
        +get_domain_from_url(url)
        +get_legitimate_domains(platform=None)
    }
    class heuristics {
        LEGITIMATE_DOMAINS : dict
        URGENCY_KEYWORDS : list
        SENSITIVE_INFO_KEYWORDS : list
        TOO_GOOD_TO_BE_TRUE_KEYWORDS : list
        GENERIC_GREETINGS : list
        TECH_SUPPORT_SCAM_KEYWORDS : list
        PAYMENT_KEYWORDS : list
        URL_PATTERN : regex
        SUSPICIOUS_TLDS : list
        CRYPTO_ADDRESS_PATTERNS : dict
        PHONE_NUMBER_PATTERN : regex
        SUSPICIOUS_URL_PATTERNS : list
        HEURISTIC_WEIGHTS : dict
    }
    class main {
        +main()
    }
    fake_profile_detector --|> heuristics : uses
    scam_detector --|> heuristics : uses
    main --> fake_profile_detector : imports
    main --> scam_detector : imports

File-Level Changes

Change: Combine Facebook and scam analyzers into a single generic social_media_analyzer package
Details:
  • Added a main CLI (main.py) for platform selection and analysis type
  • Reorganized analyzer code under social_media_analyzer/ directory
  • Removed legacy facebook_analyzer and scam_detector directories and entry points
Files: social_media_analyzer/main.py, facebook_analyzer/__init__.py, facebook_analyzer/fake_profile_detector.py, facebook_analyzer/phishing_detector.py, scam_main.py, scam_detector/__init__.py, scam_detector/analyzer.py, scam_detector/heuristics.py

Change: Generalize fake profile detection to be platform-aware
Details:
  • Created a unified fake_profile_detector with PLATFORM_SPECIFIC_ADVICE mapping for each platform
  • Consolidated generic FAKE_PROFILE_INDICATORS list for manual checklist
  • Implemented guided reverse image search and interactive prompt in analyze_profile_based_on_user_input
Files: social_media_analyzer/fake_profile_detector.py

Change: Generalize scam message analyzer with platform-aware URL and keyword heuristics
Details:
  • Added get_legitimate_domains() and extended is_url_suspicious() to consider platform-specific domains
  • Enhanced analyze_text_for_scams() to use shared heuristics and report URL analysis details
  • Imported centralized heuristics module for patterns, keywords, and weights
Files: social_media_analyzer/scam_detector.py

Change: Extract and expand heuristics into a dedicated module
Details:
  • Defined LEGITIMATE_DOMAINS per social media platform
  • Centralized all keyword lists, regex patterns, and suspicious TLD definitions
  • Configured HEURISTIC_WEIGHTS for scoring various scam indicators
Files: social_media_analyzer/heuristics.py
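
The heuristics module described above is essentially shared data. A minimal sketch of the shape such a module might take follows; the specific domains, keywords, and weights are assumptions for illustration, not the repository's actual values:

# Hypothetical sketch of a shared heuristics module; all values are illustrative.
LEGITIMATE_DOMAINS = {
    "general": ["google.com", "apple.com"],
    "facebook": ["facebook.com", "fb.com"],
    "instagram": ["instagram.com"],
    "whatsapp": ["whatsapp.com"],
}

URGENCY_KEYWORDS = ["urgent", "act now", "immediately"]
SUSPICIOUS_TLDS = [".xyz", ".top", ".click"]

# Weights let each indicator category contribute differently to the scam score.
HEURISTIC_WEIGHTS = {
    "URGENCY": 1.0,
    "SENSITIVE_INFO": 2.0,
    "SUSPICIOUS_URL": 3.0,
}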

GYFX35 merged commit 0b2b5d7 into main Sep 6, 2025
2 of 6 checks passed
sourcery-ai bot left a comment


Hey there - I've reviewed your changes and they look great!


Comment on lines +29 to +31 of social_media_analyzer/scam_detector.py
def get_domain_from_url(url):
    """Extracts the domain (e.g., 'example.com') from a URL."""
    if "://" in url:

issue: Domain extraction logic may not handle URLs with subdomains or ports correctly.

Splitting by delimiters may fail for URLs with subdomains, ports, or credentials. Use urllib.parse.urlparse for reliable domain extraction.
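
A minimal sketch of the urlparse-based approach suggested here (the handling of scheme-less www-style inputs is an assumption):

from urllib.parse import urlparse

def get_domain_from_url(url):
    """Extract the hostname, handling subdomains, ports, and credentials."""
    # urlparse only fills in the network location when a scheme is present,
    # so add one for inputs like "www.example.com/path".
    if "://" not in url:
        url = "http://" + url
    # .hostname strips credentials ("user:pass@") and any port, and lowercases.
    return urlparse(url).hostname or ""

# get_domain_from_url("https://user:pw@sub.example.com:8080/x") -> "sub.example.com"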

Comment on lines +66 to +67 of social_media_analyzer/scam_detector.py
    for service in LEGITIMATE_DOMAINS.keys():
        if service != "general" and service in domain:

issue (bug_risk): Service name substring check may produce false positives for legitimate domains.

This logic may incorrectly flag official domains as suspicious. Please update the check to distinguish between legitimate and unofficial uses of service names.
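
One way to tighten the check, sketched under the assumption that LEGITIMATE_DOMAINS maps each service to its official domains (the helper name is hypothetical): flag a URL only when the service name appears in a domain that is neither an official domain nor a subdomain of one.

def mentions_service_unofficially(domain, service, official_domains):
    """Flag 'facebook' in 'facebook-login.example.com' but not in 'm.facebook.com'."""
    if service not in domain:
        return False
    # Exact official domains and their subdomains are treated as legitimate.
    is_official = any(
        domain == official or domain.endswith("." + official)
        for official in official_domains
    )
    return not is_official

# mentions_service_unofficially("facebook-login.example.com", "facebook", ["facebook.com"]) -> True
# mentions_service_unofficially("m.facebook.com", "facebook", ["facebook.com"]) -> False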

Comment on lines +84 to +100 of social_media_analyzer/scam_detector.py
    # 1. Keyword-based checks
    keyword_checks = {
        "URGENCY": URGENCY_KEYWORDS,
        "SENSITIVE_INFO": SENSITIVE_INFO_KEYWORDS,
        "TOO_GOOD_TO_BE_TRUE": TOO_GOOD_TO_BE_TRUE_KEYWORDS,
        "GENERIC_GREETING": GENERIC_GREETINGS,
        "TECH_SUPPORT": TECH_SUPPORT_SCAM_KEYWORDS,
        "PAYMENT_REQUEST": PAYMENT_KEYWORDS,
    }

    for category, keywords in keyword_checks.items():
        for keyword in keywords:
            if keyword in text_lower:
                message = f"Presence of '{category.replace('_', ' ').title()}' keyword: '{keyword}'"
                if message not in indicators_found:
                    indicators_found.append(message)
                    score += HEURISTIC_WEIGHTS.get(category, 1.0)

suggestion: Simple substring matching for keywords may lead to false positives.

Consider using regular expressions with word boundaries to avoid matching keywords within other words.

Suggested change

+    import re
+
     # 1. Keyword-based checks
     keyword_checks = {
         "URGENCY": URGENCY_KEYWORDS,
         "SENSITIVE_INFO": SENSITIVE_INFO_KEYWORDS,
         "TOO_GOOD_TO_BE_TRUE": TOO_GOOD_TO_BE_TRUE_KEYWORDS,
         "GENERIC_GREETING": GENERIC_GREETINGS,
         "TECH_SUPPORT": TECH_SUPPORT_SCAM_KEYWORDS,
         "PAYMENT_REQUEST": PAYMENT_KEYWORDS,
     }

     for category, keywords in keyword_checks.items():
         for keyword in keywords:
-            if keyword in text_lower:
+            # Use regex with word boundaries to avoid matching keywords within other words
+            pattern = r"\b" + re.escape(keyword) + r"\b"
+            if re.search(pattern, text_lower):
                 message = f"Presence of '{category.replace('_', ' ').title()}' keyword: '{keyword}'"
                 if message not in indicators_found:
                     indicators_found.append(message)
                     score += HEURISTIC_WEIGHTS.get(category, 1.0)

Comment on lines +50 to +52
    if re.search(pattern, normalized_url, re.IGNORECASE):
        if not domain.endswith(tuple(legitimate_domains)):
            return True, f"URL impersonates a legitimate domain: {pattern}"

suggestion (code-quality): Merge nested if conditions (merge-nested-ifs)

Suggested change

-    if re.search(pattern, normalized_url, re.IGNORECASE):
-        if not domain.endswith(tuple(legitimate_domains)):
-            return True, f"URL impersonates a legitimate domain: {pattern}"
+    if re.search(pattern, normalized_url, re.IGNORECASE) and not domain.endswith(tuple(legitimate_domains)):
+        return True, f"URL impersonates a legitimate domain: {pattern}"


Explanation: Too much nesting can make code difficult to understand, and this is especially
true in Python, where there are no brackets to help out with the delineation of
different nesting levels.

Reading deeply nested code is confusing, since you have to keep track of which
conditions relate to which levels. We therefore strive to reduce nesting where
possible, and the situation where two if conditions can be combined using
and is an easy win.

Comment on lines +103 to +108
google_url = f"https://images.google.com/searchbyimage?image_url={image_url}"
tineye_url = f"https://tineye.com/search?url={image_url}"
print(f"Attempting to open Google Images: {google_url}")
webbrowser.open(google_url)
print(f"Attempting to open TinEye: {tineye_url}")
webbrowser.open(tineye_url)

issue (code-quality): Extract code out into function (extract-method)
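
A hedged sketch of one way to apply the suggestion, pulling the per-engine work into a helper; the helper name and the engine table are illustrative, and only the two search URLs come from the snippet above:

import webbrowser

# Hypothetical helper; names and structure are illustrative, not the repo's code.
REVERSE_IMAGE_SEARCH_ENGINES = {
    "Google Images": "https://images.google.com/searchbyimage?image_url={image_url}",
    "TinEye": "https://tineye.com/search?url={image_url}",
}

def open_reverse_image_searches(image_url):
    """Open the image in each configured reverse-image-search engine."""
    for name, template in REVERSE_IMAGE_SEARCH_ENGINES.items():
        search_url = template.format(image_url=image_url)
        print(f"Attempting to open {name}: {search_url}")
        webbrowser.open(search_url)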

Comment on lines +34 to +41
        profile_url = input(f"Enter the {platform.capitalize()} profile URL to analyze: ").strip()
        if profile_url:
            fake_profile_detector.analyze_profile_based_on_user_input(profile_url, platform)
        else:
            print("No profile URL entered.")
        break
    elif analysis_choice == 2:
        message = input("Paste the message you want to analyze: ").strip()

issue (code-quality): Use named expression to simplify assignment and conditional [×2] (use-named-expression)
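
For reference, a minimal sketch of the named-expression form the comment refers to (the prompt text and the analysis call are simplified stand-ins):

# The walrus operator folds the assignment into the condition.
if profile_url := input("Enter the profile URL to analyze: ").strip():
    print(f"Analyzing {profile_url}...")  # stand-in for the real detector call
else:
    print("No profile URL entered.")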


def is_url_suspicious(url, platform=None):
    """
    Checks if a URL is suspicious based on various patterns and lists.

issue (code-quality): We've found these issues:

# Example Usage
test_message = "URGENT: Your Instagram account has unusual activity. Please verify your account now by clicking http://instagram.security-update.com/login to avoid suspension."
analysis_result = analyze_text_for_scams(test_message, platform="instagram")
print(f"--- Analyzing Instagram Scam Message ---")

suggestion (code-quality): Replace f-string with no interpolated values with string (remove-redundant-fstring)

Suggested change
-print(f"--- Analyzing Instagram Scam Message ---")
+print("--- Analyzing Instagram Scam Message ---")
