Refactor and expand social media analyzer #8

GYFX35 · 2025-09-06T11:03:21Z

This commit refactors the existing Facebook and scam analyzers into a single, generic social media analyzer.

The new social_media_analyzer supports the following platforms:

Facebook
Instagram
WhatsApp
TikTok
Tinder
Snapchat
WeChat

The fake profile detector and scam message analyzer have been generalized to be platform-aware. The user is now prompted to select a platform before performing an analysis.
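Here is a minimal sketch of a platform prompt like this. It assumes lowercase platform keys (matching the `platform="instagram"` call quoted in the review below) and a numbered menu. `PLATFORMS`, `select_platform`, and the injectable `read_input` parameter are hypothetical names, not the PR's actual `main.py`.

```python
# Hypothetical sketch; the PR's main.py is not shown, so these names are illustrative.
PLATFORMS = ["facebook", "instagram", "whatsapp", "tiktok", "tinder", "snapchat", "wechat"]


def select_platform(read_input=input):
    """Show a numbered menu and prompt until a valid choice is entered; return the lowercase platform key."""
    for number, name in enumerate(PLATFORMS, start=1):
        print(f"{number}. {name.capitalize()}")
    while True:
        raw = read_input(f"Select a platform (1-{len(PLATFORMS)}): ").strip()
        # Accept only whole numbers inside the menu range.
        if raw.isdigit() and 1 <= int(raw) <= len(PLATFORMS):
            return PLATFORMS[int(raw) - 1]
        print("Invalid choice, please try again.")
```

Taking `read_input` as a parameter instead of calling `input()` directly means the prompt loop can be tested without a terminal.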

The old facebook_analyzer and scam_detector directories have been removed.

Summary by Sourcery

Refactor the existing Facebook and scam analyzers into a single generic social media analyzer that supports seven platforms, generalizes fake-profile and scam-message detection under one CLI, and cleans up deprecated code.

New Features:

Introduce a unified social_media_analyzer CLI to select platform and analysis type
Add fake_profile_detector with interactive checklist, platform-specific advice, and weighted indicators
Implement scam_detector with heuristic-based text and URL analysis, including platform-aware domain whitelisting
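The sketch below shows one way a weighted checklist can work: it adds up the weights of the indicators the user answers "yes" to. The questions, the weights, and the `score_checklist` helper are all hypothetical, because the PR's actual `FAKE_PROFILE_INDICATORS` contents are not shown here.

```python
# Hypothetical indicators and weights; the PR's FAKE_PROFILE_INDICATORS list is not shown.
FAKE_PROFILE_INDICATORS = [
    {"question": "Does a reverse image search find the profile picture elsewhere?", "weight": 3},
    {"question": "Was the account created very recently?", "weight": 2},
    {"question": "Does the profile have very few posts or followers?", "weight": 1},
    {"question": "Are you being asked for money, gift cards, or crypto?", "weight": 3},
]


def score_checklist(answers):
    """Return (score, max_score); `answers` holds one boolean per indicator, True meaning 'yes'."""
    if len(answers) != len(FAKE_PROFILE_INDICATORS):
        raise ValueError("one answer per indicator is required")
    score = sum(ind["weight"] for ind, yes in zip(FAKE_PROFILE_INDICATORS, answers) if yes)
    max_score = sum(ind["weight"] for ind in FAKE_PROFILE_INDICATORS)
    return score, max_score
```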

Enhancements:

Consolidate domain lists, keyword patterns, and scoring weights into a shared heuristics module

Chores:

Remove legacy facebook_analyzer and scam_detector directories and files


sourcery-ai · 2025-09-06T11:03:26Z

Reviewer's Guide

Refactors and unifies the Facebook and scam analyzers into a single social_media_analyzer package that supports seven platforms by consolidating shared logic, introducing platform-aware heuristics, and providing a unified CLI interface.

Entity relationship diagram for platform-specific advice and legitimate domains

erDiagram
    PLATFORM_SPECIFIC_ADVICE {
        string platform
        list advice
    }
    LEGITIMATE_DOMAINS {
        string platform
        list domains
    }
    PLATFORM_SPECIFIC_ADVICE ||--|{ LEGITIMATE_DOMAINS : "platform"
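Both mappings are keyed by platform name. The review below also shows that `LEGITIMATE_DOMAINS` has a `"general"` entry. Here is a minimal sketch of `get_legitimate_domains(platform=None)`, which assumes the function merges the general list with the platform's own list. The domain values are a hypothetical excerpt, not the contents of `heuristics.py`.

```python
# Hypothetical excerpt; the real LEGITIMATE_DOMAINS in heuristics.py is larger.
LEGITIMATE_DOMAINS = {
    "general": ["google.com", "microsoft.com"],
    "facebook": ["facebook.com", "fb.com"],
    "instagram": ["instagram.com"],
}


def get_legitimate_domains(platform=None):
    """Return the 'general' domains plus those registered for `platform`, if any."""
    domains = list(LEGITIMATE_DOMAINS.get("general", []))  # copy so the shared list is never mutated
    if platform:
        domains.extend(LEGITIMATE_DOMAINS.get(platform.lower(), []))
    return domains
```

An unknown platform falls back to the general list alone, so callers never get `None`.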

Class diagram for the new social_media_analyzer package

classDiagram
    class fake_profile_detector {
        +analyze_profile_based_on_user_input(profile_url, platform)
        +guide_reverse_image_search(image_url=None)
        +print_platform_specific_advice(platform)
        PLATFORM_SPECIFIC_ADVICE : dict
        FAKE_PROFILE_INDICATORS : list
    }
    class scam_detector {
        +analyze_text_for_scams(text_content, platform=None)
        +is_url_suspicious(url, platform=None)
        +get_domain_from_url(url)
        +get_legitimate_domains(platform=None)
    }
    class heuristics {
        LEGITIMATE_DOMAINS : dict
        URGENCY_KEYWORDS : list
        SENSITIVE_INFO_KEYWORDS : list
        TOO_GOOD_TO_BE_TRUE_KEYWORDS : list
        GENERIC_GREETINGS : list
        TECH_SUPPORT_SCAM_KEYWORDS : list
        PAYMENT_KEYWORDS : list
        URL_PATTERN : regex
        SUSPICIOUS_TLDS : list
        CRYPTO_ADDRESS_PATTERNS : dict
        PHONE_NUMBER_PATTERN : regex
        SUSPICIOUS_URL_PATTERNS : list
        HEURISTIC_WEIGHTS : dict
    }
    class main {
        +main()
    }
    fake_profile_detector ..> heuristics : uses
    scam_detector ..> heuristics : uses
    main --> fake_profile_detector : imports
    main --> scam_detector : imports

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Combine Facebook and scam analyzers into a single generic social_media_analyzer package | Added a main CLI (`main.py`) for platform and analysis-type selection; reorganized analyzer code under the `social_media_analyzer/` directory; removed the legacy `facebook_analyzer` and `scam_detector` directories and entry points | `social_media_analyzer/main.py`, `facebook_analyzer/__init__.py`, `facebook_analyzer/fake_profile_detector.py`, `facebook_analyzer/phishing_detector.py`, `scam_main.py`, `scam_detector/__init__.py`, `scam_detector/analyzer.py`, `scam_detector/heuristics.py` |
| Generalize fake profile detection to be platform-aware | Created a unified fake_profile_detector with a PLATFORM_SPECIFIC_ADVICE mapping for each platform; consolidated a generic FAKE_PROFILE_INDICATORS list for the manual checklist; implemented a guided reverse image search and an interactive prompt in analyze_profile_based_on_user_input | `social_media_analyzer/fake_profile_detector.py` |
| Generalize the scam message analyzer with platform-aware URL and keyword heuristics | Added get_legitimate_domains() and extended is_url_suspicious() to consider platform-specific domains; enhanced analyze_text_for_scams() to use shared heuristics and report URL analysis details; imported the centralized heuristics module for patterns, keywords, and weights | `social_media_analyzer/scam_detector.py` |
| Extract and expand heuristics into a dedicated module | Defined LEGITIMATE_DOMAINS per social media platform; centralized all keyword lists, regex patterns, and suspicious TLD definitions; configured HEURISTIC_WEIGHTS for scoring various scam indicators | `social_media_analyzer/heuristics.py` |
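The following sketch makes the shared-heuristics idea concrete with URL extraction, a TLD check, and weighted scoring. `URL_PATTERN` is copied from the removed `scam_detector` code quoted in the review, and the `.get(category, 1.0)` fallback mirrors the PR's scoring loop. The `SUSPICIOUS_TLDS` entries, the weight values, and the helper names are assumptions made for illustration.

```python
import re

# Pattern taken from the removed scam_detector code: http(s):// URLs and bare www. hosts.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+|www\.[^\s<>"]+')

# Hypothetical sample values; the real lists live in social_media_analyzer/heuristics.py.
SUSPICIOUS_TLDS = [".xyz", ".top", ".tk", ".zip"]
HEURISTIC_WEIGHTS = {"URGENCY": 1.5, "SUSPICIOUS_URL": 2.0}


def find_urls(text):
    """Return every URL-like substring in `text`, in order of appearance."""
    return URL_PATTERN.findall(text)


def has_suspicious_tld(domain):
    """True if the domain ends with one of the listed top-level domains."""
    return domain.lower().endswith(tuple(SUSPICIOUS_TLDS))


def score_indicators(categories):
    """Sum the weight of each triggered category; unknown categories count as 1.0."""
    return sum(HEURISTIC_WEIGHTS.get(category, 1.0) for category in categories)
```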


sourcery-ai

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents

Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `social_media_analyzer/scam_detector.py:29` </location>
<code_context>
-    url_pattern = r'https?://[^\s<>"]+|www\.[^\s<>"]+'
-    return re.findall(url_pattern, text)
-
-def get_domain_from_url(url):
-    """Extracts the domain (e.g., 'example.com') from a URL."""
-    if "://" in url:
-        domain = url.split("://")[1].split("/")[0].split("?")[0]
-    else: # Handles www.example.com cases without http(s)
</code_context>

<issue_to_address>
Domain extraction logic may not handle URLs with subdomains or ports correctly.

Splitting by delimiters may fail for URLs with subdomains, ports, or credentials. Use urllib.parse.urlparse for reliable domain extraction.
</issue_to_address>

### Comment 2
<location> `social_media_analyzer/scam_detector.py:66` </location>
<code_context>
+        return True, f"URL uses a potentially suspicious TLD."
+
+    # 4. Check if a known legitimate service name is part of the domain, but it's not official
+    for service in LEGITIMATE_DOMAINS.keys():
+        if service != "general" and service in domain:
+            return True, f"URL contains the name of a legitimate service ('{service}') but is not an official domain."
+
</code_context>

<issue_to_address>
Service name substring check may produce false positives for legitimate domains.

This logic may incorrectly flag official domains as suspicious. Please update the check to distinguish between legitimate and unofficial uses of service names.
</issue_to_address>

### Comment 3
<location> `social_media_analyzer/scam_detector.py:94` </location>
<code_context>
-        "PAYMENT_REQUEST": PAYMENT_KEYWORDS,
-    }
-
-    for category, keywords in keyword_checks.items():
-        for keyword in keywords:
-            if keyword in text_lower:
-                message = f"Presence of '{category.replace('_', ' ').title()}' keyword: '{keyword}'"
</code_context>

<issue_to_address>
Simple substring matching for keywords may lead to false positives.

Consider using regular expressions with word boundaries to avoid matching keywords within other words.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    # 1. Keyword-based checks
    keyword_checks = {
        "URGENCY": URGENCY_KEYWORDS,
        "SENSITIVE_INFO": SENSITIVE_INFO_KEYWORDS,
        "TOO_GOOD_TO_BE_TRUE": TOO_GOOD_TO_BE_TRUE_KEYWORDS,
        "GENERIC_GREETING": GENERIC_GREETINGS,
        "TECH_SUPPORT": TECH_SUPPORT_SCAM_KEYWORDS,
        "PAYMENT_REQUEST": PAYMENT_KEYWORDS,
    }

    for category, keywords in keyword_checks.items():
        for keyword in keywords:
            if keyword in text_lower:
                message = f"Presence of '{category.replace('_', ' ').title()}' keyword: '{keyword}'"
                if message not in indicators_found:
                    indicators_found.append(message)
                    score += HEURISTIC_WEIGHTS.get(category, 1.0)
=======
    import re

    # 1. Keyword-based checks
    keyword_checks = {
        "URGENCY": URGENCY_KEYWORDS,
        "SENSITIVE_INFO": SENSITIVE_INFO_KEYWORDS,
        "TOO_GOOD_TO_BE_TRUE": TOO_GOOD_TO_BE_TRUE_KEYWORDS,
        "GENERIC_GREETING": GENERIC_GREETINGS,
        "TECH_SUPPORT": TECH_SUPPORT_SCAM_KEYWORDS,
        "PAYMENT_REQUEST": PAYMENT_KEYWORDS,
    }

    for category, keywords in keyword_checks.items():
        for keyword in keywords:
            # Use regex with word boundaries to avoid matching keywords within other words
            pattern = r"\b" + re.escape(keyword) + r"\b"
            if re.search(pattern, text_lower):
                message = f"Presence of '{category.replace('_', ' ').title()}' keyword: '{keyword}'"
                if message not in indicators_found:
                    indicators_found.append(message)
                    score += HEURISTIC_WEIGHTS.get(category, 1.0)
>>>>>>> REPLACE

</suggested_fix>


sourcery-ai · 2025-09-06T11:04:12Z

social_media_analyzer/scam_detector.py

+def get_domain_from_url(url):
+    """Extracts the domain (e.g., 'example.com') from a URL."""
+    if "://" in url:


issue: Domain extraction logic may not handle URLs with subdomains or ports correctly.

Splitting by delimiters may fail for URLs with subdomains, ports, or credentials. Use urllib.parse.urlparse for reliable domain extraction.
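Here is a sketch of the `urllib.parse.urlparse` approach the comment suggests. It assumes the function should return the full lowercase host, keeping subdomains and any `www.`, with credentials and port removed. The PR's exact return format is not shown, so treat this as illustrative. Bare `www.` strings have no scheme, so the sketch prefixes them with `//`. That makes `urlparse` read them as a network location rather than a path.

```python
from urllib.parse import urlparse


def get_domain_from_url(url):
    """Return the lowercase host of `url`, without credentials or port ('' if none)."""
    if "://" not in url:
        # 'www.example.com/x' has no scheme; '//' makes urlparse treat it as a netloc.
        url = "//" + url
    # .hostname drops 'user:pass@' and ':port' and lowercases the result.
    return urlparse(url).hostname or ""
```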

sourcery-ai · 2025-09-06T11:04:12Z

social_media_analyzer/scam_detector.py

+    for service in LEGITIMATE_DOMAINS.keys():
+        if service != "general" and service in domain:


issue (bug_risk): Service name substring check may produce false positives for legitimate domains.

This logic may incorrectly flag official domains as suspicious. Please update the check to distinguish between legitimate and unofficial uses of service names.
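One way to tell official from unofficial uses of a service name is to accept a domain only if it equals a legitimate domain or ends with `"." + domain`. The dot boundary also avoids a weakness in the looser `domain.endswith(tuple(legitimate_domains))` test quoted later in this review, which accepts lookalikes such as `notinstagram.com`. This is a sketch, and `is_official_domain` and `impersonates_service` are hypothetical helper names.

```python
def is_official_domain(domain, legitimate_domains):
    """True if `domain` equals a legitimate domain or is a subdomain of one."""
    domain = domain.lower().rstrip(".")
    return any(domain == legit or domain.endswith("." + legit) for legit in legitimate_domains)


def impersonates_service(domain, legitimate_domains, service_names):
    """Return the service name a non-official domain borrows, or None."""
    if is_official_domain(domain, legitimate_domains):
        return None  # official domains are never flagged for containing their own name
    return next((name for name in service_names if name in domain.lower()), None)
```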

sourcery-ai · 2025-09-06T11:04:12Z

social_media_analyzer/scam_detector.py

+    # 1. Keyword-based checks
+    keyword_checks = {
+        "URGENCY": URGENCY_KEYWORDS,
+        "SENSITIVE_INFO": SENSITIVE_INFO_KEYWORDS,
+        "TOO_GOOD_TO_BE_TRUE": TOO_GOOD_TO_BE_TRUE_KEYWORDS,
+        "GENERIC_GREETING": GENERIC_GREETINGS,
+        "TECH_SUPPORT": TECH_SUPPORT_SCAM_KEYWORDS,
+        "PAYMENT_REQUEST": PAYMENT_KEYWORDS,
+    }
+
+    for category, keywords in keyword_checks.items():
+        for keyword in keywords:
+            if keyword in text_lower:
+                message = f"Presence of '{category.replace('_', ' ').title()}' keyword: '{keyword}'"
+                if message not in indicators_found:
+                    indicators_found.append(message)
+                    score += HEURISTIC_WEIGHTS.get(category, 1.0)


suggestion: Simple substring matching for keywords may lead to false positives.

Consider using regular expressions with word boundaries to avoid matching keywords within other words.

Suggested change (the same edit as the SEARCH/REPLACE block under Comment 3 above): add `import re` and replace the substring test

    if keyword in text_lower:

with a word-boundary match:

    # Use regex with word boundaries to avoid matching keywords within other words
    pattern = r"\b" + re.escape(keyword) + r"\b"
    if re.search(pattern, text_lower):

The rest of the loop is unchanged.
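As the keyword lists grow, you can build one precompiled, case-insensitive pattern per category. This avoids recompiling a regex for every keyword on every message. The sketch below uses hypothetical helper names. Note that `\b` only anchors next to word characters, so keywords that begin or end with punctuation (for example `"$$$"`) would need a different boundary rule.

```python
import re


def compile_keyword_patterns(keyword_checks):
    """Build one case-insensitive, word-bounded regex per keyword category."""
    patterns = {}
    for category, keywords in keyword_checks.items():
        if not keywords:
            continue  # an empty alternation would match an empty string at every word boundary
        # Longest first, so a phrase like "act now" is preferred over a shorter keyword like "act".
        ordered = sorted(keywords, key=len, reverse=True)
        alternation = "|".join(re.escape(keyword) for keyword in ordered)
        patterns[category] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns


def find_keyword_hits(text, patterns):
    """Map each category to the sorted, distinct keywords (lowercased) found in `text`."""
    hits = {}
    for category, pattern in patterns.items():
        found = {match.group(0).lower() for match in pattern.finditer(text)}
        if found:
            hits[category] = sorted(found)
    return hits
```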

sourcery-ai · 2025-09-06T11:04:12Z

social_media_analyzer/scam_detector.py

+            if re.search(pattern, normalized_url, re.IGNORECASE):
+                if not domain.endswith(tuple(legitimate_domains)):
+                    return True, f"URL impersonates a legitimate domain: {pattern}"


suggestion (code-quality): Merge nested if conditions (merge-nested-ifs)

Suggested change

    if re.search(pattern, normalized_url, re.IGNORECASE):
        if not domain.endswith(tuple(legitimate_domains)):
            return True, f"URL impersonates a legitimate domain: {pattern}"

becomes

    if re.search(pattern, normalized_url, re.IGNORECASE) and not domain.endswith(tuple(legitimate_domains)):
        return True, f"URL impersonates a legitimate domain: {pattern}"

Explanation
Too much nesting can make code difficult to understand, and this is especially
true in Python, where there are no brackets to help out with the delineation of
different nesting levels.

Reading deeply nested code is confusing, since you have to keep track of which
conditions relate to which levels. We therefore strive to reduce nesting where
possible, and the situation where two if conditions can be combined using
and is an easy win.

sourcery-ai · 2025-09-06T11:04:12Z

social_media_analyzer/fake_profile_detector.py

+        google_url = f"https://images.google.com/searchbyimage?image_url={image_url}"
+        tineye_url = f"https://tineye.com/search?url={image_url}"
+        print(f"Attempting to open Google Images: {google_url}")
+        webbrowser.open(google_url)
+        print(f"Attempting to open TinEye: {tineye_url}")
+        webbrowser.open(tineye_url)


issue (code-quality): Extract code out into function (extract-method)
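Here is a sketch of the extract-method refactor. URL construction moves into a pure function that can be unit-tested, and opening the browser becomes a thin wrapper around it. The two endpoints come from the diff. Percent-encoding `image_url` with `urllib.parse.quote` is an addition the original lines do not make. It assumes that an image URL containing `?` or `&` would otherwise corrupt the query string. The function names are hypothetical.

```python
import webbrowser
from urllib.parse import quote


def build_reverse_image_search_urls(image_url):
    """Return (service name, search URL) pairs for the reverse-image-search services in the diff."""
    encoded = quote(image_url, safe="")  # encode ':', '/', '?', '=' so the value stays one query parameter
    return [
        ("Google Images", f"https://images.google.com/searchbyimage?image_url={encoded}"),
        ("TinEye", f"https://tineye.com/search?url={encoded}"),
    ]


def open_reverse_image_searches(image_url, opener=webbrowser.open):
    """Print and open each search URL; `opener` is injectable for testing."""
    for name, url in build_reverse_image_search_urls(image_url):
        print(f"Attempting to open {name}: {url}")
        opener(url)
```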

sourcery-ai · 2025-09-06T11:04:12Z

social_media_analyzer/main.py

+                profile_url = input(f"Enter the {platform.capitalize()} profile URL to analyze: ").strip()
+                if profile_url:
+                    fake_profile_detector.analyze_profile_based_on_user_input(profile_url, platform)
+                else:
+                    print("No profile URL entered.")
+                break
+            elif analysis_choice == 2:
+                message = input("Paste the message you want to analyze: ").strip()


issue (code-quality): Use named expression to simplify assignment and conditional [×2] (use-named-expression)
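An assignment expression (Python 3.8+) merges the read and the emptiness test into one condition. Below is a sketch of the profile-URL branch. The analyzer and the input function are passed in so the code can run outside the CLI. `handle_profile_choice` is a hypothetical name, and in `main.py`, `analyze` would be `fake_profile_detector.analyze_profile_based_on_user_input`. The message branch can be rewritten the same way.

```python
def handle_profile_choice(platform, analyze, read_input=input):
    """Prompt for a profile URL and analyze it, or report that nothing was entered."""
    # The walrus operator binds profile_url and tests it for emptiness in one step.
    if profile_url := read_input(f"Enter the {platform.capitalize()} profile URL to analyze: ").strip():
        analyze(profile_url, platform)
    else:
        print("No profile URL entered.")
```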

sourcery-ai · 2025-09-06T11:04:13Z

social_media_analyzer/scam_detector.py

+
+def is_url_suspicious(url, platform=None):
+    """
+    Checks if a URL is suspicious based on various patterns and lists.


issue (code-quality): We've found these issues:

Use the built-in function next instead of a for-loop (use-next)

Replace f-string with no interpolated values with string (remove-redundant-fstring)
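The `use-next` finding most likely refers to the service-name loop shown earlier in this review. That loop can become a `next()` call over a generator, with `None` as the default. This sketch changes only the form of the loop and keeps the substring test, so it does not fix the false-positive issue raised about that same check. The dictionary contents are a hypothetical excerpt. The `remove-redundant-fstring` fix is simply to drop the `f` prefix from `f"URL uses a potentially suspicious TLD."`.

```python
# Hypothetical excerpt of heuristics.LEGITIMATE_DOMAINS.
LEGITIMATE_DOMAINS = {
    "general": ["google.com"],
    "facebook": ["facebook.com"],
    "instagram": ["instagram.com"],
}


def find_service_name_in_domain(domain):
    """Return the first non-'general' service name contained in `domain`, or None."""
    return next(
        (service for service in LEGITIMATE_DOMAINS if service != "general" and service in domain),
        None,
    )
```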

sourcery-ai · 2025-09-06T11:04:13Z

social_media_analyzer/scam_detector.py

+    # Example Usage
+    test_message = "URGENT: Your Instagram account has unusual activity. Please verify your account now by clicking http://instagram.security-update.com/login to avoid suspension."
+    analysis_result = analyze_text_for_scams(test_message, platform="instagram")
+    print(f"--- Analyzing Instagram Scam Message ---")


suggestion (code-quality): Replace f-string with no interpolated values with string (remove-redundant-fstring)

Suggested change

    print(f"--- Analyzing Instagram Scam Message ---")

becomes

    print("--- Analyzing Instagram Scam Message ---")

GYFX35 merged commit 0b2b5d7 into main Sep 6, 2025
2 of 6 checks passed
