
halans/ai-pattern-detection


Slop Detector

A simple, pattern-based AI detection system that analyzes text for AI-generated content signals.

Overview

This tool uses regex-based pattern matching to detect characteristic patterns in AI-generated text, without the complexity of ML models or GPU infrastructure. It's fast, transparent, and privacy-focused with zero data retention.

Features

  • 46 Detection Patterns: Significance statements, AI meta-text, collaborative phrases, cultural clichés, AI-favored vocabulary, business jargon, data analysis phrases, and more
  • Fast Analysis: <50ms CPU time per request
  • Privacy-First: Zero data retention, ephemeral processing
  • Transparent: See exactly which patterns were detected
  • No ML Required: Simple pattern matching, no transformers or GPUs
  • Serverless: Cloudflare Workers backend with automatic scaling
  • Chrome Extension: Analyze text directly from any webpage via the Slop Detector side panel
  • File Uploads: Analyze .txt, .md, or .html documents alongside pasted text

Architecture

ai-detection/
├── backend/          # Cloudflare Workers API (TypeScript)
│   ├── src/
│   │   ├── patterns/     # Pattern registry and analyzer
│   │   ├── preprocessing/# Text normalization
│   │   ├── reporting/    # Report generation
│   │   └── index.ts      # Hono API
│   └── package.json
│
├── frontend/         # React + Vite + TailwindCSS
│   ├── src/
│   │   ├── components/   # TextInput, Results
│   │   ├── utils/        # API client
│   │   └── App.tsx
│   └── package.json
│
└── openspec/         # OpenSpec proposal and specs

Quick Start

Backend

cd backend
npm install
npm run dev -- --port 8787

Backend runs on http://localhost:8787

Frontend

cd frontend
npm install
cp .env.example .env
npm run dev

Frontend runs on http://localhost:3000

Pattern Detection

The system detects 46 AI writing patterns grouped by severity:

CRITICAL (15 points each)

  • AI self-references - "as an AI language model", "as an AI assistant"
  • Knowledge cutoff disclaimers - "as of my last update", "as at my latest training"

HIGH (8 points each)

  • Collaborative phrases - "let me know if", "I hope this helps", "would you like"
  • Significance statements - "stands as a testament", "serves as a symbol"
  • Editorializing - "it's important to note", "it is important to remember"
  • Placeholder templates - "[insert example here]", "[placeholder text]"
  • Helpful closings - "Certainly!", "Of course!", "here's a summary"
  • Data analysis jargon - "deliver actionable insights through in-depth data analysis", "leveraging data-driven insights", "drive insightful data-driven decisions"
  • Business/tech jargon - "bandwidth", "stakeholders", "value proposition", "scalable", "paradigm shift", "synergy", "ROI"

MEDIUM (4 points each)

  • Cultural clichés - "rich cultural heritage", "profound legacy", "rich historical tapestry"
  • Negative parallelisms - "not only...but also", "not just...rather"
  • Vague attributions - "studies show", "research suggests", "experts indicate"
  • Challenges/prospects - "despite these challenges", "despite its challenges"
  • Worth mentioning - "worth mentioning that", "it is worth mentioning"
  • Stock phrases - "a testament to", "it's important to note that", "this is not an exhaustive list"
  • Communication styles - "furthermore", "on the other hand", "as previously mentioned"
  • Action words - "unlock the secrets", "delve into", "harness", "revolutionize", "elevate", "envision", "transcend", "galvanize"
  • Contextual phrases - "in the world of", "in today's digital age/era", "when it comes to", "folks"
  • Conductor/music analogies - "like a conductor", "orchestrate", "symphony"
  • Hyperbolic phrases - "break barriers", "cannot be overstated", "unwavering"
  • Connectives - "conversely", "along with", "amidst", "towards"
  • Empowerment verbs - "empower", "embrace", "grasp", "hinder"
  • Deep + noun patterns - "deep understanding", "deep insights", "deep dive"
  • Hustle and bustle - the "hustle and bustle" urban-energy cliché
  • Quantity phrases - "a plethora of", "a multitude of", "a journey of"
  • Significance intensifiers - "paramount", "pivotal", "undeniable", "demonstrates significant"
  • Broken citations - "[citation needed]", "[source]"
  • Emoji headings - "# 🎯 Getting Started", "## 🚀 Features"

LOW (2 points each)

  • Ritual conclusions - "in summary", "overall", "in conclusion"
  • Artificial ranges - "from beginners to experts", "from design to deployment"
  • Title case headings - "# The Complete Guide To Modern Development"
  • Em-dash spam - Excessive use of em-dashes (—) in text

VERY_LOW (1 point each)

  • AI-favored adjectives - "robust", "seamless", "innovative", "holistic", "nuanced", "multifaceted", "groundbreaking", "quintessential", "visionary", "revolutionary", "paradigm-shifting"
  • AI-favored nouns - "landscape", "realm", "expertise", "paradigm", "kaleidoscope", "epitome", "odyssey", "pinnacle", "nexus", "spectrum"
  • AI-favored verbs - "facilitate", "underscore", "augment", "align", "maximize", "utilize"
  • AI descriptors - "meticulous", "ever-evolving", "cutting-edge", "labyrinthine", "gossamer", "key", "valuable", "fresh perspectives"
  • Repetition patterns - Repeated words, bigrams, or trigrams (3+ occurrences)

INFORMATIONAL (0.2 points each)

  • AI transitional words - "accordingly", "moreover", "nevertheless", "nonetheless", "thus", "undoubtedly", "certainly", "equally", "hence"
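The weighted buckets above can be sketched as a small matching routine. This is an illustrative sketch only: the `Pattern` shape, pattern names, and regexes here are assumptions and may not match the actual definitions in backend/src/patterns/registry.ts.

```typescript
// Hypothetical pattern shape; weights mirror the severity table above.
interface Pattern {
  name: string;
  regex: RegExp;   // global flag so every occurrence is counted
  weight: number;  // points per match, set by severity bucket
}

const samplePatterns: Pattern[] = [
  { name: "ai_self_reference", regex: /as an AI (language model|assistant)/gi, weight: 15 }, // CRITICAL
  { name: "collaborative_phrase", regex: /let me know if|i hope this helps/gi, weight: 8 },  // HIGH
  { name: "stock_phrase", regex: /a testament to/gi, weight: 4 },                            // MEDIUM
];

function countMatches(text: string, pattern: Pattern): number {
  return (text.match(pattern.regex) ?? []).length;
}

const sample = "As an AI language model, I hope this helps. Let me know if you need more.";
for (const p of samplePatterns) {
  console.log(p.name, countMatches(sample, p) * p.weight);
}
```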

Scoring

Total Score = Σ (pattern_weight × match_count)
Normalized Score = min(100, Total Score)

Classification Thresholds:
- 0-34:   Likely Human
- 35-64:  Mixed/Uncertain
- 65-100: Likely AI Slop
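The formula and thresholds above translate directly into code. A minimal sketch (function and field names are illustrative, not the actual API):

```typescript
// One entry per detected pattern: its severity weight and occurrence count.
interface Detected {
  weight: number;
  matchCount: number;
}

// Total Score = Σ (pattern_weight × match_count), capped at 100.
function normalizedScore(detected: Detected[]): number {
  const total = detected.reduce((sum, d) => sum + d.weight * d.matchCount, 0);
  return Math.min(100, total);
}

// Classification thresholds from the README.
function classify(score: number): string {
  if (score >= 65) return "Likely AI Slop";
  if (score >= 35) return "Mixed/Uncertain";
  return "Likely Human";
}

const score = normalizedScore([
  { weight: 15, matchCount: 1 }, // one CRITICAL hit
  { weight: 8, matchCount: 3 },  // three HIGH hits
]);
console.log(score, classify(score)); // → 39 Mixed/Uncertain
```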

API

POST /api/analyze

Request:

{
  "text": "Your text to analyze here..."
}

Response:

{
  "classification": "Likely AI Slop",
  "confidence_score": 75,
  "patterns_detected": [...],
  "explanation": "...",
  "metadata": {
    "character_count": 1500,
    "word_count": 250,
    "pattern_engine_version": "1.2.0",
    "analysis_duration": 45,
    "timestamp": "2025-10-13T...",
    "warnings": [],
    "submission_source": "text"
  }
}
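A minimal client for this endpoint might look like the following sketch. The URL path and JSON body come from the README; the error handling is an assumption:

```typescript
// Hypothetical client for POST /api/analyze; not the repo's actual utils/api.ts.
async function analyzeText(baseUrl: string, text: string): Promise<any> {
  const res = await fetch(`${baseUrl}/api/analyze`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) throw new Error(`Analysis failed: ${res.status}`);
  return res.json();
}

// Example against a local dev backend:
// analyzeText("http://localhost:8787", "It's important to note that...")
//   .then(r => console.log(r.classification, r.confidence_score));
```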

POST /api/analyze/file

Accepts multipart/form-data payloads containing a single file field. Supported extensions: .txt, .md, .html.

Response (excerpt):

{
  "classification": "Mixed/Uncertain",
  "confidence_score": 42,
  "metadata": {
    "character_count": 4800,
    "word_count": 820,
    "pattern_engine_version": "1.4.0",
    "analysis_duration": 58,
    "timestamp": "2025-10-17T...",
    "warnings": [],
    "submission_source": "file",
    "file_metadata": {
      "name": "notes.md",
      "type": "md",
      "character_count": 4700
    }
  }
}
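Calling the file endpoint follows the same shape but with multipart/form-data. In this sketch the "file" field name matches the description above; the helper itself and its error handling are illustrative:

```typescript
// Hypothetical multipart upload to POST /api/analyze/file (Node 18+ / browser).
async function analyzeFile(baseUrl: string, name: string, contents: string): Promise<any> {
  const form = new FormData();
  form.append("file", new Blob([contents], { type: "text/plain" }), name);
  const res = await fetch(`${baseUrl}/api/analyze/file`, { method: "POST", body: form });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
  return res.json();
}

// analyzeFile("http://localhost:8787", "notes.md", "# My notes...")
```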

Privacy

  • Zero Data Retention: No text is stored
  • Ephemeral Processing: All processing in memory
  • No Logging: Text content never logged
  • GDPR/CCPA Compliant: No personal data collected
  • Privacy Policy: slopdetector.me/privacy

Performance

  • Target: <50ms CPU time per request
  • Typical: 20-40ms for 1000-word text
  • Max Input: 20,000 characters
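A client can enforce the documented 20,000-character cap before sending a request. The limit value comes from the README; the function and error messages are illustrative:

```typescript
// Guard matching the documented input cap.
const MAX_INPUT_CHARS = 20_000;

function validateInput(text: string): string {
  if (text.length === 0) throw new Error("Text is empty");
  if (text.length > MAX_INPUT_CHARS) {
    throw new Error(`Text exceeds ${MAX_INPUT_CHARS} characters`);
  }
  return text;
}
```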

Deployment

Backend (Cloudflare Workers)

cd backend
npm run deploy

Frontend (Cloudflare Pages)

cd frontend
npm run build
# Deploy dist/ to Cloudflare Pages

Development

Backend Structure

  • patterns/registry.ts - Pattern definitions
  • patterns/analyzer.ts - Pattern matching engine
  • preprocessing/normalizer.ts - Text normalization
  • reporting/generator.ts - Report generation
  • index.ts - Hono API routes

Frontend Structure

  • components/TextInput.tsx - Text input with validation
  • components/Results.tsx - Results visualization
  • utils/api.ts - API client
  • App.tsx - Main application

Testing

Backend Tests

The backend includes comprehensive test coverage using Vitest:

cd backend
npm test                  # Run all tests
npm run test:coverage     # Run tests with coverage report

Test Suites:

  • Pattern Registry Tests (registry.test.ts) - 22 tests

    • Pattern structure validation
    • Pattern matching accuracy
    • Helper function correctness
    • Version validation
  • Analyzer Tests (analyzer.test.ts) - 18 tests

    • Text analysis accuracy
    • Score calculation
    • Classification thresholds
    • Performance benchmarks
  • API Endpoint Tests (index.test.ts) - 15 tests

    • Request/response validation
    • Error handling
    • Input validation
    • CORS configuration
    • Performance testing

Total: 55+ backend tests

Frontend Tests

Frontend testing can be added using Vitest + React Testing Library:

cd frontend
npm test

Running Specific Tests

# Run only pattern registry tests
npm test -- --run registry.test.ts

# Run only analyzer tests
npm test -- --run analyzer.test.ts

# Run only API tests
npm test -- --run index.test.ts

# Run with watch mode (development)
npm test

Test Configuration

The tests use Vitest with a custom configuration (vitest.config.ts) that:

  • Uses single fork mode to prevent memory issues
  • Sets appropriate timeouts for pattern matching tests
  • Enables Node environment for backend testing
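The configuration described above might look roughly like this. Option names follow Vitest's pool API, but the values and exact shape of the repo's vitest.config.ts are assumptions:

```typescript
// Hypothetical vitest.config.ts along the lines described above.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    environment: "node",            // backend tests run in Node, not a browser
    pool: "forks",
    poolOptions: {
      forks: { singleFork: true },  // single fork mode to prevent memory issues
    },
    testTimeout: 15_000,            // generous timeout for regex-heavy tests
  },
});
```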

Known Issues & Workarounds

Memory Issues: Because the analyzer evaluates 46 regex patterns against every input, the full test suite may exceed Node's default memory limit when all test files run together.

Recommended approach:

# Run test files individually
npm test -- --run registry.test.ts
npm test -- --run analyzer.test.ts
npm test -- --run index.test.ts

# Or increase Node memory limit
NODE_OPTIONS="--max-old-space-size=4096" npm test -- --run

Why this happens: The pattern analyzer processes text against 46 complex regex patterns. While individual file tests work fine, running all tests concurrently can exceed default memory limits.

Production impact: None - the API runs efficiently in Cloudflare Workers with proper memory management. This only affects comprehensive test execution.

OpenSpec

This project follows the OpenSpec proposal workflow. See openspec/changes/add-ai-detection-tool/ for:

  • proposal.md - Project rationale and requirements
  • design.md - Technical decisions and architecture
  • tasks.md - Implementation checklist
  • specs/ - Detailed specifications for each capability

License

MIT

Contributing

Pattern contributions welcome! To add new patterns:

  1. Add pattern to backend/src/patterns/registry.ts
  2. Assign appropriate severity and weight
  3. Test against AI and human samples
  4. Submit PR with examples
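A new registry entry might look like the sketch below. The field names are hypothetical (the actual shape lives in backend/src/patterns/registry.ts), and "hollow-superlatives" is a made-up example pattern:

```typescript
// Hypothetical new entry for the pattern registry.
const newPattern = {
  id: "hollow-superlatives",          // illustrative pattern, not in the repo
  severity: "MEDIUM",
  weight: 4,                          // matches the MEDIUM bucket
  regex: /truly (remarkable|unparalleled)/gi,
  description: "Hollow superlative phrasing common in AI prose",
};

// Sanity-check against a known sample before opening a PR:
const hits = "This is truly remarkable and truly unparalleled.".match(newPattern.regex) ?? [];
console.log(hits.length); // → 2
```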

Authors

JJ Halans

Version

1.7.0 - Pattern Engine (46 patterns; adds undue-notability coverage detection with expanded documentation and tests)

Chrome Extension

The repository includes a Chrome side panel extension under browser-extension/.

Build & Load

cd browser-extension
npm install
# Optional: override default API base URL (defaults to https://api.slopdetector.me)
export VITE_EXTENSION_API_URL="https://your-worker-domain"  # base URL only
npm run build

Load the generated browser-extension/dist folder via chrome://extensions (Developer Mode → Load unpacked).

API Usage

  • The extension sends a POST ${BASE_URL}/api/analyze request with JSON { "text": "..." }.
  • Ensure your backend exposes the /api/analyze route and accepts POST requests with raw text.
  • If you override VITE_EXTENSION_API_URL, provide the base origin only (e.g., https://your-worker-domain) and the code will append /api/analyze when making requests.
  • Failed responses surface in the panel as “Analysis failed” with the backend error message if available.
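The endpoint derivation described above can be sketched as a one-liner. The trailing-slash normalization here is an assumption, not necessarily the extension's actual code:

```typescript
// Derive the analyze endpoint from a base URL such as VITE_EXTENSION_API_URL.
function analyzeEndpoint(base: string): string {
  return `${base.replace(/\/+$/, "")}/api/analyze`;
}

console.log(analyzeEndpoint("https://api.slopdetector.me"));
// → https://api.slopdetector.me/api/analyze
```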

About

AI pattern detection web app built with OpenSpec and Claude Code/Codex.
