Add Wikipedia Scout pipeline scaffolding #2

Open
artvandelay wants to merge 1 commit into main from codex/build-wikipedia-political-edit-spike-scout

Conversation

@artvandelay
Owner

Motivation

  • Provide a lightweight scaffold to ingest Wikimedia EventStreams recentchange events and persist them for downstream analysis.
  • Enable extraction and analysis of edit diffs by adding diff parsing, term extraction, and spike detection utilities to surface anomalous term activity.
  • Expose a simple reporting path to render term spike digests for review and alerting workflows.

Description

  • Add a new wiki_scout package with modules config.py, ingest.py, storage.py, diffs.py, terms.py, spikes.py, report.py, and __init__.py that wire the pieces together.
  • Implement EventStreamClient and consume_recentchange to stream events and write allowed events to SQLite using connect, init_db, and insert_recentchange.
  • Implement diff utilities fetch_diff_html and parse_diff_text, term extraction helpers tokenize and extract_terms, and spike utilities robust_z_score, ratio_spike, and window_range.
  • Add reporting helpers build_digest and render_markdown that read spike_events from the database and produce a markdown digest.
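The PR names the spike utilities robust_z_score and ratio_spike but does not show their signatures or bodies. The following is a minimal sketch of how such helpers could work, assuming a median/MAD-based z-score and an additively smoothed ratio. The parameter names, the 0.6745 scaling constant, and the smoothing default are assumptions for illustration, not the code in this PR.

```python
from statistics import median
from typing import Sequence


def robust_z_score(value: float, history: Sequence[float]) -> float:
    """Score `value` against `history` using median and MAD.

    Hypothetical signature; the PR's actual function may differ.
    """
    if not history:
        return 0.0  # no baseline to compare against
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        # Degenerate baseline: at least half the history equals the median.
        if value == med:
            return 0.0
        return float("inf") if value > med else float("-inf")
    # 0.6745 rescales MAD so the score is comparable to a standard
    # z-score when the data is roughly normal.
    return 0.6745 * (value - med) / mad


def ratio_spike(current: float, baseline: float, smoothing: float = 1.0) -> float:
    """Ratio of current to baseline counts.

    Additive smoothing (an assumption here) avoids division by zero
    for terms that never appeared in the baseline window.
    """
    return (current + smoothing) / (baseline + smoothing)


# Example: a term seen 2-4 times per window suddenly appears 12 times.
hourly_counts = [2, 3, 2, 4, 3]
score = robust_z_score(12, hourly_counts)
ratio = ratio_spike(12, median(hourly_counts))
```

Median and MAD are less sensitive than mean and standard deviation to the occasional burst already present in the history, which is one plausible reason to prefer a robust score for edit-term counts.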

Testing

  • No automated tests were run.

Codex Task

