Automatically categorize Google Alerts and Google Scholar Alerts to determine their relevance to the mineral-exploration-machine-learning repository using AI-powered analysis.
This tool fetches Google Alerts and Google Scholar Alerts from Gmail and uses LLMs (OpenAI GPT-4/GPT-4o-mini, Google Gemini, or OpenRouter) to intelligently categorize them, providing summaries and relevance scores for mineral exploration machine learning applications.
- **Gmail Integration**: Automatically fetches Google Alerts and Google Scholar Alerts from your Gmail account
- **Scholar Support**: Analyzes both general Google Alerts and specialized Google Scholar research alerts
- **AI-Powered Categorization**: Uses GPT-4o-mini, Gemini, or OpenRouter to analyze content relevance
- **Detailed Reports**: Generates markdown or JSON reports with summaries and insights
- **Smart Filtering**: Identifies articles relevant to ML in mineral exploration
- **Keyword Extraction**: Extracts key terms from each alert
- **Flexible Configuration**: Supports multiple LLM providers and customizable parameters
- Clone the repository:

  ```bash
  git clone https://github.com/RichardScottOZ/googlealerts-analysis.git
  cd googlealerts-analysis
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up Google Gmail API:
  - Go to the Google Cloud Console
  - Create a new project or select an existing one
  - Enable the Gmail API
  - Create OAuth 2.0 credentials (Desktop application)
  - Download the credentials as `credentials.json` and place it in the project root
- Configure API keys:

  ```bash
  cp .env.example .env
  ```

  Edit `.env` and add your API keys:
- For OpenAI: Get key from OpenAI Platform
- For Gemini: Get key from Google AI Studio
- For OpenRouter: Get key from OpenRouter
Edit the `.env` file to customize:

```
# Choose your LLM provider
LLM_PROVIDER=openai   # Options: openai, gemini, openrouter

# Choose your model
LLM_MODEL=gpt-4o-mini # OpenAI: gpt-4o-mini, gpt-4, gpt-4o
                      # Gemini: gemini-1.5-flash, gemini-1.5-pro
                      # OpenRouter: openai/gpt-4o-mini, anthropic/claude-3.5-sonnet, meta-llama/llama-3.1-70b-instruct

# API Keys
OPENAI_API_KEY=your_key_here
GEMINI_API_KEY=your_key_here
OPENROUTER_API_KEY=your_key_here

# Processing parameters
MAX_EMAILS_TO_PROCESS=10  # Maximum number of alerts to process
DAYS_BACK=7               # How many days back to search
```

Analyze Google Alerts:

```bash
python analyze_alerts.py
```

Analyze Google Scholar Alerts:

```bash
python analyze_scholar_alerts.py
```

Google Alerts:

```bash
# Use Gemini instead of OpenAI
python analyze_alerts.py --provider gemini
# Use OpenRouter with Claude
python analyze_alerts.py --provider openrouter --model anthropic/claude-3.5-sonnet
# Process last 14 days with specific model
python analyze_alerts.py --days 14 --model gpt-4
# Specify maximum emails and output format
python analyze_alerts.py --max-emails 20 --format json --output results.json
# Full custom configuration
python analyze_alerts.py \
--provider openai \
--model gpt-4o-mini \
--days 7 \
--max-emails 10 \
--output report.md \
    --format markdown
```

Google Scholar Alerts:

```bash
# Same options as Google Alerts
python analyze_scholar_alerts.py --provider gemini --days 14
# Custom output file
python analyze_scholar_alerts.py --output my_scholar_report.md
# JSON format
python analyze_scholar_alerts.py --format json --output scholar_results.json
```

Both `analyze_alerts.py` and `analyze_scholar_alerts.py` support:
- `--provider`: LLM provider (`openai`, `gemini`, or `openrouter`)
- `--model`: Specific model name
- `--days`: Number of days back to search for alerts
- `--max-emails`: Maximum number of alerts to process
- `--output`: Output file path (default: `report.md` or `scholar_report.md`)
- `--format`: Output format (`markdown` or `json`)
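For orientation, the shared flags roughly correspond to an argparse setup like the sketch below; the defaults shown here are assumptions, not necessarily the scripts' actual values:

```python
import argparse

def build_arg_parser(default_output="report.md"):
    parser = argparse.ArgumentParser(description="Categorize Google Alerts with an LLM")
    parser.add_argument("--provider", choices=["openai", "gemini", "openrouter"],
                        default="openai", help="LLM provider")
    parser.add_argument("--model", help="Specific model name, e.g. gpt-4o-mini")
    parser.add_argument("--days", type=int, default=7,
                        help="Number of days back to search for alerts")
    parser.add_argument("--max-emails", type=int, default=10,
                        help="Maximum number of alerts to process")
    parser.add_argument("--output", default=default_output, help="Output file path")
    parser.add_argument("--format", choices=["markdown", "json"], default="markdown",
                        help="Output format")
    return parser
```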
On first run, you'll be prompted to authenticate with Google:
- A browser window will open
- Select your Google account
- Grant permission to read Gmail
- A token will be saved for future runs
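This is the standard Google OAuth flow for installed applications. A minimal sketch of what it looks like in code, using `google-auth-oauthlib` and the `credentials.json`/`token.pickle` files mentioned elsewhere in this README (the function name is an assumption, not necessarily what `gmail_fetcher.py` uses):

```python
import os
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow

SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"]

def get_gmail_credentials():
    creds = None
    if os.path.exists("token.pickle"):
        with open("token.pickle", "rb") as fh:
            creds = pickle.load(fh)  # reuse the token saved on a previous run
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())  # silently refresh an expired token
        else:
            # Opens the browser window and asks you to grant read-only Gmail access
            flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
            creds = flow.run_local_server(port=0)
        with open("token.pickle", "wb") as fh:
            pickle.dump(creds, fh)  # save the token for future runs
    return creds
```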
The tools generate two reports for each type of alert:
**Markdown Report** (`report.md`):
Human-readable report with:
- Summary statistics
- Relevant alerts with detailed analysis
- Article links and summaries
- Categorization reasoning
- Keywords and confidence scores
**JSON Report** (`report.json`):
Always generated automatically for programmatic processing. Contains full structured data.
**Markdown Report** (`scholar_report.md`):
Same format as Google Alerts but for Scholar research articles.
**JSON Report** (`scholar_report.json`):
Machine-readable format for Scholar alerts.
Example output:

```markdown
# Google Scholar Alerts Analysis Report
## Summary
- **Total Alerts Processed:** 10
- **Relevant to mineral-exploration-machine-learning:** 7
- **Relevance Rate:** 70.0%
## Relevant Alerts
### 1. Machine Learning in Copper Exploration
**Category:** Machine Learning - Exploration
**Confidence:** 0.92
**Summary:** New ML algorithms improve copper deposit prediction accuracy...
**Keywords:** machine learning, copper, exploration, predictive modeling
```

After running the analysis scripts, use the `list_articles.py` helper script to view all articles in chronological order:

```bash
# List relevant articles from both Google Alerts and Scholar Alerts
python list_articles.py
# Include non-relevant articles too
python list_articles.py --show-all
# Save to markdown file
python list_articles.py --format markdown --output articles.md
# Save to JSON file
python list_articles.py --format json --output articles.json
# Generate separate outputs for each source
python list_articles.py --separate --output articles.txt
# Only show Google Alerts (not Scholar)
python list_articles.py --google-alerts-only
# Only show Scholar Alerts
python list_articles.py --scholar-alerts-only
# Use custom input files
python list_articles.py --google-alerts my_report.json --scholar-alerts my_scholar.json
```

The script reads the default JSON outputs (`report.json` and `scholar_report.json`) and presents articles sorted by date (newest first) with:
- Article title and URL
- Summary
- Date and source (Google Alert or Scholar Alert)
- Relevance status and reasoning
- Original alert query
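A minimal sketch of that merge-and-sort behaviour, assuming the JSON reports expose per-article fields such as `date`, `title`, `url`, and `summary` under an `alerts` key (the actual key names in `report.json` may differ):

```python
import json
from pathlib import Path

def load_articles(path, source):
    if not Path(path).exists():
        return []
    data = json.loads(Path(path).read_text())
    # Tag each article with its source so the listing can distinguish the two reports
    return [dict(article, source=source) for article in data.get("alerts", [])]

articles = (load_articles("report.json", "Google Alert")
            + load_articles("scholar_report.json", "Scholar Alert"))
for article in sorted(articles, key=lambda a: a.get("date", ""), reverse=True):
    print(f"{article.get('date', 'unknown date')}  [{article['source']}]  {article.get('title', '')}")
    print(f"  {article.get('url', '')}")
    print(f"  {article.get('summary', '')}\n")
```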
```
googlealerts-analysis/
├── analyze_alerts.py          # Main orchestrator for Google Alerts
├── analyze_scholar_alerts.py  # Main orchestrator for Google Scholar Alerts
├── list_articles.py           # Helper script to list articles chronologically
├── gmail_fetcher.py           # Gmail API integration (supports both alert types)
├── llm_categorizer.py         # LLM categorization logic
├── requirements.txt           # Python dependencies
├── .env.example               # Configuration template
├── .gitignore                 # Git ignore rules
└── README.md                  # This file
```
- Fetch: Connects to Gmail API and retrieves Google Alerts or Google Scholar Alerts emails
- Parse: Extracts article titles, URLs, and content from emails
- Analyze: Sends each alert to LLM with context about the mineral-exploration-machine-learning repo
- Categorize: LLM determines relevance, confidence, and provides reasoning
- Report: Generates formatted report with all findings
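As a rough illustration of the Fetch step, the Gmail query can be restricted to the alert sender addresses (noted in the troubleshooting list below) and a time window. This sketch uses `google-api-python-client` directly; the function name and exact query string are assumptions, not the actual `gmail_fetcher.py` code. `creds` is the credentials object from the authentication sketch above.

```python
from googleapiclient.discovery import build

def fetch_alert_message_ids(creds, days_back=7, max_emails=10, scholar=False):
    """Return Gmail message IDs for recent Google Alerts or Scholar Alerts."""
    service = build("gmail", "v1", credentials=creds)
    sender = ("scholaralerts-noreply@google.com" if scholar
              else "googlealerts-noreply@google.com")
    query = f"from:{sender} newer_than:{days_back}d"  # Gmail search syntax
    response = service.users().messages().list(
        userId="me", q=query, maxResults=max_emails
    ).execute()
    return [message["id"] for message in response.get("messages", [])]
```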
The LLM evaluates alerts based on relevance to:
- Machine learning in mineral exploration
- Geoscience data analysis with ML/AI
- Remote sensing and geophysical data processing
- Predictive modeling for mineral deposits
- Geological mapping with ML
- Exploration targeting using data science
- Mining industry AI applications
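To illustrate the Analyze and Categorize steps with the OpenAI provider, a sketch like the one below embeds the criteria above in the prompt and asks the model for a structured JSON verdict. The prompt wording and response fields are assumptions; this is not the exact `llm_categorizer.py` implementation.

```python
import json
from openai import OpenAI  # reads OPENAI_API_KEY from the environment

RELEVANCE_CRITERIA = (
    "machine learning in mineral exploration; geoscience data analysis with ML/AI; "
    "remote sensing and geophysical data processing; predictive modeling for mineral "
    "deposits; geological mapping with ML; exploration targeting using data science; "
    "mining industry AI applications"
)

def categorize_alert(title, snippet, model="gpt-4o-mini"):
    client = OpenAI()
    prompt = (
        f"Decide whether this alert is relevant to any of: {RELEVANCE_CRITERIA}.\n\n"
        f"Title: {title}\nContent: {snippet}\n\n"
        'Respond as JSON with keys "relevant", "category", "confidence", '
        '"summary", "keywords", and "reasoning".'
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # force parseable JSON output
    )
    return json.loads(response.choices[0].message.content)
```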
- **Gmail authentication issues**: Ensure `credentials.json` is in the project root; delete `token.pickle` and re-authenticate if problems persist
- **LLM API errors**: Verify the API keys in your `.env` file and check that the key has sufficient quota/credits
- **No alerts found**:
  - For Google Alerts: verify you have Google Alerts set up in Gmail
  - For Google Scholar Alerts: verify you have Google Scholar Alerts set up
  - Check the `DAYS_BACK` parameter and increase it if needed
  - Confirm Google Alerts come from `googlealerts-noreply@google.com` and Scholar Alerts from `scholaralerts-noreply@google.com`
- **Scholar article URLs**: This issue has been fixed; the tool now extracts the actual article URLs from Scholar redirect links, which use the `scholar.google.com/scholar_url?url=<actual_url>` format. URL extraction handles both regular Google Alerts and Scholar-specific formats (see the sketch below).
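A sketch of how those redirect links can be unwrapped (illustrative only; the actual extraction code may differ):

```python
from urllib.parse import parse_qs, urlparse

def resolve_article_url(link):
    """Return the real article URL behind a Scholar redirect, or the link unchanged."""
    parsed = urlparse(link)
    if parsed.netloc.endswith("scholar.google.com") and parsed.path == "/scholar_url":
        target = parse_qs(parsed.query).get("url")
        if target:
            return target[0]  # the actual article URL
    return link  # other links pass through unchanged
```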
- OpenAI GPT-4o-mini: Very cost-effective (~$0.15 per 1M input tokens)
- Google Gemini Flash: Also cost-effective with generous free tier
- Each alert typically uses 500-1000 tokens
For 10 alerts/day:
- Daily cost: < $0.01 with GPT-4o-mini or Gemini Flash
- Monthly cost: < $0.30
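A quick back-of-envelope check of those figures, counting input tokens only at the GPT-4o-mini price quoted above:

```python
alerts_per_day = 10
tokens_per_alert = 1_000          # upper end of the 500-1000 token range
price_per_million_tokens = 0.15   # USD, GPT-4o-mini input pricing

daily_cost = alerts_per_day * tokens_per_alert * price_per_million_tokens / 1_000_000
print(f"~${daily_cost:.4f}/day, ~${daily_cost * 30:.3f}/month")  # ~$0.0015/day, ~$0.045/month
```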
Contributions welcome! Areas for improvement:
- Better email parsing for different alert formats
- Additional categorization criteria
- Integration with GitHub Issues/Projects
- Web UI for results viewing
- Scheduled automated runs
MIT License - feel free to use and modify for your needs.
- mineral-exploration-machine-learning - The repository this tool helps curate content for