MyCareersFuture job crawler and matcher for Singapore - local personal use.
- Job Scraping: Incremental crawling of MyCareersFuture job listings
- Resume Matching: Match your resume against scraped jobs using semantic similarity
- Interaction Tracking: Track which jobs you've viewed, applied to, or dismissed
- Local Database: DuckDB for local storage (no cloud required)
- Web Dashboard: Simple localhost UI for viewing matches and managing interactions
# Install Python dependencies
uv sync
# Install frontend dependencies
cd frontend
npm install
cd ..
Create a resume/ folder and place your resume file there:
mkdir resume
# Place your resume as: resume/resume.pdf (or .docx, .txt, .md)
Supported formats: .pdf, .docx, .txt, .md
# Process resume and create profile
uv run mcf process-resume
This will:
- Extract text from your resume
- Create a profile
- Generate an embedding for matching
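This README does not say which embedding model or similarity measure the matcher uses. As a hedged illustration of what comparing a resume embedding with a job embedding usually involves, the sketch below computes cosine similarity over plain Python lists. The function name and the toy vectors are hypothetical and are not part of mcf.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as "no similarity"
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings"; real embeddings have many more dimensions.
resume_vec = [1.0, 0.0, 1.0]
job_close = [2.0, 0.0, 2.0]  # points in the same direction as the resume
job_far = [0.0, 1.0, 0.0]    # orthogonal to the resume

close_score = cosine_similarity(resume_vec, job_close)
far_score = cosine_similarity(resume_vec, job_far)
```

Cosine similarity ignores vector length, so a long and a short text pointing in the same semantic direction still score close to 1.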
# Crawl new jobs (run this daily)
uv run mcf crawl-incremental
This will:
- Fetch new jobs from MyCareersFuture
- Generate embeddings for job descriptions
- Store basic info + URLs (descriptions not stored to save space)
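One way to picture the "incremental" part of the crawl: compare the UUIDs of freshly fetched listings with the UUIDs already stored, and keep only the unseen ones. The sketch below is an assumption about the general technique, not the actual mcf pipeline; `select_new_jobs` and the `uuid`/`title` keys are hypothetical names.

```python
from typing import Iterable


def select_new_jobs(fetched: Iterable[dict], known_uuids: set) -> list:
    """Return listings whose UUID is not stored yet, skipping repeats within the batch."""
    seen = set(known_uuids)  # copy, so the caller's set is not mutated
    new_jobs = []
    for job in fetched:
        uuid = job["uuid"]
        if uuid in seen:
            continue
        seen.add(uuid)
        new_jobs.append(job)
    return new_jobs


# Hypothetical data: one job is already in the database, one appears twice.
already_stored = {"a1"}
fetched_today = [
    {"uuid": "a1", "title": "Data Engineer"},
    {"uuid": "b2", "title": "Backend Developer"},
    {"uuid": "b2", "title": "Backend Developer"},
    {"uuid": "c3", "title": "ML Engineer"},
]

new_jobs = select_new_jobs(fetched_today, already_stored)
```

Only the jobs returned here would need new embeddings, which is what keeps a daily run cheap.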
Via CLI:
# Find matching jobs
uv run mcf match-jobs
Via Web Dashboard:
# Start API server (terminal 1)
uv run uvicorn mcf.api.server:app --reload --port 8000
# Start frontend (terminal 2)
cd frontend
npm run dev
Open http://localhost:3000 and click "Find Matches".
Process resume:
mcf process-resume
# Or specify custom path:
mcf process-resume --resume path/to/resume.pdf
Crawl jobs:
# Default: uses data/mcf.duckdb
mcf crawl-incremental
# Custom database path:
mcf crawl-incremental --db path/to/database.duckdb
# Limit for testing:
mcf crawl-incremental --limit 100
Find job matches:
# Find top 25 matches (excludes interacted jobs)
mcf match-jobs
# Include interacted jobs:
mcf match-jobs --include-interacted
# Get more matches:
mcf match-jobs --top-k 50
Mark job interaction:
mcf mark-interaction <job-uuid> --type viewed
mcf mark-interaction <job-uuid> --type applied
mcf mark-interaction <job-uuid> --type dismissed
mcf mark-interaction <job-uuid> --type saved
Full crawl to parquet (for one-time exports):
mcf crawl --output data/jobs
- GET /api/profile - Get profile and resume status
- POST /api/profile/process-resume - Process resume from file
- GET /api/matches - Get job matches for your resume
- GET /api/jobs - List jobs (excludes interacted by default)
- GET /api/jobs/{job_uuid} - Get job basic info
- POST /api/jobs/{job_uuid}/interact - Mark job as interacted
- GET /api/health - Health check
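The GET endpoints can be called from a short script using only the Python standard library. This is a sketch under assumptions: the base URL uses port 8000 from the uvicorn command in this README, responses are assumed to be JSON, and the helper names `build_get` and `fetch_json` are hypothetical. Request bodies and query parameters are not documented here, so the POST endpoints are left out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: the port used in the uvicorn command


def build_get(path: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for an API path such as /api/health."""
    return urllib.request.Request(BASE_URL + path, method="GET")


def fetch_json(path: str, timeout: float = 5.0):
    """Send the GET request and decode the body, assuming the server returns JSON."""
    with urllib.request.urlopen(build_get(path), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


health_request = build_get("/api/health")

# Usage, with the API server running locally:
#   print(fetch_json("/api/health"))
#   print(fetch_json("/api/matches"))
```

Separating request construction from sending makes it possible to inspect the URL and method without a running server.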
- Morning: Run mcf crawl-incremental to fetch new jobs
- Afternoon: Open dashboard at http://localhost:3000
- Click "Find Matches": See new jobs matching your resume
- Interact with jobs: Click "Viewed", "Applied", "Dismissed", or "Save"
- Next day: Only new/unviewed jobs will appear (interacted jobs are filtered out)
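The "only new/unviewed jobs appear" behavior amounts to filtering candidates against the set of jobs you have already interacted with. The sketch below illustrates that idea with plain Python data; `visible_jobs` and the dictionary keys are hypothetical, and the `include_interacted` parameter only mirrors the spirit of the `--include-interacted` CLI flag.

```python
def visible_jobs(jobs, interacted_uuids, include_interacted=False):
    """Drop jobs the user has already interacted with, unless asked to keep them."""
    if include_interacted:
        return list(jobs)
    return [job for job in jobs if job["uuid"] not in interacted_uuids]


# Hypothetical data: job b2 was marked "applied" yesterday.
jobs = [
    {"uuid": "a1", "title": "Data Engineer"},
    {"uuid": "b2", "title": "Backend Developer"},
    {"uuid": "c3", "title": "ML Engineer"},
]
interacted = {"b2"}

todays_list = visible_jobs(jobs, interacted)
full_list = visible_jobs(jobs, interacted, include_interacted=True)
```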
- Backend: FastAPI (Python)
- Frontend: Next.js 14 (React, TypeScript)
- Database: DuckDB (local file-based, no server needed)
- Storage: Only stores embeddings + basic info + URLs (no full descriptions)
Default paths (can be overridden via environment variables):
- Database: data/mcf.duckdb
- Resume: resume/resume.pdf
- User ID: default_user
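A minimal sketch of how defaults like these might be resolved against environment variables. The variable names DB_PATH, RESUME_PATH, DEFAULT_USER_ID, and API_PORT are the ones this README lists; the `Settings` class and `load_settings` function are hypothetical, and the fallback of 8000 for API_PORT is an assumption based on the port used elsewhere in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    db_path: str
    resume_path: str
    default_user_id: str
    api_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read each setting from the environment, falling back to the documented default."""
    return Settings(
        db_path=env.get("DB_PATH", "data/mcf.duckdb"),
        resume_path=env.get("RESUME_PATH", "resume/resume.pdf"),
        default_user_id=env.get("DEFAULT_USER_ID", "default_user"),
        api_port=int(env.get("API_PORT", "8000")),  # assumed default
    )


defaults = load_settings({})
overridden = load_settings({"DB_PATH": "tmp/test.duckdb", "API_PORT": "9000"})
```

Passing the environment in as a mapping keeps the function easy to exercise with test values instead of the real process environment.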
Set via environment variables:
export DB_PATH=data/mcf.duckdb
export RESUME_PATH=resume/resume.pdf
export DEFAULT_USER_ID=default_user
export API_PORT=8000
To add a new production dependency:
uv add requests
To add a new development dependency:
uv add --dev ipdb
After adding dependencies, always re-generate requirements.txt:
uv pip compile pyproject.toml -o requirements.txt
mcf-main/
├── resume/              # Place your resume here (gitignored)
├── data/                # Database files (gitignored)
├── src/mcf/
│   ├── api/             # FastAPI server
│   ├── cli/             # CLI commands
│   ├── lib/
│   │   ├── crawler/     # Job crawler
│   │   ├── storage/     # DuckDB storage
│   │   ├── embeddings/  # Embedding generation
│   │   └── pipeline/    # Crawl pipeline
└── frontend/            # Next.js dashboard
- Job descriptions are not stored in the database to save space
- Only embeddings, basic info (title, company, location), and URLs are stored
- Click job URLs to see full descriptions on MyCareersFuture
- Jobs you've interacted with won't appear in future matches (unless you include them)
- Matches are sorted by similarity score, then by recency (newest first)
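The ordering rule in the last note (similarity score first, newest first on ties) can be expressed as a single sort on a tuple key. This is an illustrative sketch, not the mcf implementation: `rank_matches`, the `score` and `posted` keys, and the sample data are hypothetical, while the default of 25 follows the "top 25 matches" mentioned in the CLI usage.

```python
from datetime import date


def rank_matches(matches, top_k=25):
    """Sort by score (highest first), then by posting date (newest first)."""
    ordered = sorted(matches, key=lambda m: (m["score"], m["posted"]), reverse=True)
    return ordered[:top_k]


# Hypothetical matches: a1 and c3 tie on score, so the newer c3 ranks higher.
matches = [
    {"uuid": "a1", "score": 0.82, "posted": date(2024, 5, 1)},
    {"uuid": "b2", "score": 0.91, "posted": date(2024, 4, 20)},
    {"uuid": "c3", "score": 0.82, "posted": date(2024, 5, 3)},
]

ranked = rank_matches(matches)
best_only = rank_matches(matches, top_k=1)
```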
MIT