MyCareersFuture job crawler for Singapore.
Crawl all jobs to parquet:

```shell
mcf crawl
```

Options:

- `-o, --output` — Output directory (default: `data/jobs`)
- `-r, --rate-limit` — Requests per second (default: 4.0)
- `-l, --limit` — Max jobs to fetch (for testing)
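The `--rate-limit` option caps outgoing requests per second. The crawler's actual throttling code is not shown here; the sketch below is just one way such a cap can be enforced with a sleep-based limiter (the `RateLimiter` class is illustrative, not part of mcf):

```python
import time


class RateLimiter:
    """Minimal requests-per-second limiter: successive acquire() calls
    are spaced at least 1/rate seconds apart by sleeping as needed."""

    def __init__(self, rate: float):
        self.min_interval = 1.0 / rate
        self._last = 0.0

    def acquire(self) -> None:
        now = time.monotonic()
        wait = self._last + self.min_interval - now
        if wait > 0:
            time.sleep(wait)
        self._last = time.monotonic()


limiter = RateLimiter(rate=4.0)  # mirrors the default of 4 requests/second
for _ in range(3):
    limiter.acquire()
    # one HTTP request would be issued here
```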
```python
from mcf.lib.api.client import MCFClient
from mcf.lib.crawler.crawler import Crawler

# Direct API access
with MCFClient() as client:
    results = client.search_jobs(keywords="python", limit=10)
    job = client.get_job_detail(results.results[0].uuid)

# Batch crawl
crawler = Crawler(rate_limit=5.0)
result = crawler.crawl(categories=["Information Technology"], limit=100)
df = result.jobs  # pandas DataFrame
```

To add a new production dependency (e.g., `requests`):
```shell
uv add requests
```

To add a new development dependency (e.g., `ipdb`):

```shell
uv add --dev ipdb
```

After adding dependencies, always re-generate `requirements.txt`:

```shell
uv pip compile pyproject.toml -o requirements.txt
```

To build your project's distributable packages (`.whl`, `.tar.gz`):

```shell
python -m build
```

Or using the virtual environment directly:

```shell
./venv/bin/python -m build
```

To build offline packages for deployment:

```shell
./dev_scripts/build_offline.sh
```

This will create `offline_packages/` with all dependencies and an `install.sh` script.
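Both `mcf crawl` and `Crawler.crawl` ultimately hand you job data as a plain pandas DataFrame, so standard pandas operations apply. A small sketch of filtering crawled jobs; the column names here are illustrative assumptions, not the crawler's actual schema:

```python
import pandas as pd

# Stand-in rows for crawled jobs; real columns may differ.
jobs = pd.DataFrame(
    {
        "title": ["Data Engineer", "Backend Developer", "Recruiter"],
        "category": ["Information Technology", "Information Technology", "Human Resources"],
    }
)

# Keep only IT postings, as in the categories= filter above.
it_jobs = jobs[jobs["category"] == "Information Technology"]
print(len(it_jobs))  # 2
```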