This project manages and automates news data workflows. It collects news article titles and classifies them into categories using embeddings and vector search.
Note: This project is unfinished and under active development, so interfaces and flow names may change.
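To illustrate the idea of classifying titles with embeddings and vector search, here is a minimal, dependency-free sketch. It is not the project's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, the labelled titles and category names are made up for illustration, and the "vector search" is a brute-force nearest-neighbour scan rather than a real vector database.

```python
import math
from collections import Counter

# Hypothetical labelled examples; a real index would hold many more titles.
LABELLED_TITLES = [
    ("Stocks rally as markets close higher", "business"),
    ("Central bank raises interest rates", "business"),
    ("Local team wins championship final", "sports"),
    ("Star striker signs new contract with club", "sports"),
]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# The "index": every labelled title stored alongside its vector.
INDEX = [(embed(title), label) for title, label in LABELLED_TITLES]


def classify(title: str) -> str:
    """Return the label of the most similar indexed title (brute-force 1-NN)."""
    query = embed(title)
    _, label = max(INDEX, key=lambda entry: cosine_similarity(query, entry[0]))
    return label
```

Calling `classify("Team wins final in extra time")` picks the label of the closest indexed title. With a real embedding model and vector store, only `embed` and the search inside `classify` would need to change.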
This repository provides all the code and configuration to:
- Ingest news data from multiple sources.
- Process and clean collected news articles.
- Orchestrate workflows using Prefect.
- Run and monitor data pipelines locally.
Key features:
- Automated news data ingestion and processing.
- Integration with Prefect for workflow management.
- Local and programmatic execution of flows.
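As a rough idea of what the processing and cleaning step could involve, the sketch below normalizes scraped titles and drops duplicates. The function names `clean_title` and `clean_titles` are hypothetical and the specific cleaning rules are assumptions, not the rules used in this repository.

```python
import html
import re
import unicodedata
from typing import Iterable, List


def clean_title(raw: str) -> str:
    """Normalize a single scraped title."""
    text = html.unescape(raw)                   # decode entities such as &amp;
    text = unicodedata.normalize("NFKC", text)  # unify Unicode variants
    text = re.sub(r"<[^>]+>", "", text)         # drop leftover HTML tags
    text = re.sub(r"\s+", " ", text).strip()    # collapse whitespace
    return text


def clean_titles(raw_titles: Iterable[str]) -> List[str]:
    """Clean all titles, skipping empty ones and case-insensitive duplicates."""
    seen = set()
    cleaned = []
    for raw in raw_titles:
        title = clean_title(raw)
        key = title.casefold()
        if title and key not in seen:
            seen.add(key)
            cleaned.append(title)
    return cleaned
```

The first occurrence of each title is kept, so the output preserves the order in which titles were collected.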
Clone the repository and install the package locally:
git clone https://github.com/your-username/news-orchestration.git
cd news-orchestration
pip install .
Run flows using the Prefect CLI:
prefect deployment run 'pipeline/news_scraping'
You can also customize or create new flows by editing the Python modules in the repository.
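A new flow might be structured roughly as follows. This is a sketch under assumptions: the function names, the flow name `example-news-flow`, and the hard-coded titles are all hypothetical and do not come from this repository. Only `prefect.flow` and `prefect.task` are real Prefect APIs, and they are imported lazily so that the plain functions remain usable without Prefect installed.

```python
from typing import List


def fetch_titles() -> List[str]:
    """Placeholder source: a real flow would call a scraper or an API here."""
    return ["  Example headline one ", "Example headline two"]


def strip_titles(titles: List[str]) -> List[str]:
    """Trim surrounding whitespace and drop empty titles."""
    return [t.strip() for t in titles if t.strip()]


def build_flow():
    """Wrap the plain functions above as Prefect tasks inside a flow."""
    # Imported lazily so the functions above work without Prefect installed.
    from prefect import flow, task

    fetch_task = task(fetch_titles)
    strip_task = task(strip_titles)

    @flow(name="example-news-flow")  # hypothetical flow name
    def example_news_flow() -> List[str]:
        return strip_task(fetch_task())

    return example_news_flow
```

With Prefect installed, `build_flow()()` runs the flow locally as an ordinary Python call. Keeping the logic in plain functions also makes each step easy to unit test on its own.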
This project is distributed under the MIT License.