Open
1 of 2 issues completed
Description
WHY: As a user, I want a chatbot that gives relevant answers, which means the data available to the RAG must be quickly accessible and kept up to date by a real data ingestion system.
DoD:
- Identify the data sources for each data type (audio, video, text, web, PDF) in our use case
- We want a pipeline for each type of data: Video/Audio Pipeline, PDF Pipeline, Web Pipeline, Plain Text Pipeline
- Add connectors capable of retrieving data from these sources (with the appropriate access rights, able to fetch all data and to update existing data), one connector per pipeline
- Define triggers according to source type (event-driven, cron, manual)
- Manage data transformation (Video -> Audio -> Transcript, PDF -> OCR -> Text, HTML parsing -> Text, ...)
- Chunking strategies: adapted to each data type
- Quality control
- Metadata (origin and provenance of data)
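
To make the DoD more concrete, here is a minimal Python sketch of how one pipeline per data type could tie together a trigger, a transformation step, chunking, a basic quality check and provenance metadata. Everything in it is an assumption for illustration: the names `SourceType`, `Trigger`, `Pipeline`, `Chunk` and `html_to_text`, the fixed-size word chunking, and the metadata fields are hypothetical and not part of any existing code. Only the Web and Plain Text transformations are shown, because the Video/Audio and PDF steps would depend on external speech-to-text and OCR tools that are not chosen yet.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from html.parser import HTMLParser
from typing import Callable, Dict, List


class SourceType(Enum):
    VIDEO_AUDIO = "video_audio"
    PDF = "pdf"
    WEB = "web"
    PLAIN_TEXT = "plain_text"


class Trigger(Enum):
    EVENT = "event"
    CRON = "cron"
    MANUAL = "manual"


@dataclass
class Chunk:
    text: str
    metadata: Dict[str, str]  # origin and provenance of the chunk


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document and ignores the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """HTML parsing -> Text, with whitespace normalised."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


@dataclass
class Pipeline:
    source_type: SourceType
    trigger: Trigger
    transform: Callable[[str], str]  # raw payload -> plain text
    chunk_size: int  # maximum number of words per chunk (assumed strategy)

    def run(self, raw: str, origin: str) -> List[Chunk]:
        words = self.transform(raw).split()
        # Quality control (simplified): an empty document yields no chunks.
        if not words:
            return []
        ingested_at = datetime.now(timezone.utc).isoformat()
        chunks: List[Chunk] = []
        for index, start in enumerate(range(0, len(words), self.chunk_size)):
            text = " ".join(words[start:start + self.chunk_size])
            metadata = {
                "origin": origin,
                "source_type": self.source_type.value,
                "trigger": self.trigger.value,
                "ingested_at": ingested_at,
                "chunk_index": str(index),
            }
            chunks.append(Chunk(text=text, metadata=metadata))
        return chunks


# One pipeline per data type; Video/Audio and PDF would be registered the
# same way once their transformation steps exist.
PIPELINES: Dict[SourceType, Pipeline] = {
    SourceType.WEB: Pipeline(SourceType.WEB, Trigger.CRON, html_to_text, 200),
    SourceType.PLAIN_TEXT: Pipeline(
        SourceType.PLAIN_TEXT, Trigger.MANUAL, str.strip, 200
    ),
}
```

A connector would fetch the raw payload and call, for example, `PIPELINES[SourceType.WEB].run(html, origin=url)`; the returned chunks carry their provenance so the RAG can trace each answer back to its source. A real implementation would likely replace the fixed-size word chunking with a strategy specific to each data type.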