Main Files:
- ETL process: ETL_main.py
- LLM bot: bot.ipynb
This project is designed to perform an ETL (Extract, Transform, Load) process, loading data from CSV files into a database, scraping user profile data from LinkedIn using RapidAPI, and obtaining broker data from BrokerCheck. Additionally, it includes a Jupyter Notebook bot for querying the database based on the ingested data.
First, clone the repository to your local machine:
git clone https://github.com/your-username/your-repo.git
cd your-repo
Ensure you have Python 3.11 installed.
Create a virtual environment to manage dependencies:
python -m venv venv
Activate the virtual environment:
- On Windows:
  venv\Scripts\activate
- On macOS/Linux:
  source venv/bin/activate
Alternatively, create and activate a conda environment:
conda create -n llm_etl python=3.11
conda activate llm_etl
Install the required Python packages:
pip install -r requirements.txt
In the project folder, create a file named .env using the env.example file:
cp env.example .env
Execute the main.py script to perform the ETL process. This script will:
- Load data from the specified CSV file into the database.
- Scrape user profile data from LinkedIn using RapidAPI.
- Retrieve broker data from BrokerCheck.
Note: Before running the ETL, delete the existing advisors.db file, if present.
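The CSV load step and the note about advisors.db can be pictured with a minimal sketch. It assumes advisors.db is a SQLite file (this README does not name the database engine), and the `load_csv` helper, the `advisors` table, and its `name` and `firm` columns are hypothetical names chosen for illustration, not the project's actual schema.

```python
import csv
import sqlite3
from pathlib import Path


def load_csv(csv_path: str, db_path: str = "advisors.db") -> int:
    """Recreate db_path and load csv_path into a hypothetical `advisors` table."""
    # Start from a clean database, mirroring the note about deleting advisors.db.
    Path(db_path).unlink(missing_ok=True)

    # Read every row up front; the column names are assumptions for this sketch.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = [(row["name"], row["firm"]) for row in csv.DictReader(handle)]

    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE advisors (name TEXT, firm TEXT)")
        conn.executemany("INSERT INTO advisors (name, firm) VALUES (?, ?)", rows)
        conn.commit()
    finally:
        conn.close()
    return len(rows)
```

Removing the file inside the loader means a rerun cannot fail on an already existing table, which may be why the manual deletion is requested.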
Run the script as follows:
python main.py
Start the Jupyter Notebook server:
jupyter notebook
Open the bot.ipynb file in the Jupyter interface. This notebook connects to the database and lets users query the ingested data.
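For a quick check of the loaded data outside the bot, a plain SQL query from a notebook cell may be enough. The sketch below is not how bot.ipynb works internally; it assumes advisors.db is a SQLite file, and the `run_query` helper is a hypothetical name introduced here.

```python
import sqlite3


def run_query(db_path: str, sql: str, params: tuple = ()) -> list:
    """Run a parameterized SQL statement and return all result rows."""
    conn = sqlite3.connect(db_path)
    try:
        # Parameters are bound by sqlite3, which avoids string-built SQL.
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()
```

A call such as `run_query("advisors.db", "SELECT name FROM sqlite_master WHERE type = ?", ("table",))` would list the tables the ETL created, which is a reasonable first step before writing more specific queries.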
