A lightweight machine learning pipeline that detects duplicate questions using Bag-of-Words features and text preprocessing.
### 1. Clone the repository
```bash
git clone https://github.com/aldol07/DuoDetect.git
cd DuoDetect
```
### 2. Install dependencies with uv
```bash
uv venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
uv pip install -r requirements.txt
```
## 🧪 Usage
Launch the Streamlit app from the project root, with the virtual environment active:
```bash
streamlit run main.py
```
## 🔍 Features
- Text cleaning and normalization before feature extraction
- Bag-of-Words feature extraction
- Model training, with the trained model saved to `model.pkl`
- Fitted Bag-of-Words vectorizer saved to `cv.pkl`
- Jupyter notebooks for experimentation
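The preprocessing and Bag-of-Words steps listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the project's code: `preprocess`, `BagOfWords` and `pair_features` are hypothetical names, the cleaning rules are assumptions, and representing a question pair by concatenating both questions' count vectors is one common choice rather than a confirmed detail of DuoDetect. In practice a library vectorizer, such as scikit-learn's `CountVectorizer`, would usually fill the `BagOfWords` role.

```python
import pickle
import re
from collections import Counter


def preprocess(text):
    """Hypothetical cleaning step: lowercase, spell out a few symbols, drop punctuation."""
    text = text.lower().strip()
    for symbol, word in (("%", " percent "), ("$", " dollar "), ("@", " at ")):
        text = text.replace(symbol, word)
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # remaining punctuation -> space
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


class BagOfWords:
    """Minimal Bag-of-Words vectorizer (hypothetical stand-in for a library one)."""

    def __init__(self, max_features=3000):
        self.max_features = max_features
        self.vocabulary = {}  # token -> column index

    def fit(self, documents):
        counts = Counter(tok for doc in documents for tok in preprocess(doc).split())
        # Keep the most frequent tokens (ties broken alphabetically for determinism).
        top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[: self.max_features]
        # Assign column indices in alphabetical order of the kept tokens.
        self.vocabulary = {tok: i for i, tok in enumerate(sorted(t for t, _ in top))}
        return self

    def transform(self, document):
        vector = [0] * len(self.vocabulary)
        for tok in preprocess(document).split():
            idx = self.vocabulary.get(tok)
            if idx is not None:  # tokens outside the vocabulary are ignored
                vector[idx] += 1
        return vector


def pair_features(bow, q1, q2):
    # Assumed pair representation: q1's counts followed by q2's counts.
    return bow.transform(q1) + bow.transform(q2)


questions = [
    "What is the best way to learn Python?",
    "How do I learn Python quickly?",
    "What is 50% of $200?",
]
bow = BagOfWords(max_features=50).fit(questions)
features = pair_features(bow, questions[0], questions[1])

# Persist the fitted vocabulary (plain data) and restore it into a fresh vectorizer.
# A library vectorizer object would typically be pickled whole instead.
blob = pickle.dumps(bow.vocabulary)
restored = BagOfWords(max_features=50)
restored.vocabulary = pickle.loads(blob)
print(len(features))  # → 34
```

Persisting the fitted vocabulary matters because new question pairs must be encoded with exactly the same columns the model was trained on; this is the reason to save the vectorizer alongside the model. In a real pipeline the vectorizer would be fit on the training questions only.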
Live demo: [duodetectbyaldol.streamlit.app](https://duodetectbyaldol.streamlit.app/)