Complete CSV-to-insights workflow for eDNA:
- FastAPI backend for CSV upload, parsing, prediction, and JSON/CSV exports
- React (Vite) frontend for interactive tables, charts, and reports
- Windows-first setup and scripts
Note: There are two copies of the stack. Prefer the root apps unless you intentionally work in BioTrace/.
- Primary: edna-backend and edna-frontend at repo root
- Secondary (legacy/experimental): BioTrace/edna-backend and BioTrace/edna-frontend

Backend:

    cd d:\EDNA\edna-backend
    python -m venv .venv
    .\.venv\Scripts\Activate.ps1
    pip install -r requirements.txt
    uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Frontend:

    cd d:\EDNA\edna-frontend
    npm install
    echo VITE_API_BASE_URL=http://localhost:8000 > .env
    npm run dev

Upload a CSV in the UI (modal) or test via curl (the `^` line continuations are cmd.exe syntax):

    curl -X POST "http://localhost:8000/upload-csv/" ^
      -H "accept: application/json" ^
      -H "Content-Type: multipart/form-data" ^
      -F "file=@D:\path\to\your\data.csv"
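
For scripted uploads, the same request can be sketched in Python using only the standard library. This is a minimal illustration, not the project's client: `build_multipart` and `upload_csv` are hypothetical helper names, and the only details taken from this README are the `/upload-csv/` path, the `file` form key, and the default port 8000.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, payload: bytes,
                    content_type: str = "text/csv") -> tuple[bytes, str]:
    """Encode one file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload_csv(csv_path: str, base_url: str = "http://localhost:8000") -> bytes:
    """POST the CSV under the form key `file` and return the raw response body."""
    path = Path(csv_path)
    body, header = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/upload-csv/",
        data=body,
        headers={"Content-Type": header, "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the backend running, `upload_csv(r"D:\path\to\your\data.csv")` would send the file. The Content-Type header carries the boundary that the body uses, and a multipart parser needs both to match.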

Project structure:

    EDNA
    ├─ .venv
    │  ├─ __pycache__
    │  ├─ Lib
    │  ├─ Scripts
    │  ├─ .gitignore
    │  ├─ predict_csv.py
    │  ├─ pyvenv.cfg
    │  └─ train_model.py
    ├─ .gitattributes
    ├─ .gitignore
    ├─ README.md
    ├─ BioTrace
    │  ├─ .gitignore
    │  ├─ edna-backend
    │  │  ├─ app.py
    │  │  ├─ config.py
    │  │  ├─ requirements.txt
    │  │  ├─ tests_post_upload.py
    │  │  ├─ tests_temp_check_root.py
    │  │  ├─ train_model.py
    │  │  ├─ __pycache__
    │  │  │  ├─ app.cpython-311.pyc
    │  │  │  ├─ predict_csv.cpython-311.pyc
    │  │  │  ├─ predict_csv.cpython-313.pyc
    │  │  │  ├─ routes.cpython-311.pyc
    │  │  │  └─ utils.cpython-311.pyc
    │  │  ├─ app
    │  │  │  ├─ __init__.py
    │  │  │  ├─ main.py
    │  │  │  ├─ routes.py
    │  │  │  ├─ utils.py
    │  │  │  └─ __pycache__
    │  │  │     ├─ __init__.cpython-311.pyc
    │  │  │     ├─ __init__.cpython-313.pyc
    │  │  │     ├─ main.cpython-311.pyc
    │  │  │     ├─ main.cpython-313.pyc
    │  │  │     ├─ routes.cpython-311.pyc
    │  │  │     ├─ routes.cpython-313.pyc
    │  │  │     ├─ utils.cpython-311.pyc
    │  │  │     └─ utils.cpython-313.pyc
    │  │  ├─ data
    │  │  │  ├─ results_export.json
    │  │  │  ├─ sample_input.csv
    │  │  │  ├─ sample_output.csv
    │  │  │  └─ user_input.csv
    │  │  └─ temp
    │  │     └─ predict.csv
    │  └─ edna-frontend
    │     ├─ .gitignore
    │     ├─ components.json
    │     ├─ eslint.config.js
    │     ├─ index.html
    │     ├─ jsconfig.json
    │     ├─ package.json
    │     ├─ README.md
    │     ├─ vite.config.js
    │     ├─ public
    │     │  └─ vite.svg
    │     └─ src
    │        ├─ App.css
    │        ├─ App.jsx
    │        ├─ index.css
    │        ├─ main.jsx
    │        ├─ api
    │        │  └─ api.js
    │        ├─ assets
    │        │  └─ react.svg
    │        ├─ components
    │        │  ├─ Charts.jsx
    │        │  ├─ DashboardPDF.jsx
    │        │  ├─ DataTable.jsx
    │        │  ├─ Navbar.jsx
    │        │  ├─ ResultsSummary.jsx
    │        │  ├─ SummaryCards.jsx
    │        │  ├─ TaxonomyTable.jsx
    │        │  ├─ UploadForm.jsx
    │        │  └─ ui
    │        │     ├─ button.jsx
    │        │     ├─ card.jsx
    │        │     ├─ input.jsx
    │        │     └─ table.jsx
    │        ├─ lib
    │        │  └─ utils.js
    │        └─ pages
    │           ├─ Dashboard.jsx
    │           └─ Home.jsx
    ├─ prediction_results
    │  ├─ abundance_heatmap.html
    │  ├─ network_graph.html
    │  ├─ prediction_results.csv
    │  ├─ rarefaction_curve.html
    │  ├─ results_export.json
    │  ├─ summary_report.txt
    │  └─ taxonomic_barplot.html
    ├─ data
    │  ├─ fasta_parsed.csv
    │  ├─ merged_filtered_sequences.csv
    │  └─ predict.csv
    └─ processed_data
       ├─ edna_lgb_model.txt
       ├─ feature_columns.pkl
       ├─ feature_extractor_params.pkl
       ├─ known_data.pkl
       ├─ label_encoder.pkl
       ├─ scaler.pkl
       ├─ similarity_calculator.pkl
       └─ taxonomy_hierarchy.pkl

## Backend (FastAPI)
### Install

    pip install fastapi "uvicorn[standard]" pydantic python-multipart numpy pandas scikit-learn lightgbm joblib python-dotenv
### Environment
Optional .env overrides:

    CORS_ORIGINS=http://localhost:5173,http://localhost:3000
    TEMP_DIR=d:\EDNA\edna-backend\temp
    MODEL_PATH=d:\EDNA\edna-backend\processed_data\edna_lgb_model.txt
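
One way the backend could consume these overrides is sketched below. The variable names come from the list above; the `Settings` class, the `load_settings` helper, and every default value are assumptions made for illustration, not the project's actual `config.py`. The sketch reads from a mapping such as `os.environ`; since `python-dotenv` is among the dependencies, its `load_dotenv()` could populate the environment from `.env` first.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Runtime configuration (hypothetical container, not the project's own)."""
    cors_origins: list[str]
    temp_dir: Path
    model_path: Path


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the optional overrides, falling back to assumed defaults."""
    origins = env.get("CORS_ORIGINS", "http://localhost:5173")
    return Settings(
        # CORS_ORIGINS is a comma-separated list; drop blanks and stray spaces.
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
        temp_dir=Path(env.get("TEMP_DIR", "temp")),
        model_path=Path(env.get("MODEL_PATH", "processed_data/edna_lgb_model.txt")),
    )
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary, for example `load_settings({"TEMP_DIR": "uploads"})`.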

Ensure the upload directory exists:

    powershell -Command "New-Item -ItemType Directory -Force -Path 'd:\EDNA\edna-backend\temp' | Out-Null"
### Run

    uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Key endpoints:
* GET /health
* POST /upload-csv/
---
## Frontend (React + Vite)
### Requirements
* Node.js 18+, npm 9+
### Install & Run

    cd d:\EDNA\edna-frontend
    npm install
    echo VITE_API_BASE_URL=http://localhost:8000 > .env
    npm run dev

Dev server: [http://localhost:5173](http://localhost:5173)
---
## CSV Format and Data Flow
Minimum CSV input:
* `sequence`: DNA sequence string
* Optional: `sample_id`, metadata
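
A pre-upload check for this format might look like the following sketch. It only enforces what is stated above, namely a `sequence` column with non-empty values; `read_sequences` is a hypothetical helper, and the real backend may validate differently.

```python
import csv
import io


def read_sequences(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text and require a non-empty `sequence` value in every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "sequence" not in reader.fieldnames:
        raise ValueError("CSV must contain a 'sequence' column")
    rows = []
    # The header is line 1, so data rows are counted from line 2.
    for line_no, row in enumerate(reader, start=2):
        sequence = (row.get("sequence") or "").strip()
        if not sequence:
            raise ValueError(f"row {line_no}: empty sequence")
        row["sequence"] = sequence
        rows.append(row)
    return rows
```

Optional columns such as `sample_id` pass through untouched, so the returned rows keep whatever metadata the file carried.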
Backend response (example):
```json
{
  "status": "ok",
  "predictions": [
    {
      "sequence": "...",
      "predicted_species": "Genus species",
      "confidence_score": 0.92
    }
  ],
  "summary": {},
  "timestamp": "..."
}
```
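
A client consuming this response could filter predictions by confidence, as in the sketch below. The field names follow the example response; the `confident_predictions` helper and the 0.9 default threshold are assumptions chosen for illustration.

```python
import json


def confident_predictions(response_text: str,
                          min_confidence: float = 0.9) -> list[tuple[str, float]]:
    """Return (species, confidence) pairs at or above the threshold."""
    payload = json.loads(response_text)
    if payload.get("status") != "ok":
        raise RuntimeError(f"unexpected status: {payload.get('status')!r}")
    return [
        (item["predicted_species"], item["confidence_score"])
        for item in payload.get("predictions", [])
        if item["confidence_score"] >= min_confidence
    ]
```

Checking `status` before reading `predictions` keeps a failed upload from being mistaken for an empty result.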

## Pipeline Overview
- Extracted raw eDNA sequences from the NCBI Entrez API.
- Curated ~5,000 sequences for prototype training.
- Cleaned, standardized, and labeled sequences.
- Trained a LightGBM model with preprocessing. Supports:
  - Top-3 species prediction with confidence scores.
  - Novelty clustering to group similar unknown sequences (clusters may suggest new species).
- Biologist input: uploads a CSV with new sequences.
- Backend (FastAPI):
  - Processes the CSV through `/upload-csv/`.
  - Predicts species, confidence, and clusters.
  - Stores/exports results as CSV/JSON.
- Frontend (React + Vite):
  - Tables with sequence, top-3 predictions, and confidence.
  - Taxonomy/abundance charts.
  - Export reports (CSV, charts, PDFs).
- Verified predictions and novel clusters are appended to the dataset.
- The model is retrained periodically for improved accuracy.
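
The top-3 prediction step mentioned above can be illustrated with a small ranking helper. This is a sketch under assumptions: it takes a mapping from species label to probability for one sequence, which is how a classifier's per-class output could be presented after label decoding, and `top_k_species` is a hypothetical name rather than a function from this repository.

```python
def top_k_species(probabilities: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k most probable species, highest confidence first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Example with made-up labels and scores for a single sequence.
scores = {"Species A": 0.10, "Species B": 0.60, "Species C": 0.25, "Species D": 0.05}
best_three = top_k_species(scores)
```

Here `best_three` lists Species B, then Species C, then Species A, each paired with its score.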

## Usage
- Start backend:

      cd d:\EDNA\edna-backend
      .\.venv\Scripts\Activate.ps1
      uvicorn app.main:app --reload --port 8000

- Start frontend:

      cd d:\EDNA\edna-frontend
      npm run dev

- In the UI: upload a CSV → view predictions, charts, exports

## Troubleshooting
- 422 errors: the request must be `multipart/form-data` with the key `file`.
- CORS issues: add the frontend origin to `CORS_ORIGINS`.
- File not saving: ensure `python-multipart` is installed and `temp/` exists.

## Common Commands
- Create/activate venv:

      python -m venv .venv
      .\.venv\Scripts\Activate.ps1

- Install deps:

      pip install -r requirements.txt

- Run backend:

      uvicorn app.main:app --reload --port 8000

- Run frontend:

      npm run dev

## License
Add a LICENSE file (MIT/Apache-2.0 recommended).

Built for a smooth CSV-to-insights eDNA workflow on Windows.