# EDNA — Environmental DNA Analysis

A complete CSV-to-insights workflow for eDNA:

* FastAPI backend for CSV upload, parsing, prediction, and JSON/CSV exports
* React (Vite) frontend for interactive tables, charts, and reports
* Windows-first setup and scripts

Note: There are two copies of the stack. Prefer the root apps unless you intentionally work in BioTrace/.

* Primary: edna-backend and edna-frontend at repo root
* Secondary (legacy/experimental): BioTrace/edna-backend and BioTrace/edna-frontend

## TL;DR — Quick Start (Windows)

Backend:

cd d:\EDNA\edna-backend
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Frontend:

cd d:\EDNA\edna-frontend
npm install
echo VITE_API_BASE_URL=http://localhost:8000 > .env
npm run dev

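Upload a CSV in the UI (modal), or test with curl. The `^` line continuations below are cmd.exe syntax; in PowerShell, use a backtick instead and call `curl.exe`, since `curl` can resolve to the `Invoke-WebRequest` alias: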

curl -X POST "http://localhost:8000/upload-csv/" ^
  -H "accept: application/json" ^
  -H "Content-Type: multipart/form-data" ^
  -F "file=@D:\path\to\your\data.csv"
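
For scripted uploads, the same request can be sent from Python. The sketch below uses only the standard library and builds the multipart body by hand. The helper names (`build_multipart`, `upload_csv`) are illustrative and not part of this repository, and it assumes the backend is listening on `http://localhost:8000` as in the commands above.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, payload: bytes, boundary: str) -> bytes:
    """Encode a single file as a multipart/form-data body (hypothetical helper)."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: text/csv\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail


def upload_csv(csv_path: str, base_url: str = "http://localhost:8000") -> dict:
    """POST a CSV to /upload-csv/ under the form key `file` and return the parsed JSON."""
    path = Path(csv_path)
    boundary = uuid.uuid4().hex
    body = build_multipart("file", path.name, path.read_bytes(), boundary)
    request = urllib.request.Request(
        f"{base_url}/upload-csv/",
        data=body,
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Accept": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the backend running, `upload_csv(r"D:\path\to\your\data.csv")` should return the same JSON the curl command prints.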

## Repository Structure

EDNA
├─ .venv
│  ├─ __pycache__
│  ├─ Lib
│  ├─ Scripts
│  ├─ .gitignore
│  ├─ predict_csv.py
│  ├─ pyvenv.cfg
│  └─ train_model.py
├─ .gitattributes
├─ .gitignore
├─ README.md
├─ BioTrace
│  ├─ .gitignore
│  ├─ edna-backend
│  │  ├─ app.py
│  │  ├─ config.py
│  │  ├─ requirements.txt
│  │  ├─ tests_post_upload.py
│  │  ├─ tests_temp_check_root.py
│  │  ├─ train_model.py
│  │  ├─ __pycache__
│  │  │  ├─ app.cpython-311.pyc
│  │  │  ├─ predict_csv.cpython-311.pyc
│  │  │  ├─ predict_csv.cpython-313.pyc
│  │  │  ├─ routes.cpython-311.pyc
│  │  │  └─ utils.cpython-311.pyc
│  │  ├─ app
│  │  │  ├─ __init__.py
│  │  │  ├─ main.py
│  │  │  ├─ routes.py
│  │  │  ├─ utils.py
│  │  │  └─ __pycache__
│  │  │     ├─ __init__.cpython-311.pyc
│  │  │     ├─ __init__.cpython-313.pyc
│  │  │     ├─ main.cpython-311.pyc
│  │  │     ├─ main.cpython-313.pyc
│  │  │     ├─ routes.cpython-311.pyc
│  │  │     ├─ routes.cpython-313.pyc
│  │  │     ├─ utils.cpython-311.pyc
│  │  │     └─ utils.cpython-313.pyc
│  │  ├─ data
│  │  │  ├─ results_export.json
│  │  │  ├─ sample_input.csv
│  │  │  ├─ sample_output.csv
│  │  │  └─ user_input.csv
│  │  └─ temp
│  │     └─ predict.csv
│  └─ edna-frontend
│     ├─ .gitignore
│     ├─ components.json
│     ├─ eslint.config.js
│     ├─ index.html
│     ├─ jsconfig.json
│     ├─ package.json
│     ├─ README.md
│     ├─ vite.config.js
│     ├─ public
│     │  └─ vite.svg
│     └─ src
│        ├─ App.css
│        ├─ App.jsx
│        ├─ index.css
│        ├─ main.jsx
│        ├─ api
│        │  └─ api.js
│        ├─ assets
│        │  └─ react.svg
│        ├─ components
│        │  ├─ Charts.jsx
│        │  ├─ DashboardPDF.jsx
│        │  ├─ DataTable.jsx
│        │  ├─ Navbar.jsx
│        │  ├─ ResultsSummary.jsx
│        │  ├─ SummaryCards.jsx
│        │  ├─ TaxonomyTable.jsx
│        │  ├─ UploadForm.jsx
│        │  └─ ui
│        │     ├─ button.jsx
│        │     ├─ card.jsx
│        │     ├─ input.jsx
│        │     └─ table.jsx
│        ├─ lib
│        │  └─ utils.js
│        └─ pages
│           ├─ Dashboard.jsx
│           └─ Home.jsx
├─ prediction_results
│  ├─ abundance_heatmap.html
│  ├─ network_graph.html
│  ├─ prediction_results.csv
│  ├─ rarefaction_curve.html
│  ├─ results_export.json
│  ├─ summary_report.txt
│  └─ taxonomic_barplot.html
├─ data
│  ├─ fasta_parsed.csv
│  ├─ merged_filtered_sequences.csv
│  └─ predict.csv
└─ processed_data
   ├─ edna_lgb_model.txt
   ├─ feature_columns.pkl
   ├─ feature_extractor_params.pkl
   ├─ known_data.pkl
   ├─ label_encoder.pkl
   ├─ scaler.pkl
   ├─ similarity_calculator.pkl
   └─ taxonomy_hierarchy.pkl

---

## Backend (FastAPI)

### Install

pip install fastapi "uvicorn[standard]" pydantic python-multipart numpy pandas scikit-learn lightgbm joblib python-dotenv


### Environment

Optional .env overrides:

CORS_ORIGINS=http://localhost:5173,http://localhost:3000
TEMP_DIR=d:\EDNA\edna-backend\temp
MODEL_PATH=d:\EDNA\edna-backend\processed_data\edna_lgb_model.txt


Ensure upload dir exists:

powershell -Command "New-Item -ItemType Directory -Force -Path 'd:\EDNA\edna-backend\temp' | Out-Null"
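
A minimal sketch of how these overrides and the upload directory might be handled at startup, assuming the backend reads them from process environment variables (for example after python-dotenv has loaded `.env`). The `Settings` class, the `load_settings` function, and the fallback defaults are illustrative; the repository's own `config.py` may do this differently.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    cors_origins: list[str]
    temp_dir: Path
    model_path: Path


def load_settings(env=os.environ) -> Settings:
    """Read the optional overrides, falling back to assumed defaults."""
    raw_origins = env.get("CORS_ORIGINS", "http://localhost:5173")
    # CORS_ORIGINS is a comma-separated list; tolerate stray spaces and empty items.
    origins = [item.strip() for item in raw_origins.split(",") if item.strip()]
    temp_dir = Path(env.get("TEMP_DIR", "temp"))
    model_path = Path(env.get("MODEL_PATH", "processed_data/edna_lgb_model.txt"))
    # Same effect as the New-Item command: create the upload dir if it is missing.
    temp_dir.mkdir(parents=True, exist_ok=True)
    return Settings(origins, temp_dir, model_path)
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise without touching the real environment.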


### Run

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000


Key endpoints:

* GET /health
* POST /upload-csv/

---

## Frontend (React + Vite)

### Requirements

* Node.js 18+, npm 9+

### Install & Run

cd d:\EDNA\edna-frontend
npm install
echo VITE_API_BASE_URL=http://localhost:8000 > .env
npm run dev


Dev server: [http://localhost:5173](http://localhost:5173)

---

## CSV Format and Data Flow

Minimum CSV input:

* `sequence`: DNA sequence string
* Optional: `sample_id`, metadata
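
Checking a file against this format before uploading can save a round trip. The sketch below is one way to do it with the standard library. It assumes sequences use the `ACGTN` alphabet, which this README does not specify, and `validate_csv` is a hypothetical helper rather than a function from the backend.

```python
import csv
import io

# Assumed alphabet: the four bases plus N for an ambiguous call.
VALID_BASES = set("ACGTN")


def validate_csv(text: str) -> list[dict]:
    """Return rows with a normalized `sequence`; raise ValueError on bad input."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or "sequence" not in reader.fieldnames:
        raise ValueError("CSV must contain a 'sequence' column")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        seq = (row["sequence"] or "").strip().upper()
        if not seq:
            raise ValueError(f"line {line_no}: empty sequence")
        if set(seq) - VALID_BASES:
            raise ValueError(f"line {line_no}: unexpected characters in sequence")
        row["sequence"] = seq
        rows.append(row)  # optional columns such as sample_id pass through untouched
    return rows
```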

Backend response (example):

```json
{
  "status": "ok",
  "predictions": [
    {
      "sequence": "...",
      "predicted_species": "Genus species",
      "confidence_score": 0.92
    }
  ],
  "summary": {},
  "timestamp": "..."
}
```
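
A response of this shape can be flattened back into CSV on the client side. This is a sketch that assumes only the fields shown in the example (`status`, `predictions`, and the three per-prediction keys); `predictions_to_csv` and its `min_confidence` parameter are illustrative names, not part of the backend.

```python
import csv
import io
import json


def predictions_to_csv(response_text: str, min_confidence: float = 0.0) -> str:
    """Convert the JSON response into CSV text, keeping rows at or above a confidence floor."""
    payload = json.loads(response_text)
    if payload.get("status") != "ok":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sequence", "predicted_species", "confidence_score"])
    for item in payload.get("predictions", []):
        if item["confidence_score"] >= min_confidence:
            writer.writerow(
                [item["sequence"], item["predicted_species"], item["confidence_score"]]
            )
    return buffer.getvalue()
```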

## Project Workflow

### 1. Data Collection & Preprocessing

- Extracted raw eDNA sequences through the NCBI Entrez API.
- Curated ~5,000 sequences for prototype training.
- Cleaned, standardized, and labeled the sequences.
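
This README does not describe the cleaning rules or the features fed to the model, so the following is only a sketch of what such a step commonly looks like. It assumes that "standardized" means uppercase with whitespace removed, that sequences containing anything other than A, C, G, or T are dropped, and that fixed-length k-mer counts serve as features. `clean_sequence` and `kmer_counts` are hypothetical names.

```python
from collections import Counter
from itertools import product
from typing import Optional


def clean_sequence(raw: str) -> Optional[str]:
    """Strip whitespace and uppercase; return None if any non-ACGT character remains."""
    seq = "".join(raw.split()).upper()
    return seq if seq and set(seq) <= set("ACGT") else None


def kmer_counts(seq: str, k: int = 3) -> dict:
    """Count overlapping k-mers, returning a fixed-size vocabulary of 4**k entries."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocabulary = ("".join(chars) for chars in product("ACGT", repeat=k))
    return {kmer: counts.get(kmer, 0) for kmer in vocabulary}
```

A fixed vocabulary keeps every sequence's feature vector the same length, which tabular models such as LightGBM expect.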

### 2. Model Training

Trained a LightGBM model on the preprocessed sequences. It supports:

- Top-3 species prediction with confidence scores.
- Novelty clustering, which groups similar unknown sequences (a cluster may indicate a new species).
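
Top-3 prediction can be sketched as a ranking over per-class probabilities. The example assumes the model yields one probability per species for each sequence and that class names come from the saved label encoder; the function names and the 0.5 novelty threshold are illustrative choices, not values taken from this project.

```python
def top_k_predictions(probabilities, labels, k=3):
    """Return the k most probable species with their confidence scores."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return [
        {"species": label, "confidence": round(float(prob), 4)}
        for label, prob in ranked[:k]
    ]


def is_novel_candidate(probabilities, threshold=0.5):
    """Flag a sequence as a novelty candidate when even the best class is unconvincing."""
    return max(probabilities) < threshold
```

Sequences flagged this way would be the natural input to the novelty clustering step.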

### 3. Prediction Workflow

- Biologist input: upload a CSV of new sequences.
- Backend (FastAPI):
  - Processes the CSV through /upload-csv/.
  - Predicts species, confidence, and clusters.
  - Stores and exports results as CSV/JSON.
- Frontend (React + Vite):
  - Tables with each sequence, its top-3 predictions, and their confidence.
  - Taxonomy and abundance charts.
  - Exportable reports (CSV, charts, PDFs).
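
The clustering part of the backend step is not specified here. As a stand-in, the sketch below uses greedy single-pass clustering on k-mer Jaccard similarity, which is a simple technique chosen for illustration and not necessarily what the project's similarity calculator does. The function names, `k=4`, and the 0.5 threshold are all assumptions.

```python
def kmer_set(seq: str, k: int = 4) -> set:
    """Set of distinct overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared k-mers divided by all k-mers seen in either sequence."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_unknowns(sequences, threshold=0.5, k=4):
    """Each sequence joins the first cluster whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a new cluster."""
    clusters = []  # pairs of (representative k-mer set, member list)
    for seq in sequences:
        profile = kmer_set(seq, k)
        for representative, members in clusters:
            if jaccard(profile, representative) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((profile, [seq]))
    return [members for _, members in clusters]
```

The result depends on input order, which is acceptable for a quick grouping but worth replacing with a proper clustering method before drawing biological conclusions.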

### 4. Feedback Loop

- Verified predictions and novel clusters are appended to the dataset.
- The model is retrained periodically to improve accuracy.
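
Appending verified results could look like the sketch below. It assumes the training dataset is a CSV with `sequence` and `species` columns, which this README does not confirm, and `append_verified` is a hypothetical helper. Sequences already present are skipped so that repeated runs do not duplicate rows.

```python
import csv
from pathlib import Path


def append_verified(dataset_path, verified_rows):
    """Append verified (sequence, species) rows that are not already stored.

    Returns the number of rows actually added."""
    path = Path(dataset_path)
    existing = set()
    if path.exists():
        with path.open(newline="") as handle:
            existing = {row["sequence"] for row in csv.DictReader(handle)}
    added = 0
    with path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["sequence", "species"])
        if path.stat().st_size == 0:  # brand-new or empty file: write the header first
            writer.writeheader()
        for row in verified_rows:
            if row["sequence"] not in existing:
                writer.writerow({"sequence": row["sequence"], "species": row["species"]})
                existing.add(row["sequence"])
                added += 1
    return added
```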

## Typical Workflow

Start backend:

cd d:\EDNA\edna-backend
.\.venv\Scripts\Activate.ps1
uvicorn app.main:app --reload --port 8000

Start frontend:

cd d:\EDNA\edna-frontend
npm run dev

In the UI:

Upload CSV → view predictions, charts, exports

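## Troubleshooting

* 422 errors: the request must be `multipart/form-data`, with the file sent under the form key `file`.
* CORS issues: add the frontend origin (for example http://localhost:5173) to CORS_ORIGINS.
* File not saving: make sure python-multipart is installed and the temp/ directory exists.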

## Scripts Cheat Sheet

Create/activate venv:

python -m venv .venv
.\.venv\Scripts\Activate.ps1

Install deps:

pip install -r requirements.txt

Run backend:

uvicorn app.main:app --reload --port 8000

Run frontend:

npm run dev

## License

Add a LICENSE file (MIT/Apache-2.0 recommended).

Built for a smooth CSV-to-insights eDNA workflow on Windows.
