Skip to content

Open-source AI stethoscope: COPD detection from lung sounds using Google HeAR. AUROC 0.890 (COPD vs all patients) across 4 countries.

License

Notifications You must be signed in to change notification settings

petecod/openstethoscope

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

OpenStethoscope

Open-source AI-powered lung auscultation on consumer hardware.

Build a wireless digital stethoscope for under $100, record lung sounds, and run them through a COPD detection model — locally on your own machine or via the hosted demo.

⚠️ Research demo only. Not a medical device. Not validated for clinical use. Do not use for clinical decision-making.


What It Does

Hardware: Converts any standard acoustic stethoscope into a wireless digital recording device using a DJI Mic Mini (approx. $50) and a used stethoscope (approx. $50). Assembly takes about 30 minutes. See ASSEMBLY.md.

Software — HEAR-COPD (proof of concept): Uses Google's HeAR foundation model (pretrained on 313 million health-related audio clips) as a frozen feature extractor. Lung sounds are split into 2-second windows, embedded with HeAR, averaged per patient, and scored with a logistic regression classifier. The classifier is trained on standardised (z-scored) 512-dimensional embeddings with L2 regularisation and class-weight balancing to handle dataset imbalance; performance is estimated via 5-fold cross-validation with strict patient-level separation so that no recording from a test patient is ever seen during training.

Pre-trained classifiers are published in this repo.


Performance

Trained on 280 patients across 4 countries (Portugal, Greece, Jordan, Turkey). Patient-level evaluation with StratifiedGroupKFold cross-validation (5 folds).

Task AUROC 95% CI n+ n− n
COPD vs All patients (primary) 0.890 [0.848–0.930] 117 163 280
COPD vs Healthy 0.939 [0.900–0.970] 117 61 178
Asthma vs Healthy 0.877 [0.798–0.944] 34 61 95
Heart Failure vs Healthy 0.858 [0.754–0.945] 19 61 80
Pneumonia vs Healthy 0.806 [0.625–0.949] 11 61 72

Key insight: Google's HeAR model was pretrained on the YouTube Non-Semantic Speech corpus and other general health audio — it has never seen a stethoscope recording and has no knowledge of COPD or any other lung condition. Its ability to differentiate respiratory pathologies is entirely emergent, a byproduct of learning general acoustic health representations from hundreds of millions of YouTube clips at scale. A linear probe on top of these frozen embeddings already shows high discriminative accuracy.


Demo

Live app: opensteth.agentic-medicine.com

Research demo — not a medical device — not validated for clinical use


Run Locally

Prerequisites

Setup

git clone https://github.com/petecod/openstethoscope
cd openstethoscope

pip install -r app/requirements.txt

export HF_TOKEN=hf_your_token_here
python download_hear.py        # downloads HeAR model once (~500 MB)

python app/server.py           # starts on http://localhost:8080

Open http://localhost:8080 in your browser. Microphone access works natively on localhost — no HTTPS required.

Docker

docker build --build-arg HF_TOKEN=$HF_TOKEN -t openstetho .
docker run -p 8080:8080 openstetho

Accessing From Your Phone

Browsers require HTTPS for microphone access on any non-localhost address. Three options:

Option 1 — Tailscale (recommended)

Tailscale is a free VPN that gives your machine a trusted HTTPS address. Install it on both your computer and phone, then run:

tailscale serve --bg http://localhost:8080

This gives you a permanent private HTTPS URL (https://<your-name>.ts.net) accessible only from your Tailscale devices. No port-forwarding, no public exposure.

Option 2 — Cloudflare Quick Tunnel (no account needed)

cloudflared tunnel --url http://localhost:8080

Gives a temporary public HTTPS URL. Note: audio passes through Cloudflare's servers.

Option 3 — ngrok

ngrok http 8080

Similar to Cloudflare tunnel. Free tier available at ngrok.com.


Privacy

When running locally, all audio is processed on your own machine. Nothing leaves your device.

When using the hosted demo at opensteth.agentic-medicine.com: audio is sent to the inference server (Azure, Sweden Central) solely to compute the AI score, then permanently deleted. No database, no logs, no third-party sharing.


Reproduce the Results

The full training pipeline is in pipeline/. You will need to download the original datasets separately (they cannot be redistributed):

Dataset Patients Source
ICBHI 2017 126 bhichallenge.med.auth.gr
Fraiwan/KAUH 112 Mendeley jwyy9np4gv
RDTR 42 Mendeley p9z4h98s6j
# 1. Embed each dataset
python pipeline/02_embeddings/embed_with_hear.py --dataset icbhi  --wav_dir /path/to/icbhi  --out pipeline/02_embeddings/icbhi.npz
python pipeline/02_embeddings/embed_with_hear.py --dataset fraiwan --wav_dir /path/to/fraiwan --out pipeline/02_embeddings/fraiwan.npz
python pipeline/02_embeddings/embed_with_hear.py --dataset rdtr    --wav_dir /path/to/rdtr    --out pipeline/02_embeddings/rdtr.npz

# 2. Train classifiers
python pipeline/03_classifiers/train_classifiers.py

# 3. Results in pipeline/04_results/

Full results: pipeline/04_results/README.md


Hardware Assembly

Full guide with photos: ASSEMBLY.md

Total cost: <$100 USD, ~30 minutes


Citation

@software{westarp2026openstethoscope,
  author  = {Westarp, Peter},
  title   = {OpenStethoscope: Open-Source AI Lung Auscultation on Consumer Hardware},
  year    = {2026},
  url     = {https://github.com/petecod/openstethoscope}
}

Clinical Validation

Before HEAR-COPD can be considered for routine clinical use, rigorous prospective validation in real-world acute care settings is required to confirm safety, calibration, generalizability, and impact on patient outcomes. The results reported here are from retrospective research datasets and do not substitute for prospective clinical evidence.


License

Apache 2.0 — see LICENSE

About

Open-source AI stethoscope: COPD detection from lung sounds using Google HeAR. AUROC 0.890 (COPD vs all patients) across 4 countries.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors