🦖 ClawBio: The first bioinformatics-native AI agent skill library. Local-first. Reproducible. Built on OpenClaw.

🦖 ClawBio

The first bioinformatics-native AI agent skill library.
Built on OpenClaw (180k+ GitHub stars). Local-first. Privacy-focused. Reproducible.

CI | Python 3.10+ | MIT License | ClawHub Skills | Open Issues | Slides


See It in Action

A community contributor built a nutrigenomics skill and ran it end to end, from raw genetic data to a personalised nutrition report with radar charts, heatmaps, and a reproducibility bundle:

https://github.com/ClawBio/ClawBio/releases/download/v0.2.0/david-nutrigx-demo.mp4

What just happened behind the scenes
  1. The AI agent read SKILL.md, a specification that encodes the correct bioinformatics decisions (40 SNPs, 13 nutrient domains, evidence-based risk thresholds)
  2. It ran the Python skill locally; no genetic data left the machine
  3. It produced a markdown report with figures, tables, and a reproducibility bundle (commands.sh, environment.yml, checksums.sha256)
  4. Anyone can re-run the exact same analysis and get identical results, SHA-256 verified
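
Step 4 can be checked without the agent. The following is a minimal sketch, assuming checksums.sha256 uses the common sha256sum layout (one "<hex digest>  <relative path>" pair per line). The README does not specify the file's layout, and verify_bundle is a hypothetical helper name, not part of ClawBio:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(bundle_dir: str, manifest: str = "checksums.sha256") -> list[str]:
    """Return the relative paths whose file is missing or whose digest differs.

    ASSUMPTION: sha256sum-style lines, "<hex digest>  <relative path>".
    """
    root = Path(bundle_dir)
    failures = []
    for line in (root / manifest).read_text().splitlines():
        if not line.strip():
            continue
        expected, _, rel_path = line.partition(" ")
        rel_path = rel_path.strip().lstrip("*")  # "*" marks binary mode in sha256sum
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty return value means every listed file matched its recorded digest. If the bundle does use the sha256sum layout, running `sha256sum -c checksums.sha256` inside the report directory performs the same check.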

ClawBio PharmGx Demo
PharmGx Reporter: 12 genes, 51 drugs, under 1 second


The Problem

You read a paper. You want to reproduce Figure 3. So you:

  1. Go to GitHub. Clone the repo.
  2. Wrong Python version. Fix dependencies.
  3. Need the reference data β€” where is it?
  4. Download 2GB from Zenodo. Link is dead.
  5. Email the first author. Wait 3 weeks.
  6. Paths are hardcoded to /home/jsmith/data/.
  7. Two days later: still broken. You give up.

Now imagine the same paper published a skill:

python ancestry_pca.py --demo --output fig3
# Figure 3 reproduced. Identical. SHA-256 verified. 30 seconds.

That's ClawBio. Every figure in your paper should be one command away from reproduction.


🦖 What Is ClawBio?

A skill is a domain expert's knowledge, frozen into code, that an AI agent executes correctly every time.

ChatGPT / Claude  = a smart generalist who guesses at bioinformatics
🦖 ClawBio skill  = a domain expert's proven pipeline that the AI executes

  • Local-first: Your genomic data never leaves your laptop. No cloud uploads, no data exfiltration.
  • Reproducible: Every analysis exports commands.sh, environment.yml, and SHA-256 checksums. Anyone can reproduce it without the agent.
  • Modular: Each skill is a self-contained directory (SKILL.md + Python scripts) that plugs into the orchestrator.
  • MIT licensed: Open-source, free, community-driven.

Why Not Just Use ChatGPT?

Ask Claude to "profile my pharmacogenes from this 23andMe file." It'll write plausible Python. But:

  • It hallucinates star allele calls and uses outdated CPIC guidelines
  • It forgets CYP2D6 *4 is no-function (not reduced)
  • You spend 45 minutes debugging its output
  • No reproducibility bundle. No audit log. No checksums.

ClawBio encodes the correct bioinformatics decisions so the agent gets it right first time, every time.


Provenance & Reproducibility

Every ClawBio analysis ships with a reproducibility bundle, not as an afterthought but as part of the output:

report/
├── report.md              # Full analysis with figures and tables
├── figures/               # Publication-quality PNGs
├── tables/                # CSV data tables
├── commands.sh            # Exact commands to reproduce
├── environment.yml        # Conda environment snapshot
└── checksums.sha256       # SHA-256 of every input and output file

Why this matters: a reviewer can re-run your analysis in 30 seconds. A collaborator can reproduce your Figure 3 without emailing you. Future-you can regenerate results two years later from the same bundle.
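
For skill authors, a manifest like checksums.sha256 could be produced along these lines. This is a sketch under assumptions, not ClawBio's implementation: write_checksums is a hypothetical helper, the sha256sum line layout is assumed, and only files inside the bundle directory are hashed (input files stored elsewhere would have to be added separately):

```python
import hashlib
from pathlib import Path


def write_checksums(bundle_dir: str, manifest: str = "checksums.sha256") -> Path:
    """Hash every file under bundle_dir and write a sorted manifest.

    ASSUMPTION: sha256sum-style lines, "<hex digest>  <relative path>".
    Files are read whole, which is fine for reports but not for huge inputs.
    """
    root = Path(bundle_dir)
    lines = []
    for path in sorted(root.rglob("*")):
        # Skip directories and the manifest itself so reruns stay stable.
        if path.is_file() and path.name != manifest:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.relative_to(root).as_posix()}")
    out = root / manifest
    out.write_text("\n".join(lines) + "\n")
    return out
```

Sorting the paths keeps the manifest byte-identical across runs on unchanged files, which is what makes the manifest itself comparable between machines.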


🦖 Skills

| Skill | Status | Description |
|---|---|---|
| Bio Orchestrator | MVP | Routes bioinformatics requests to the right specialist skill |
| PharmGx Reporter | MVP | Pharmacogenomic report: 12 genes, 51 drugs, CPIC guidelines |
| Ancestry PCA | MVP | PCA decomposition vs SGDP (345 samples, 164 global populations) |
| Semantic Similarity | MVP | Semantic Isolation Index for 175 GBD diseases from 13.1M PubMed abstracts |
| Equity Scorer | Planned | HEIM diversity metrics from VCF/ancestry data |
| VCF Annotator | Planned | Variant annotation with VEP, ClinVar, gnomAD + ancestry context |
| Lit Synthesizer | Planned | PubMed/bioRxiv search with LLM summarisation and citation graphs |
| scRNA Orchestrator | Planned | Scanpy automation: QC, clustering, DE analysis, visualisation |
| Struct Predictor | Planned | AlphaFold/Boltz local structure prediction |
| Repro Enforcer | Planned | Export any analysis as Conda env + Singularity + Nextflow pipeline |

🦖 MVP Skills in Detail

PharmGx Reporter: Personal Scale

Generates a pharmacogenomic report from consumer genetic data (23andMe, AncestryDNA):

  • Parses raw genetic data (auto-detects format)
  • Extracts 31 pharmacogenomic SNPs across 12 genes (CYP2C19, CYP2D6, CYP2C9, VKORC1, SLCO1B1, DPYD, TPMT, UGT1A1, CYP3A5, CYP2B6, NUDT15, CYP1A2)
  • Calls star alleles and determines metabolizer phenotypes
  • Looks up CPIC drug recommendations for 51 medications
  • Zero dependencies. Runs in < 1 second.

python pharmgx_reporter.py --input demo_patient.txt --output report

Demo result: CYP2D6 *4/*4 (Poor Metabolizer) → 10 drugs AVOID (codeine, tramadol, 7 TCAs, tamoxifen), 20 caution, 21 standard.

~7% of people are CYP2D6 Poor Metabolizers. Because codeine must be converted to morphine by CYP2D6 to work, it gives them little or no pain relief. ~0.5% carry DPYD variants where a standard 5-FU dose can be lethal. This skill catches both.
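
The step from star alleles to a metabolizer phenotype can be illustrated with an activity-score lookup. This is a hedged sketch, not the skill's code: the allele values and thresholds below are assumed to mirror the CPIC activity-score convention for CYP2D6 and must be verified against current CPIC tables before any real use. ACTIVITY and cyp2d6_phenotype are hypothetical names, and gene duplications (needed for ultrarapid calls) are not modelled:

```python
# ASSUMPTION: illustrative CYP2D6 activity values per allele, intended to
# mirror the CPIC activity-score convention; verify against current CPIC tables.
ACTIVITY = {"*1": 1.0, "*2": 1.0, "*4": 0.0, "*5": 0.0, "*10": 0.25}


def cyp2d6_phenotype(diplotype: str) -> str:
    """Map a diplotype such as "*1/*4" to a metabolizer phenotype."""
    alleles = diplotype.split("/")
    if len(alleles) != 2 or any(a not in ACTIVITY for a in alleles):
        return "Indeterminate"  # unknown allele: refuse to guess
    score = sum(ACTIVITY[a] for a in alleles)
    if score == 0:
        return "Poor Metabolizer"
    if score < 1.25:
        return "Intermediate Metabolizer"
    if score <= 2.25:
        return "Normal Metabolizer"
    return "Ultrarapid Metabolizer"
```

Under these assumed values, the demo diplotype *4/*4 sums to an activity score of 0 and is reported as a Poor Metabolizer, matching the demo result above. Returning "Indeterminate" for unlisted alleles is a deliberate choice: a wrong phenotype is worse than no call.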

Ancestry PCA: Population Scale

Runs principal component analysis on your cohort against the SGDP reference panel (345 samples, 164 global populations):

  • Contig normalisation (chr1 vs 1)
  • IBD removal (related individuals filtered)
  • Common biallelic SNPs only
  • Confidence ellipses per population
  • Publication-quality 4-panel figure generated instantly

python ancestry_pca.py --demo --output ancestry_report

Demo result: 736 Peruvian samples across 28 indigenous populations. Amazonian groups (Matzes, Awajun, Candoshi) sit in genetic space that no SGDP population occupies: they are genuinely underrepresented, not just in GWAS but in the reference panels themselves.
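
Two of the preprocessing steps listed for this skill, contig normalisation and restriction to biallelic SNPs, can be sketched as follows. The function names are hypothetical, and the sketch assumes that "common" means sites present in both the cohort and the reference panel, which the README does not state. IBD removal and the PCA itself are not shown:

```python
def normalise_contig(name: str) -> str:
    """Strip a leading "chr" so that "chr1" and "1" compare equal."""
    return name[3:] if name.lower().startswith("chr") else name


def is_biallelic_snp(ref: str, alt: str) -> bool:
    """True for a single-base REF with exactly one single-base ALT."""
    bases = {"A", "C", "G", "T"}
    return ref.upper() in bases and alt.upper() in bases


def shared_sites(cohort, reference):
    """Return sorted (contig, pos, ref, alt) keys present in both call sets.

    Each input is an iterable of (contig, pos, ref, alt) tuples, for example
    parsed from the first columns of a VCF. Indels and multi-allelic ALT
    values such as "A,G" are dropped by is_biallelic_snp.
    """
    def keys(records):
        return {
            (normalise_contig(c), int(p), r.upper(), a.upper())
            for c, p, r, a in records
            if is_biallelic_snp(r, a)
        }
    return sorted(keys(cohort) & keys(reference))
```

Matching on the full (contig, position, REF, ALT) key rather than position alone avoids pairing two different variants that happen to share a coordinate.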

Semantic Similarity Index: Systemic Scale

Computes a Semantic Isolation Index for diseases using 13.1M PubMed abstracts and PubMedBERT embeddings (768-dim):

  • SII (Semantic Isolation Index): higher = more isolated in literature
  • KTP (Knowledge Transfer Potential): higher = more cross-disease spillover
  • RCC (Research Clustering Coefficient): diversity of research approaches
  • Temporal Drift: how research focus evolves over time
  • Publication-quality 4-panel figure

python semantic_sim.py --demo --output sem_report

Key finding: Neglected tropical diseases are 38% more semantically isolated (P < 0.0001, Cohen's d = 0.84). 14 of the 25 most isolated diseases are Global South priority conditions. Knowledge silos kill innovation: a malaria immunology breakthrough could help leishmaniasis, but the literatures don't talk to each other.
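
To make the idea of "semantic isolation" concrete, here is one plausible way to score it from per-disease embedding centroids. The formula (one minus the mean cosine similarity to every other disease) is an assumption chosen for illustration; the README does not give the HEIM definition of SII, and isolation_index is a hypothetical name:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def isolation_index(centroids):
    """Return {disease: 1 - mean cosine similarity to all other diseases}.

    ASSUMPTION: illustrative formula only, not the published SII definition.
    `centroids` maps a disease name to its mean abstract embedding.
    """
    scores = {}
    for name, vec in centroids.items():
        others = [cosine(vec, other) for key, other in centroids.items() if key != name]
        scores[name] = 1.0 - sum(others) / len(others) if others else 0.0
    return scores
```

With real data the vectors would be 768-dimensional PubMedBERT embeddings averaged per disease. Higher scores mean a disease's literature sits further from all the others, in line with "higher = more isolated" above.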

Corpas et al. (2026). HEIM: Health Equity Index for Measuring structural bias in biomedical research. Under review.


Quick Start

Prerequisites

  • OpenClaw installed and configured
  • Python 3.9+
  • Bioinformatics tools for your skill of choice (see individual SKILL.md files)

Install and run

# Install a skill
openclaw install skills/pharmgx-reporter

# Run with natural language
openclaw "Profile the pharmacogenes in my 23andMe file at data/raw_genotype.txt"

# Or run directly
python skills/pharmgx-reporter/pharmgx_reporter.py --input data/raw_genotype.txt --output report

Every skill includes demo data so you can try it immediately without your own files.
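
Because each skill is a plain Python script, the direct command above is easy to wrap for batch use. The sketch below reuses only the script path and the --input/--output flags shown in this section. build_command and run_batch are hypothetical helpers. It assumes the working directory is the repository root and that one output directory per sample is acceptable:

```python
import subprocess
import sys
from pathlib import Path

# Script path and flags are the ones shown in the Quick Start commands.
SCRIPT = "skills/pharmgx-reporter/pharmgx_reporter.py"


def build_command(input_file: str, output_dir: str) -> list[str]:
    """Build the argv list for one direct run of the PharmGx Reporter."""
    return [sys.executable, SCRIPT, "--input", input_file, "--output", output_dir]


def run_batch(input_dir: str, output_root: str) -> dict[str, int]:
    """Run the skill once per *.txt file and return {file name: exit code}."""
    results = {}
    for genotype in sorted(Path(input_dir).glob("*.txt")):
        out_dir = Path(output_root) / genotype.stem  # one report per sample
        completed = subprocess.run(build_command(str(genotype), str(out_dir)))
        results[genotype.name] = completed.returncode
    return results
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.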


🦖 Architecture

User: "Analyse the diversity in my VCF file"
         │
  ┌──────▼───────┐
  │  Bio         │  ← routes by file type + keywords
  │  Orchestrator│
  └──────┬───────┘
         │
  ┌──────▼───────────────────────────────────────────┐
  │                                                  │
  PharmGx    Ancestry    Semantic    Equity    VCF
  Reporter   PCA         Similarity  Scorer    Annotator ...
  │                                                  │
  └──────┬───────────────────────────────────────────┘
         │
  ┌──────▼───────┐
  │  Markdown    │  ← report + figures + checksums
  │  Report      │     + reproducibility bundle
  └──────────────┘

Each skill is standalone: the orchestrator routes to the right one, but every skill also works independently.

See docs/architecture.md for the full design.
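
The routing step ("routes by file type + keywords") can be pictured with a small lookup. This sketch is not the orchestrator's implementation: the keyword lists, extension map, skill identifiers, and the rule that keywords take priority over file type are all hypothetical choices made for illustration:

```python
from pathlib import Path
from typing import Optional

# HYPOTHETICAL routing tables: keywords, extensions, and skill names are
# illustrative and are not taken from the orchestrator's source.
KEYWORD_ROUTES = [
    (("pharmacogen", "drug", "metabolizer"), "pharmgx-reporter"),
    (("ancestry", "pca", "population"), "ancestry-pca"),
    (("semantic", "literature", "pubmed"), "semantic-sim"),
]
EXTENSION_ROUTES = {".vcf": "ancestry-pca", ".txt": "pharmgx-reporter"}


def route(request: str, filename: Optional[str] = None) -> Optional[str]:
    """Pick a skill by keyword first, then fall back to the file extension."""
    text = request.lower()
    for keywords, skill in KEYWORD_ROUTES:
        if any(word in text for word in keywords):
            return skill
    if filename:
        # .suffixes handles compressed inputs such as "cohort.vcf.gz".
        for suffix in Path(filename.lower()).suffixes:
            if suffix in EXTENSION_ROUTES:
                return EXTENSION_ROUTES[suffix]
    return None  # no match: the caller should ask the user to clarify
```

Returning None instead of a default skill keeps an ambiguous request from silently running the wrong analysis.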


Community Wanted Skills 🦖

We want skills from the bioinformatics community. If you work with genomics, proteomics, metabolomics, imaging, or clinical data, wrap your pipeline as a skill.

| Skill | What | Your expertise |
|---|---|---|
| claw-gwas | PLINK/REGENIE automation | Statistical genetics |
| claw-metagenomics | Kraken2/MetaPhlAn wrapper | Microbiome |
| claw-acmg | Clinical variant classification | Clinical genomics |
| claw-pathway | GO/KEGG enrichment | Functional genomics |
| claw-phylogenetics | IQ-TREE/RAxML automation | Evolutionary biology |
| claw-proteomics | MaxQuant/DIA-NN | Proteomics |
| claw-spatial | Visium/MERFISH | Spatial transcriptomics |

See CONTRIBUTING.md for the submission process and templates/SKILL-TEMPLATE.md for the skill template.


Presentation

ClawBio was announced at the London Bioinformatics Meetup on 26 February 2026.


Citation

If you use ClawBio in your research, please cite:

@software{clawbio_2026,
  author = {Corpas, Manuel},
  title = {ClawBio: An Open-Source Library of AI Agent Skills for Reproducible Bioinformatics},
  year = {2026},
  url = {https://github.com/ClawBio/ClawBio}
}

License

MIT: clone it, run it, build a skill, submit a PR. 🦖
