Researcher: Khaireddine Arbouch
Acknowledgments: Special thanks to Andreas Trolle for the idea that inspired the development of this platform.
This system provides a production-grade REST API for predicting the pathogenicity of genetic variants using the Evo2 deep learning model. The API is deployed on Modal's serverless infrastructure, enabling scalable variant analysis with GPU-accelerated inference. The system supports single nucleotide variants (SNVs), deletions, and insertions across multiple genome assemblies.
The core methodology employs zero-shot variant effect prediction by comparing the log-likelihood scores of reference and variant sequences. A delta score threshold, calibrated on BRCA1 training data, classifies variants as likely pathogenic or likely benign.
The system follows a serverless microservices architecture with the following components:
Client Application
|
| HTTP POST (JSON)
|
Modal Serverless Endpoint
|
| FastAPI Handler
|
Evo2Model Class (Modal Container)
|
| Sequence Fetching
|
UCSC Genome Browser API
|
| Sequence Scoring
|
Evo2 7B Model (GPU Inference)
|
| Delta Score Calculation
|
Pathogenicity Prediction Response
Modal Application Layer
- Application: `extended-evo2-snv-pathogenicity`
- Container class: `Evo2Model` with persistent model loading
- GPU configuration: NVIDIA H100 (configurable)
- Container scaling: Maximum 3 parallel instances
- Warm container retention: 120 seconds scaledown window
- Retry policy: 2 automatic retries on failure
Model Infrastructure
- Base model: Evo2 7B (7 billion parameters)
- Architecture: StripedHyena 2 (convolutional multi-hybrid architecture for long genomic contexts)
- Context window: 8,192 base pairs (8K context variant)
- Model source: HuggingFace (`arcinstitute/evo2_7b`)
- Model cache: Persistent Modal volume (`hf_cache`)
- Inference framework: Vortex with Transformer Engine and Flash Attention
API Layer
- Framework: FastAPI
- Authentication: Optional API key via `X-API-Key` header
- Request validation: Pydantic models with enum-based mutation types
- Error handling: HTTP status codes with descriptive error messages
Data Pipeline
- Sequence Retrieval: Fetches 8,192 bp genomic windows from UCSC Genome Browser API
- Variant Construction: Builds variant sequences based on mutation type (SNV, DELETION, INSERTION)
- Sequence Scoring: Computes mean log-likelihood scores for reference and variant sequences
- Delta Calculation: Computes delta score (variant_score - reference_score)
- Classification: Applies threshold-based classification with confidence scoring
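The sequence-retrieval step above can be sketched as follows. This is a minimal sketch, not the deployed code: the helper names (`window_bounds`, `build_sequence_url`, `fetch_window`) are hypothetical, the centering convention (variant at index `window_size // 2` of the window) is an assumption, and it relies on the public UCSC REST endpoint `https://api.genome.ucsc.edu/getData/sequence`, which takes 0-based, end-exclusive `start`/`end` values and returns the bases in a `dna` field.

```python
import json
import urllib.request

UCSC_SEQUENCE_URL = "https://api.genome.ucsc.edu/getData/sequence"
WINDOW_SIZE = 8192


def window_bounds(position: int, window_size: int = WINDOW_SIZE) -> tuple[int, int]:
    """Return 0-based, end-exclusive (start, end) for a window centred on a 1-based position."""
    if position < 1:
        raise ValueError("position must be 1-based and >= 1")
    half = window_size // 2
    # Convert the 1-based position to 0-based, then clamp at the chromosome start.
    start = max(0, position - 1 - half)
    return start, start + window_size


def build_sequence_url(genome: str, chromosome: str, position: int) -> str:
    """Build a UCSC getData/sequence URL for the window around a variant (hypothetical helper)."""
    if not chromosome.startswith("chr"):
        raise ValueError(f"chromosome must use the 'chr' prefix, got {chromosome!r}")
    start, end = window_bounds(position)
    # The UCSC API accepts ';'-separated query parameters.
    return f"{UCSC_SEQUENCE_URL}?genome={genome};chrom={chromosome};start={start};end={end}"


def fetch_window(genome: str, chromosome: str, position: int) -> tuple[str, int]:
    """Fetch the window over the network; returns (uppercase sequence, 0-based window start)."""
    url = build_sequence_url(genome, chromosome, position)
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = json.load(resp)
    start, _ = window_bounds(position)
    return data["dna"].upper(), start
```

Returning the window start alongside the sequence lets later steps convert the genomic position into an index within the window (`position - 1 - start`). Near a chromosome end, UCSC may return fewer than 8,192 bases, so downstream code should not assume the exact length.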
Container Image
- Base image: `nvidia/cuda:12.4.0-devel-ubuntu22.04`
- Python version: 3.12
- System dependencies: CUDA toolkit, cuDNN, build tools (GCC, CMake, Ninja)
- Python dependencies: Evo2 package, Transformer Engine, Flash Attention, FastAPI, Pydantic
Resource Configuration
- GPU: NVIDIA H100 (80GB VRAM)
- Volume mount: `/root/.cache/huggingface` → Modal persistent volume
- Container lifecycle: Model loaded once per container via the `@modal.enter()` decorator
- Memory management: Model stays in GPU memory for the lifetime of the container, so warm requests skip reloading
Scaling Characteristics
- Cold start latency: 10-30 seconds (container initialization) + 30-60 seconds (model loading)
- Warm request latency: 2-5 seconds (sequence fetching + inference)
- Throughput: Limited by GPU memory and sequence scoring time
- Cost optimization: Scaledown window keeps containers warm for 2 minutes
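Adding up the ranges above gives a rough latency budget per request. The helper below is a back-of-the-envelope sketch built only from the figures in this section (they are estimates, not measurements); `request_latency` and `is_warm` are hypothetical names, and `is_warm` assumes a container stays warm if the previous request arrived within the 120-second scaledown window.

```python
# Latency ranges (seconds) quoted in this section; treat them as rough estimates.
CONTAINER_INIT = (10, 30)
MODEL_LOAD = (30, 60)
WARM_REQUEST = (2, 5)


def request_latency(cold: bool) -> tuple[int, int]:
    """Return the (best, worst) latency range in seconds for one request."""
    low, high = WARM_REQUEST
    if cold:
        # A cold request pays for container start-up and model loading first.
        low += CONTAINER_INIT[0] + MODEL_LOAD[0]
        high += CONTAINER_INIT[1] + MODEL_LOAD[1]
    return low, high


def is_warm(seconds_since_last_request: float, scaledown_window: int = 120) -> bool:
    """Assume the container is still running if the last request fell inside the window."""
    return seconds_since_last_request < scaledown_window


print(request_latency(cold=False))  # (2, 5)
print(request_latency(cold=True))   # (42, 95)
```

A cold request is therefore roughly 10 to 20 times slower than a warm one, which is why the scaledown window matters for interactive use.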
The pathogenicity prediction algorithm operates as follows:
- Genomic Context Retrieval
  - Fetches an 8,192 bp window centered on the variant position from the UCSC API
  - Validates the chromosome format (requires the `chr` prefix, e.g., `chr17`)
  - Supports multiple genome assemblies (hg38, hg19, etc.)
- Reference Allele Determination
  - Auto-detects the reference allele from the genome sequence if not provided
  - Validates a provided reference against the genome sequence for SNVs
  - Extracts the reference sequence based on the mutation type
- Variant Sequence Construction
  - SNV: Single nucleotide substitution at the variant position
  - DELETION: Removes the reference nucleotide(s) from the sequence
  - INSERTION: Inserts the alternative sequence after the reference position
- Sequence Scoring
  - Tokenizes sequences using a character-level tokenizer (vocab size: 512)
  - Passes sequences through the Evo2 model forward pass
  - Computes log-likelihoods via log-softmax over the vocabulary
  - Reduces per-position log-likelihoods to a mean score
- Delta Score Calculation
  - `delta_score = variant_mean_loglikelihood - reference_mean_loglikelihood`
- Pathogenicity Classification
  - Threshold: -0.0009178519 (calibrated on BRCA1 data)
  - Loss-of-function standard deviation: 0.0015140239
  - Functional standard deviation: 0.0009016589
  - Classification rule: `delta_score < threshold` → "Likely pathogenic"; `delta_score >= threshold` → "Likely benign"
  - Confidence calculation: distance from the threshold, normalized by the loss-of-function standard deviation for pathogenic calls or the functional standard deviation for benign calls, capped at 1.0
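The scoring, delta, and classification steps can be sketched in plain Python. This is an illustrative sketch, not the Evo2 implementation: it assumes the model's per-position logits are available as lists of floats (512 entries each for the real vocabulary), that the logits at position *i* predict the token at position *i + 1*, and that the character-level tokenizer maps each nucleotide to its byte value (`ord`). The threshold and standard deviations are the values listed above; `tokenize`, `mean_log_likelihood`, and `classify` are hypothetical names.

```python
import math

THRESHOLD = -0.0009178519
LOF_STD = 0.0015140239   # spread of loss-of-function (pathogenic-side) scores
FUNC_STD = 0.0009016589  # spread of functional (benign-side) scores


def tokenize(seq: str) -> list[int]:
    """Character-level tokenization, assumed here to be one byte value per nucleotide."""
    return [ord(ch) for ch in seq]


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mean_log_likelihood(logits: list[list[float]], tokens: list[int]) -> float:
    """Mean log-probability of each next token given the preceding context.

    logits[i] is assumed to be the model's prediction for tokens[i + 1].
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to score next-token predictions")
    scores = [log_softmax(logits[i])[tokens[i + 1]] for i in range(len(tokens) - 1)]
    return sum(scores) / len(scores)


def classify(ref_score: float, var_score: float) -> tuple[float, str, float]:
    """Return (delta_score, prediction, confidence) using the BRCA1-calibrated threshold."""
    delta = var_score - ref_score
    if delta < THRESHOLD:
        prediction, std = "Likely pathogenic", LOF_STD
    else:
        prediction, std = "Likely benign", FUNC_STD
    # Distance from the threshold in units of the class's standard deviation, capped at 1.0.
    confidence = min(1.0, abs(delta - THRESHOLD) / std)
    return delta, prediction, confidence
```

For example, `classify(-1.0, -1.001234)` yields a delta of about -0.001234, the label "Likely pathogenic", and a confidence of about 0.21 under this formula. Because the delta is a difference of means, its magnitude is tiny, which is why the threshold and standard deviations sit around 1e-3.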
Single Nucleotide Variant (SNV)
- Alternative: Single nucleotide (A, C, G, or T)
- Reference: Auto-detected or validated against genome
- Example: `chr17:43119628 A>G`

Deletion
- Alternative: `"-"` or empty string
- Reference: Nucleotide(s) to delete
- Implementation: Removes the reference sequence from the genomic window
- Example: `chr17:43119628 T>-`

Insertion
- Alternative: Sequence of nucleotides to insert
- Reference: Nucleotide before the insertion point
- Implementation: Inserts the alternative sequence after the reference position
- Example: `chr17:43119628 T>ACGT`
The API supports optional API key authentication:
- Development mode: If the `MODAL_API_KEY` environment variable is not set, all requests are allowed
- Production mode: If `MODAL_API_KEY` is set, requests must include an `X-API-Key` header with the matching value
- Secret management: API keys are stored in Modal secrets and injected as environment variables
- Error responses: 401 for missing key, 403 for invalid key
POST https://{workspace}--evo2-snv-pathogenicity-evo2model-analyze-single-variant.modal.run

| Header | Required | Description |
|---|---|---|
| Content-Type | Yes | Must be application/json |
| X-API-Key | Conditional | Required if MODAL_API_KEY environment variable is set |
{
"variant_position": 43119628,
"alternative": "G",
"genome": "hg38",
"chromosome": "chr17",
"mutation_type": "SNV",
"reference": "A"
}

Field Specifications

| Field | Type | Required | Description |
|---|---|---|---|
| variant_position | int | Yes | Genomic position (1-based coordinate system) |
| alternative | str | Yes | Alternative allele (see mutation type specifications) |
| genome | str | Yes | Genome assembly identifier (e.g., "hg38", "hg19") |
| chromosome | str | Yes | Chromosome identifier with chr prefix (e.g., "chr17", "chr1") |
| mutation_type | str | No | Mutation type enum: "SNV", "DELETION", "INSERTION" (default: "SNV") |
| reference | str | No | Reference allele (auto-detected if not provided) |
{
"position": 43119628,
"chromosome": "chr17",
"genome": "hg38",
"reference": "A",
"alternative": "G",
"delta_score": -0.001234,
"prediction": "Likely pathogenic",
"classification_confidence": 0.85,
"mutation_type": "SNV"
}

Response Field Descriptions

| Field | Type | Description |
|---|---|---|
| position | int | Genomic position (1-based) |
| chromosome | str | Chromosome identifier |
| genome | str | Genome assembly identifier |
| reference | str | Reference allele sequence |
| alternative | str | Alternative allele sequence (empty string for deletions) |
| delta_score | float | Log-likelihood difference (variant - reference). Negative values indicate loss of function. |
| prediction | str | Classification: "Likely pathogenic" or "Likely benign" |
| classification_confidence | float | Confidence score in range [0.0, 1.0] |
| mutation_type | str | Type of mutation analyzed |
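On the client side, the response can be loaded into a typed object. The sketch below is a hypothetical helper (`VariantResult`, `parse_result`) that keeps only the documented fields and checks two documented invariants: the prediction is one of the two labels, and the confidence lies in [0.0, 1.0].

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VariantResult:
    position: int
    chromosome: str
    genome: str
    reference: str
    alternative: str
    delta_score: float
    prediction: str
    classification_confidence: float
    mutation_type: str


def parse_result(raw: str) -> VariantResult:
    """Parse a JSON response body; raises KeyError if a documented field is missing."""
    data = json.loads(raw)
    # Keep only the documented fields so extra keys do not break construction.
    result = VariantResult(**{f.name: data[f.name] for f in fields(VariantResult)})
    if result.prediction not in ("Likely pathogenic", "Likely benign"):
        raise ValueError(f"unexpected prediction {result.prediction!r}")
    if not 0.0 <= result.classification_confidence <= 1.0:
        raise ValueError("classification_confidence outside [0.0, 1.0]")
    return result
```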
400 Bad Request
{
"detail": "For SNV, alternative must be a single nucleotide (A, C, G, or T)"
}

401 Unauthorized
{
"detail": "API key required. Please provide X-API-Key header."
}

403 Forbidden
{
"detail": "Invalid API key."
}

500 Internal Server Error
{
"detail": "Failed to fetch genome sequence from UCSC API: 500"
}

- Python 3.12+
- Git
- Modal account (Sign up)
- Basic understanding of Python and REST APIs
- Local machine: Any OS (Windows, macOS, Linux)
- Modal account: Free tier available (includes GPU credits)
- Internet connection: Required for deployment and API calls
Step 1: Install Modal CLI
pip install modal
modal --version

Step 2: Authenticate with Modal
Choose one of the following authentication methods:
Option A: Browser OAuth (Recommended for Development)
modal token new

This opens your browser, prompts you to sign in with GitHub, Google, or email, and then saves the credentials locally.
Option B: API Token (Recommended for CI/CD)
- Navigate to Modal Settings → API Tokens
- Click "Create Token"
- Copy the `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET`
- Set environment variables:
# Linux/macOS
export MODAL_TOKEN_ID="ak-xxxxxxxx"
export MODAL_TOKEN_SECRET="as-xxxxxxxx"
# Windows (PowerShell)
$env:MODAL_TOKEN_ID="ak-xxxxxxxx"
$env:MODAL_TOKEN_SECRET="as-xxxxxxxx"
# Windows (CMD)
set MODAL_TOKEN_ID=ak-xxxxxxxx
set MODAL_TOKEN_SECRET=as-xxxxxxxx

Step 3: Verify Authentication
modal app list

If successful, you'll see a list of your Modal apps (it may be empty at first).
Step 1: Navigate to Backend Directory
cd backend

Step 2: Review Configuration
Open main.py and verify the following configuration:
# App name (change if needed)
app = modal.App("extended-evo2-snv-pathogenicity", image=evo2_image)
# GPU configuration
@app.cls(
gpu="H100", # NVIDIA H100 GPU
volumes={mount_path: volume}, # Model cache volume
max_containers=3, # Max parallel instances
retries=2, # Auto-retry on failure
scaledown_window=120 # Keep warm for 2 minutes
)

Note: H100 GPUs are premium resources. For testing, you can temporarily use gpu="A10G" or gpu="T4" (cheaper but slower).
Step 3: Deploy the API
modal deploy main.py

The first deployment takes 10-15 minutes because it needs to:
- Build the Docker container image
- Install all dependencies (CUDA, PyTorch, Evo2, etc.)
- Download the Evo2 7B model (~14GB)
- Create the HuggingFace cache volume
You'll see output like:
✓ Building image...
✓ Installing dependencies...
✓ Downloading model...
✓ Created app: extended-evo2-snv-pathogenicity
✓ Deployed Evo2Model.analyze_single_variant
↳ https://your-workspace--evo2-snv-pathogenicity-evo2model-analyze-single-variant.modal.run
Save the endpoint URL - you'll need it for API calls.
Step 4: Verify Deployment
# List all deployed apps
modal app list
# View app details
modal app show extended-evo2-snv-pathogenicity
# View logs
modal app logs extended-evo2-snv-pathogenicity --follow

Setting Up API Key Authentication
For production use, secure your endpoint with API key authentication:
Step 1: Create Modal Secret
# Create a secret to store your API key
modal secret create evo2-api-key MODAL_API_KEY=your-secret-api-key-here

Important: Choose a strong, random API key. You can generate one with:
# Linux/macOS
openssl rand -hex 32
# Python
python -c "import secrets; print(secrets.token_urlsafe(32))"

Step 2: Update main.py to Use Secret
Update your Modal class to use the secret:
@app.cls(
gpu="H100",
volumes={mount_path: volume},
max_containers=3,
retries=2,
scaledown_window=120,
secrets=[modal.Secret.from_name("evo2-api-key")] # Add this line
)
class Evo2Model:
    ...

Then redeploy:
modal deploy main.py

Step 3: Test API Key Authentication
# Without API key (should fail in production mode)
curl -X POST "https://your-endpoint-url.modal.run" \
-H "Content-Type: application/json" \
-d '{"variant_position": 43119628, "alternative": "G", "genome": "hg38", "chromosome": "chr17"}'
# With API key (should succeed)
curl -X POST "https://your-endpoint-url.modal.run" \
-H "Content-Type: application/json" \
-H "X-API-Key: your-secret-api-key-here" \
-d '{"variant_position": 43119628, "alternative": "G", "genome": "hg38", "chromosome": "chr17"}'

curl -X POST "https://your-endpoint-url.modal.run" \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"variant_position": 43119628,
"alternative": "G",
"genome": "hg38",
"chromosome": "chr17",
"mutation_type": "SNV"
}'

Expected Response:
{
"position": 43119628,
"chromosome": "chr17",
"genome": "hg38",
"reference": "A",
"alternative": "G",
"delta_score": -0.001234,
"prediction": "Likely pathogenic",
"classification_confidence": 0.85,
"mutation_type": "SNV"
}

curl -X POST "https://your-endpoint-url.modal.run" \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"variant_position": 43119628,
"alternative": "-",
"genome": "hg38",
"chromosome": "chr17",
"mutation_type": "DELETION"
}'

curl -X POST "https://your-endpoint-url.modal.run" \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"variant_position": 43119628,
"alternative": "ACGT",
"genome": "hg38",
"chromosome": "chr17",
"mutation_type": "INSERTION"
}'

import requests
# Your endpoint URL
url = "https://your-endpoint-url.modal.run"
# Your API key
headers = {
"Content-Type": "application/json",
"X-API-Key": "your-api-key-here"
}
# Request payload
payload = {
"variant_position": 43119628,
"alternative": "G",
"genome": "hg38",
"chromosome": "chr17",
"mutation_type": "SNV"
}
# Make request (generous timeout: a cold start can take 40-90 seconds before inference)
response = requests.post(url, json=payload, headers=headers, timeout=180)
response.raise_for_status()
# Parse response
result = response.json()
print(f"Prediction: {result['prediction']}")
print(f"Delta Score: {result['delta_score']:.6f}")
print(f"Confidence: {result['classification_confidence']:.2%}")

# Stream logs in real-time
modal app logs extended-evo2-snv-pathogenicity --follow
# View recent logs
modal app logs extended-evo2-snv-pathogenicity --tail 100
# View logs for specific function
modal function logs extended-evo2-snv-pathogenicity::Evo2Model::analyze_single_variant

# View app status
modal app show extended-evo2-snv-pathogenicity
# View running containers
modal container list
# View volume usage
modal volume list

Visit the Modal Dashboard to:
- View real-time metrics
- Monitor GPU usage
- Track API calls
- View error rates
- Check costs
# Standard redeploy (uses cached image if possible)
modal deploy main.py
# Force rebuild (rebuilds container image)
modal deploy main.py --force-build

Updating dependencies:
- Update requirements.txt
- Redeploy: modal deploy main.py --force-build

Changing configuration:
- Edit main.py (e.g., change GPU type, max containers)
- Redeploy: modal deploy main.py

Solution: Increase the build timeout or use a smaller GPU for testing.
# In main.py, temporarily use a smaller GPU
@app.cls(
gpu="A10G", # Instead of H100
...
)

Solution: The model is too large for the selected GPU.
- Use H100 GPU (recommended)
- Or reduce model size (if using custom model)
Solution: Check HuggingFace access and retry.
# Force rebuild to retry download
modal deploy main.py --force-build

Solution: Verify the chromosome format.
- Correct: "chr17", "chrX"
- Incorrect: "17", "X"
Solution: Verify secret is attached and environment variable is set.
# Check if secret exists
modal secret list
# Verify secret contents (will show masked value)
modal secret show evo2-api-key
# Check if secret is attached to app
modal app show extended-evo2-snv-pathogenicity

Solution: Verify the endpoint URL and deployment status.
# List all deployed apps
modal app list
# Get endpoint URL
modal app show extended-evo2-snv-pathogenicity

Solution: This is normal. The container needs to:
- Start up (~10-30 seconds)
- Load model into GPU memory (~30-60 seconds)
Mitigation: Use scaledown_window to keep containers warm:
@app.cls(
...
scaledown_window=300, # Keep warm for 5 minutes
)

| GPU Type | Cost per Hour | Use Case |
|---|---|---|
| H100 | ~$4-8/hour | Production (fastest) |
| A100 | ~$2-4/hour | Production (good balance) |
| A10G | ~$1-2/hour | Development/testing |
| T4 | ~$0.50/hour | Light testing |
- Use appropriate GPU: Use A10G or T4 for development
- Set scaledown_window: Keep containers warm to avoid cold starts
- Monitor usage: Check Modal dashboard regularly
- Set max_containers: Limit parallel instances to control costs
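Since billing follows container uptime, a rough monthly estimate is GPU-hours multiplied by the hourly rate. The sketch below uses the midpoints of the approximate ranges in the table above (an assumption; actual rates vary) and a simple hypothetical usage model: each burst of requests starts cold, keeps the GPU busy for a while, and then idles for one scaledown window before shutting down. `monthly_cost` and its parameters are illustrative names.

```python
# Midpoints of the approximate hourly ranges in the table above (USD); illustrative only.
HOURLY_RATE = {"H100": 6.0, "A100": 3.0, "A10G": 1.5, "T4": 0.5}


def monthly_cost(gpu: str, bursts_per_day: int, busy_seconds_per_burst: float,
                 scaledown_window: int = 120, cold_start_seconds: float = 90,
                 days: int = 30) -> float:
    """Estimate monthly GPU cost, assuming each burst starts cold and the
    container then idles for one scaledown window before shutting down."""
    seconds_per_burst = cold_start_seconds + busy_seconds_per_burst + scaledown_window
    hours = bursts_per_day * seconds_per_burst * days / 3600
    return round(hours * HOURLY_RATE[gpu], 2)


print(monthly_cost("H100", bursts_per_day=10, busy_seconds_per_burst=60))  # 135.0
print(monthly_cost("T4", bursts_per_day=10, busy_seconds_per_burst=60))    # 11.25
```

In this model the idle scaledown window is a large share of each burst, so raising `scaledown_window` to 300 seconds improves latency for sporadic traffic but noticeably increases cost.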
# Deploy
modal deploy main.py
# View logs
modal app logs extended-evo2-snv-pathogenicity --follow
# List apps
modal app list
# Show app details
modal app show extended-evo2-snv-pathogenicity
# Create secret
modal secret create evo2-api-key MODAL_API_KEY=your-key
# View secrets
modal secret list
# Force rebuild
modal deploy main.py --force-build
# Stop the deployed app (careful!)
modal app stop extended-evo2-snv-pathogenicity

- Modal CLI installed and authenticated
- main.py reviewed and configured
- API deployed successfully
- Endpoint URL saved
- API key secret created (for production)
- Test request successful
- Logs monitored
- Dashboard access verified
- Frontend configured with endpoint URL
Need Help? Check the Troubleshooting section or visit the Modal Community.