Comprehensive security guidelines, threat model, and mitigation strategies for the SciFind backend.
- Security Overview
- Threat Model
- Authentication & Authorization
- API Security
- Data Protection
- Infrastructure Security
- Provider Security
- Monitoring & Incident Response
- Security Best Practices
- Compliance & Auditing
- Security Checklist
The SciFind backend implements a defense-in-depth security strategy with multiple layers of protection:
- Perimeter Security: API gateway with rate limiting and input validation
- Application Security: Secure coding practices and dependency management
- Data Security: Encryption at rest and in transit
- Infrastructure Security: Container security and network isolation
- Operational Security: Monitoring, logging, and incident response
graph TB
subgraph "External Threats"
ATTACKER[Malicious Actors]
BOTS[Automated Bots]
INSIDER[Insider Threats]
end
subgraph "Perimeter Defense"
WAF[Web Application Firewall]
RATE_LIMIT[Rate Limiting]
DDoS_PROTECTION[DDoS Protection]
end
subgraph "Application Layer"
AUTH[Authentication]
AUTHZ[Authorization]
INPUT_VAL[Input Validation]
OUTPUT_ENC[Output Encoding]
end
subgraph "Data Layer"
ENCRYPT_TRANSIT[TLS Encryption]
ENCRYPT_REST[Database Encryption]
ACCESS_CONTROL[Access Controls]
end
subgraph "Infrastructure"
CONTAINER_SEC[Container Security]
NETWORK_SEG[Network Segmentation]
SECRETS_MGMT[Secrets Management]
end
subgraph "Monitoring"
LOGGING[Security Logging]
SIEM[SIEM Integration]
ALERTS[Real-time Alerts]
end
ATTACKER --> WAF
BOTS --> RATE_LIMIT
INSIDER --> AUTH
WAF --> AUTH
RATE_LIMIT --> INPUT_VAL
DDoS_PROTECTION --> AUTHZ
AUTH --> ENCRYPT_TRANSIT
INPUT_VAL --> ACCESS_CONTROL
OUTPUT_ENC --> ENCRYPT_REST
ENCRYPT_TRANSIT --> CONTAINER_SEC
ACCESS_CONTROL --> NETWORK_SEG
ENCRYPT_REST --> SECRETS_MGMT
CONTAINER_SEC --> LOGGING
NETWORK_SEG --> SIEM
SECRETS_MGMT --> ALERTS
style ATTACKER fill:#ff6b6b
style BOTS fill:#ff6b6b
style INSIDER fill:#ff6b6b
style WAF fill:#4ecdc4
style AUTH fill:#45b7d1
style ENCRYPT_TRANSIT fill:#96ceb4
style CONTAINER_SEC fill:#feca57
style LOGGING fill:#ff9ff3
- SQL Injection: Malicious SQL queries through input fields
- Cross-Site Scripting (XSS): Injection of malicious scripts
- Cross-Site Request Forgery (CSRF): Unauthorized actions on behalf of users
- Input Validation Bypass: Malformed data to bypass security controls
Mitigations:
- Parameterized queries with GORM ORM
- Input validation and sanitization
- Output encoding
- CSRF tokens for state-changing operations
- Broken Authentication: Weak or compromised authentication mechanisms
- Broken Authorization: Inadequate access controls
- Sensitive Data Exposure: Leaking of confidential information
- Rate Limit Bypass: Overwhelming the system with requests
Mitigations:
- Strong API key authentication
- Role-based access control (RBAC)
- Data encryption and masking
- Multi-layer rate limiting
- Container Escape: Breaking out of container isolation
- Privilege Escalation: Gaining unauthorized elevated access
- Network Attacks: Man-in-the-middle, eavesdropping
- Supply Chain Attacks: Compromised dependencies
Mitigations:
- Distroless container images
- Non-root user execution
- TLS encryption for all communications
- Dependency scanning and vulnerability management
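As an illustration of the TLS mitigation, a Go server can refuse anything older than TLS 1.3 through `crypto/tls`. This is a sketch under assumptions: `newTLSConfig` and `newServer` are hypothetical helpers, and the listen address and timeout are placeholders rather than SciFind settings.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// newTLSConfig returns a configuration that rejects TLS 1.2 and older.
// Go's crypto/tls does not let callers choose TLS 1.3 cipher suites;
// the CipherSuites field only applies to TLS 1.0 through 1.2.
func newTLSConfig() *tls.Config {
	return &tls.Config{MinVersion: tls.VersionTLS13}
}

// newServer builds an HTTPS server with the strict TLS configuration.
// The header timeout is an assumed value to limit slow-client attacks.
func newServer(addr string, handler http.Handler) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           handler,
		TLSConfig:         newTLSConfig(),
		ReadHeaderTimeout: 10 * time.Second,
	}
}

func main() {
	srv := newServer(":8443", http.NotFoundHandler())
	fmt.Println(srv.TLSConfig.MinVersion == tls.VersionTLS13)
	// To serve traffic: srv.ListenAndServeTLS(certFile, keyFile)
}
```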
- Database Compromise: Unauthorized access to stored data
- Backup Exposure: Insecure backup storage
- Log Data Leakage: Sensitive information in logs
- Memory Dumps: Extraction of data from memory
Mitigations:
- Database encryption at rest
- Secure backup procedures
- Log sanitization
- Memory protection techniques
| Threat | Likelihood | Impact | Risk Level | Mitigation Priority |
|---|---|---|---|---|
| SQL Injection | Low | High | Medium | High |
| API Key Compromise | Medium | High | High | High |
| DDoS Attack | High | Medium | High | High |
| Container Escape | Low | High | Medium | Medium |
| Data Exfiltration | Medium | High | High | High |
| Insider Threat | Low | High | Medium | Medium |
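The risk levels in the table are consistent with a simple likelihood-times-impact score. The sketch below is one scoring rule that reproduces every row; the numeric weights and thresholds are assumptions chosen to match the table, not a documented SciFind method.

```go
package main

import "fmt"

// Level is a three-point scale shared by likelihood, impact, and risk.
type Level int

const (
	Low Level = iota + 1 // 1
	Medium               // 2
	High                 // 3
)

func (l Level) String() string {
	switch l {
	case Low:
		return "Low"
	case Medium:
		return "Medium"
	case High:
		return "High"
	}
	return "Unknown"
}

// riskLevel multiplies likelihood by impact (each 1 to 3) and buckets
// the product: 6 or more is High, 3 to 5 is Medium, below 3 is Low.
func riskLevel(likelihood, impact Level) Level {
	score := int(likelihood) * int(impact)
	switch {
	case score >= 6:
		return High
	case score >= 3:
		return Medium
	default:
		return Low
	}
}

func main() {
	fmt.Println("SQL Injection:", riskLevel(Low, High))
	fmt.Println("API Key Compromise:", riskLevel(Medium, High))
	fmt.Println("DDoS Attack:", riskLevel(High, Medium))
}
```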
SciFind uses API key-based authentication for simplicity and performance:
security:
  api_keys:
    - "prod-key-1-$(openssl rand -hex 32)"
    - "prod-key-2-$(openssl rand -hex 32)"
  rate_limit:
    enabled: true
    requests: 1000
    window: "1m"
    burst_size: 50

- Key Generation: Use cryptographically secure random generation
- Key Rotation: Rotate keys regularly (recommended: every 90 days)
- Key Storage: Store keys in secure secret management systems
- Key Transmission: Only transmit over HTTPS
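The key-handling practices above could be sketched in Go as follows. This is a minimal illustration, not SciFind's actual implementation: `generateAPIKey`, `hashAPIKey`, and `verifyAPIKey` are hypothetical names, and storing only a SHA-256 digest of each key on the server is an assumption.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// generateAPIKey returns a 64-character hex key backed by 32 random bytes,
// the same strength as `openssl rand -hex 32`.
func generateAPIKey() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// hashAPIKey returns the SHA-256 digest of a key so that the server can
// store digests instead of raw keys (an assumption for this sketch).
func hashAPIKey(key string) [32]byte {
	return sha256.Sum256([]byte(key))
}

// verifyAPIKey compares digests in constant time to avoid timing leaks.
func verifyAPIKey(presented string, stored [32]byte) bool {
	digest := hashAPIKey(presented)
	return subtle.ConstantTimeCompare(digest[:], stored[:]) == 1
}

func main() {
	key, err := generateAPIKey()
	if err != nil {
		panic(err)
	}
	stored := hashAPIKey(key)
	fmt.Println(len(key), verifyAPIKey(key, stored), verifyAPIKey("wrong", stored))
}
```

Hashing before comparison also gives both inputs the same length, which `subtle.ConstantTimeCompare` needs in order to run in constant time.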
# Generate secure API key
API_KEY=$(openssl rand -hex 32)
# Set in environment
export SCIFIND_SECURITY_API_KEYS="$API_KEY"
# Use in requests
curl -H "Authorization: Bearer $API_KEY" https://api.scifind.com/v1/search

SciFind currently implements simple API key-based authorization. Future enhancements are planned:
# Future implementation
roles:
  reader:
    permissions: ["search:read", "papers:read"]
  admin:
    permissions: ["*"]
  researcher:
    permissions: ["search:*", "papers:read", "analytics:read"]

All API inputs are validated using Go's validator package:
type SearchRequest struct {
	Query    string `json:"query" validate:"required,min=1,max=500"`
	Limit    int    `json:"limit" validate:"min=1,max=100"`
	Offset   int    `json:"offset" validate:"min=0"`
	Provider string `json:"provider" validate:"oneof=arxiv semantic_scholar exa tavily"`
}

Multi-layered rate limiting protects against abuse:
security:
  rate_limit:
    enabled: true
    requests: 1000   # requests per window
    window: "1m"     # time window
    burst_size: 50   # burst allowance

// Search endpoint: 100 req/min
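// Health endpoint: No limit
// Admin endpoints: 10 req/min

// Rate limiting middleware
func RateLimitMiddleware(cfg RateLimitConfig) gin.HandlerFunc {
	// Refill one token every Window/Requests so that cfg.Requests are allowed
	// per cfg.Window. Requests (> 0) and Window (a time.Duration) are assumed
	// to be RateLimitConfig fields mirroring the YAML keys.
	interval := cfg.Window / time.Duration(cfg.Requests)
	// One limiter is shared by all clients; per-key limits need one limiter per API key.
	limiter := rate.NewLimiter(rate.Every(interval), cfg.BurstSize)
	return func(c *gin.Context) {
		if !limiter.Allow() {
			c.JSON(429, gin.H{"error": "Rate limit exceeded"})
			c.Abort()
			return
		}
		c.Next()
	}
}

// Security middleware adds protective headers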
func SecurityHeaders() gin.HandlerFunc {
	return func(c *gin.Context) {
		c.Header("X-Content-Type-Options", "nosniff")
		c.Header("X-Frame-Options", "DENY")
		c.Header("X-XSS-Protection", "1; mode=block")
		c.Header("Strict-Transport-Security", "max-age=31536000; includeSubDomains")
		c.Header("Content-Security-Policy", "default-src 'self'")
		c.Next()
	}
}

security:
  cors:
    enabled: true
    allowed_origins: ["https://scifind.com", "https://app.scifind.com"]
    allowed_methods: ["GET", "POST", "PUT", "DELETE", "OPTIONS"]
    allowed_headers: ["Authorization", "Content-Type", "X-Request-ID"]
    max_age: "12h"
    credentials: false

All communications use TLS 1.3:
server:
  tls:
    enabled: true
    cert_file: "/etc/ssl/certs/scifind.crt"
    key_file: "/etc/ssl/private/scifind.key"
    min_version: "1.3"
    cipher_suites:
      - "TLS_AES_128_GCM_SHA256"
      - "TLS_AES_256_GCM_SHA384"
      - "TLS_CHACHA20_POLY1305_SHA256"

# Generate certificate with Let's Encrypt
certbot certonly --standalone -d api.scifind.com
# Auto-renewal
0 0,12 * * * certbot renew --quiet

Database encryption protects stored data:
-- Enable TLS for client connections (protects data in transit, not at rest)
ALTER SYSTEM SET ssl = on;
ALTER SYSTEM SET ssl_cert_file = 'server.crt';
ALTER SYSTEM SET ssl_key_file = 'server.key';
-- Column-level encryption for sensitive data
CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- Encrypt sensitive fields ('encryption_key' is a placeholder; supply the real key from a secret store)
INSERT INTO papers (title, encrypted_content)
VALUES ('Paper Title', pgp_sym_encrypt('sensitive content', 'encryption_key'));

# Encrypted database backup
pg_dump scifind | gpg --cipher-algo AES256 --compress-algo 1 --symmetric --output backup.sql.gpg
# Encrypted backup verification
gpg --decrypt backup.sql.gpg | head -10

| Data Type | Classification | Protection Level | Retention |
|---|---|---|---|
| API Keys | Secret | Encrypt + Hash | 1 year |
| User Queries | Internal | Log sanitization | 90 days |
| Paper Metadata | Public | Standard encryption | Indefinite |
| System Logs | Internal | Encryption at rest | 1 year |
| Health Data | Internal | Standard encryption | 30 days |
- Collect only necessary data
- Anonymize personal information
- Regular data purging
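One way to anonymize personal information in logs is to truncate client IP addresses before storing them. The sketch below zeroes the host portion of an address; `anonymizeIP` is a hypothetical helper, and the /24 (IPv4) and /48 (IPv6) prefix lengths are assumptions rather than SciFind policy.

```go
package main

import (
	"fmt"
	"net/netip"
)

// anonymizeIP zeroes the host portion of an address so that a log entry
// no longer identifies a single client. IPv4 keeps the first 24 bits and
// IPv6 the first 48; both prefix lengths are assumptions.
func anonymizeIP(raw string) (string, error) {
	addr, err := netip.ParseAddr(raw)
	if err != nil {
		return "", fmt.Errorf("parse IP: %w", err)
	}
	addr = addr.Unmap() // treat IPv4-mapped IPv6 addresses as IPv4
	bits := 48
	if addr.Is4() {
		bits = 24
	}
	prefix, err := addr.Prefix(bits)
	if err != nil {
		return "", err
	}
	return prefix.Addr().String(), nil
}

func main() {
	for _, ip := range []string{"203.0.113.42", "2001:db8:abcd:1234::1"} {
		masked, err := anonymizeIP(ip)
		if err != nil {
			panic(err)
		}
		fmt.Println(masked)
	}
}
```

The same helper could be applied to the client address before it is written to an audit record.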
// Sanitize sensitive data from logs
func sanitizeLogData(data map[string]interface{}) map[string]interface{} {
	sensitiveFields := []string{"api_key", "password", "token", "secret"}
	for _, field := range sensitiveFields {
		if _, exists := data[field]; exists {
			data[field] = "[REDACTED]"
		}
	}
	return data
}

# Use distroless base image
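FROM gcr.io/distroless/static:nonroot
# Run as non-root user
USER nonroot:nonroot
# Copy only the binary. Distroless images have no shell, so RUN chmod cannot work;
# set permissions at copy time instead (--chmod requires BuildKit)
COPY --from=builder --chmod=755 /app/scifind-backend /app/scifind-backend
# A read-only root filesystem is enforced at runtime (readOnlyRootFilesystem), not in the image
ENTRYPOINT ["/app/scifind-backend"]

# Kubernetes security context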
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
        fsGroup: 65534
      containers:
        - name: scifind-backend
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
            runAsNonRoot: true
            runAsUser: 65534

# Scan container for vulnerabilities
trivy image scifind-backend:latest
# Continuous scanning in CI/CD
grype scifind-backend:latest --fail-on medium

# Kubernetes network policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: scifind-network-policy
spec:
  podSelector:
    matchLabels:
      app: scifind-backend
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: nginx-ingress
      ports:
        - protocol: TCP
          port: 8080
  # With Egress listed in policyTypes, only the traffic below is allowed out.
  # DNS and outbound HTTPS to external providers need their own egress rules.
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432

# Istio security policy
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: scifind-mtls
spec:
  selector:
    matchLabels:
      app: scifind-backend
  mtls:
    mode: STRICT

apiVersion: v1
kind: Secret
metadata:
  name: scifind-secrets
type: Opaque
stringData:
  api-key: "secure-api-key-here"
  db-password: "secure-db-password"
  semantic-scholar-key: "ss-api-key"

# AWS Secrets Manager
aws secretsmanager create-secret \
  --name scifind/api-keys \
  --description "SciFind API keys" \
  --secret-string '{"semantic_scholar":"key","exa":"key"}'
# HashiCorp Vault
vault kv put secret/scifind \
  semantic_scholar_key="your-key" \
  exa_key="your-key"

Each external provider requires secure API key handling:
type ProviderConfig struct {
	APIKey     string `json:"api_key,omitempty" yaml:"api_key,omitempty"`
	APISecret  string `json:"api_secret,omitempty" yaml:"api_secret,omitempty"`
	Timeout    time.Duration
	MaxRetries int
	RateLimit  RateLimitConfig
}

# Store provider keys in environment variables
export SCIFIND_PROVIDERS_SEMANTIC_SCHOLAR_API_KEY="your-secure-key"
export SCIFIND_PROVIDERS_EXA_API_KEY="your-secure-key"
export SCIFIND_PROVIDERS_TAVILY_API_KEY="your-secure-key"

func signRequest(req *http.Request, secret string) error {
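	// Read the body for signing, then restore it so the request can still be sent
	// (requires the bytes package). Requests without a body sign an empty payload.
	var body []byte
	if req.Body != nil {
		var err error
		if body, err = io.ReadAll(req.Body); err != nil {
			return err
		}
		req.Body = io.NopCloser(bytes.NewReader(body))
	}
	timestamp := strconv.FormatInt(time.Now().Unix(), 10)
	message := fmt.Sprintf("%s%s%s", timestamp, req.Method, string(body))
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(message))
	signature := hex.EncodeToString(mac.Sum(nil))
	req.Header.Set("X-Timestamp", timestamp)
	req.Header.Set("X-Signature", signature)
	return nil
}

// Prevent cascade failures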
circuit := &CircuitBreakerConfig{
	FailureThreshold: 5,
	Timeout:          60 * time.Second,
	MaxRequests:      10,
}

// Log security events
type SecurityEvent struct {
	Type      string                 `json:"type"`
	Severity  string                 `json:"severity"`
	Source    string                 `json:"source"`
	Timestamp time.Time              `json:"timestamp"`
	Details   map[string]interface{} `json:"details"`
	RequestID string                 `json:"request_id"`
}

// Examples of monitored events
const (
	EventTypeAuthFailure        = "auth_failure"
	EventTypeRateLimitHit       = "rate_limit_hit"
	EventTypeSuspiciousQuery    = "suspicious_query"
	EventTypeUnauthorizedAccess = "unauthorized_access"
)

# Prometheus alerting rules
groups:
  - name: security_alerts
    rules:
      - alert: HighAuthFailureRate
        expr: rate(auth_failures_total[5m]) > 10
        for: 1m
        annotations:
          summary: "High authentication failure rate detected"
      # A raw *_total counter never decreases, so the two rules below compare
      # its increase over a window; the 5m window is an assumption.
      - alert: RateLimitExceeded
        expr: increase(rate_limit_exceeded_total[5m]) > 100
        for: 30s
        annotations:
          summary: "Rate limit exceeded threshold"
      - alert: SuspiciousQuery
        expr: increase(suspicious_query_total[5m]) > 5
        for: 1m
        annotations:
          summary: "Suspicious search queries detected"

- Detection: Automated monitoring alerts
- Assessment: Evaluate threat severity and impact
- Containment: Isolate affected systems
- Eradication: Remove threat and vulnerabilities
- Recovery: Restore normal operations
- Lessons Learned: Post-incident review
incident_response:
  primary_oncall: "security@scifind.com"
  escalation_path:
    - "lead-engineer@scifind.com"
    - "cto@scifind.com"
  external_contacts:
    - "security-vendor@company.com"

- Identify source IP/API key
- Implement temporary rate limit reduction
- Investigate query patterns
- Block malicious sources
- Monitor for continued abuse
- Immediately isolate affected systems
- Preserve evidence and logs
- Assess data exposure scope
- Notify stakeholders and authorities
- Implement containment measures
- Input Validation
  // Always validate and sanitize inputs
  func validateSearchQuery(query string) error {
      if len(query) == 0 {
          return errors.New("query cannot be empty")
      }
      if len(query) > 500 {
          return errors.New("query too long")
      }
      // Check for injection patterns
      if containsSQLInjection(query) {
          return errors.New("invalid characters in query")
      }
      return nil
  }
- Error Handling
  // Don't expose internal details
  func handleError(err error, c *gin.Context) {
      logger.Error("Internal error", "error", err)
      c.JSON(500, gin.H{
          "error":      "Internal server error",
          "request_id": c.GetString("request_id"),
      })
  }
- Dependency Management
  # Regular dependency updates
  go mod tidy
  go list -json -m all | nancy sleuth
  # Vulnerability scanning
  govulncheck ./...
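The `containsSQLInjection` helper called in the input-validation example is not defined in this document. The following is one possible sketch of such a check, assuming a small deny-list of patterns; the pattern list is illustrative and incomplete. A check like this is only a supplementary heuristic: parameterized queries remain the actual defense against SQL injection, and a deny-list can reject legitimate search text.

```go
package main

import (
	"fmt"
	"regexp"
)

// sqlInjectionPattern matches SQL comment markers, statement separators,
// and a few common keyword pairs. The list is an assumption for this sketch.
var sqlInjectionPattern = regexp.MustCompile(
	`(?i)(?:--|;|/\*|\*/|\b(?:union\s+select|drop\s+table|insert\s+into|delete\s+from)\b)`,
)

// containsSQLInjection reports whether the query matches the deny-list.
func containsSQLInjection(query string) bool {
	return sqlInjectionPattern.MatchString(query)
}

func main() {
	for _, q := range []string{
		"graph neural networks",
		"x'; DROP TABLE papers; --",
	} {
		fmt.Printf("%q flagged: %v\n", q, containsSQLInjection(q))
	}
}
```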
# Production security settings
security:
  api_keys: ["${SECURE_API_KEY_1}", "${SECURE_API_KEY_2}"]
  rate_limit:
    enabled: true
    requests: 1000
    window: "1m"
    burst_size: 50
  cors:
    enabled: true
    allowed_origins: ["https://scifind.com"]
    allowed_methods: ["GET", "POST"]
    credentials: false
  headers:
    x_frame_options: "DENY"
    x_content_type_options: "nosniff"
    x_xss_protection: "1; mode=block"
    strict_transport_security: "max-age=31536000; includeSubDomains"

Daily:
- Monitor security alerts
- Review authentication logs
- Check system resource usage
Weekly:
- Review access logs for anomalies
- Update security documentation
- Test backup and recovery procedures
Monthly:
- Rotate API keys
- Update dependencies
- Conduct security scans
- Review and update firewall rules
Quarterly:
- Security architecture review
- Penetration testing
- Incident response drill
- Security training updates
// Audit log structure
type AuditLog struct {
	Timestamp time.Time              `json:"timestamp"`
	RequestID string                 `json:"request_id"`
	UserID    string                 `json:"user_id,omitempty"`
	Action    string                 `json:"action"`
	Resource  string                 `json:"resource"`
	Success   bool                   `json:"success"`
	IP        string                 `json:"ip_address"`
	UserAgent string                 `json:"user_agent"`
	Details   map[string]interface{} `json:"details,omitempty"`
}

// Log all API access
func auditMiddleware() gin.HandlerFunc {
	return func(c *gin.Context) {
		start := time.Now()
		c.Next()
		audit := AuditLog{
			Timestamp: start,
			RequestID: c.GetString("request_id"),
			Action:    c.Request.Method,
			Resource:  c.Request.URL.Path,
			Success:   c.Writer.Status() < 400,
			IP:        c.ClientIP(),
			UserAgent: c.Request.UserAgent(),
		}
		auditLogger.Info("API access", "audit", audit)
	}
}

- Data minimization principles
- Right to deletion implementation
- Privacy by design
- Data protection impact assessments
- Access controls
- System monitoring
- Change management
- Incident response procedures
Application Security:
- Input validation implemented for all endpoints
- SQL injection protection verified
- XSS protection implemented
- CSRF protection for state-changing operations
- Error handling doesn't expose sensitive information
- Dependency vulnerabilities scanned and patched
Authentication & Authorization:
- Strong API key generation process
- API key rotation procedures documented
- Rate limiting configured and tested
- Authorization checks implemented for all endpoints
Data Protection:
- TLS 1.3 enabled for all communications
- Database encryption at rest configured
- Sensitive data properly classified and protected
- Backup encryption implemented
- Log sanitization implemented
Infrastructure Security:
- Container security scanning completed
- Non-root user execution configured
- Network policies implemented
- Secrets management properly configured
- Security contexts applied to all pods
Monitoring & Incident Response:
- Security monitoring implemented
- Alerting rules configured
- Incident response procedures documented
- Emergency contacts updated
- Log retention policies configured
Ongoing Security Tasks:
- Daily security alert monitoring
- Weekly access log review
- Monthly API key rotation
- Quarterly security assessments
- Annual penetration testing
For security issues or questions:
- Security Team: security@scifind.com
- Emergency: +1-XXX-XXX-XXXX
- Bug Bounty: security-reports@scifind.com