A high-performance FastAPI service for converting text (especially log messages) into vector embeddings using the Sentence Transformers library.
This service provides a REST API endpoint that transforms text into dense vector representations using the all-MiniLM-L6-v2 model. It is optimized for log processing and analysis workflows where semantic similarity and clustering of text data are needed.
This vectorizer is designed to support log anomaly detection and similarity analysis workflows by:
Anomaly Detection Pipeline:

- Vectorize logs → Convert log messages into 384-dimensional embeddings
- Store in Elasticsearch → Index vectors using Elasticsearch's dense vector fields
- Detect anomalies → Use vector similarity to identify unusual log patterns (sketched below)
- Find similar logs → Query for logs with similar semantic meaning
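As an illustration of the anomaly-scoring step, here is a minimal sketch, assuming the service is running locally; the baseline messages and the 0.5 threshold are hypothetical and would need tuning:

```python
import numpy as np
import requests

def vectorize(text: str) -> np.ndarray:
    # Call the /vectorize endpoint and return the embedding as a NumPy array.
    resp = requests.post("http://localhost:8000/vectorize", json={"text": text})
    resp.raise_for_status()
    return np.array(resp.json()["vector"])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical baseline: vectors of known-normal log messages.
baseline_vectors = [
    vectorize("INFO User login successful"),
    vectorize("INFO Request completed in 42ms"),
]

candidate = vectorize("ERROR Disk quota exceeded on /var/lib/data")
best_match = max(cosine_similarity(candidate, v) for v in baseline_vectors)

# Flag logs that are dissimilar to everything seen before (threshold is illustrative).
if best_match < 0.5:
    print(f"Possible anomaly (max similarity {best_match:.2f})")
```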
Key Benefits:
- Semantic Understanding: Goes beyond keyword matching to understand log meaning
- Pattern Recognition: Identifies anomalous behavior even with different wording
- Similarity Search: Find related issues across different log formats
- Scalable Processing: High-throughput vectorization for large log volumes
Integration with Elasticsearch:

```json
PUT /logs
{
  "mappings": {
    "properties": {
      "message": {"type": "text"},
      "vector": {"type": "dense_vector", "dims": 384},
      "timestamp": {"type": "date"}
    }
  }
}
```
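For example, a sketch of indexing a vectorized log and querying for similar entries against the mapping above, assuming Elasticsearch is reachable at localhost:9200; the `script_score`/`cosineSimilarity` query requires Elasticsearch 7.3+:

```python
import requests

ES = "http://localhost:9200"
VECTORIZER = "http://localhost:8000/vectorize"

def embed(text: str) -> list[float]:
    # Get a 384-dimensional embedding from the vectorizer service.
    return requests.post(VECTORIZER, json={"text": text}).json()["vector"]

# Index a log message together with its embedding.
message = "2024-10-10 14:32:15 ERROR Database connection timeout"
requests.post(f"{ES}/logs/_doc", json={
    "message": message,
    "vector": embed(message),
    "timestamp": "2024-10-10T14:32:15Z",
})

# Find the logs most semantically similar to a query message.
query = {
    "size": 5,
    "query": {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "cosineSimilarity(params.query_vector, 'vector') + 1.0",
                "params": {"query_vector": embed("database timeout while connecting")},
            },
        }
    },
}
hits = requests.post(f"{ES}/logs/_search", json=query).json()["hits"]["hits"]
for hit in hits:
    print(hit["_score"], hit["_source"]["message"])
```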
Features:

- High Performance: ~78 RPS sustained throughput with ~129ms average response times
- Docker Ready: Containerized with Docker Compose
- Auto-restart: Production-ready with automatic container restart and health checks
- Consistent Latency: 95% of requests complete within 162ms
- Reliable: Zero failed requests in extensive load testing
- Health Monitoring: Built-in `/healthz` endpoint for load balancers
Quick start with Docker:

- Clone and start the service:

```bash
git clone https://github.com/scott-hiemstra/vectorizer.git
cd vectorizer
docker compose up -d
```

- Test the API:

```bash
curl -X POST "http://localhost:8000/vectorize" \
  -H "Content-Type: application/json" \
  -d '{"text": "Database connection timeout - retrying"}'
```

To run locally without Docker:

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Run the service:

```bash
uvicorn main:app --host 0.0.0.0 --port 8000
```

`POST /vectorize`

Converts input text into a vector embedding.
Request Body:

```json
{
  "text": "Your text to vectorize here"
}
```

Response:

```json
{
  "vector": [0.1234, -0.5678, 0.9012, ...]
}
```
`GET /healthz`

Health check endpoint for monitoring and load balancers.

Response (Healthy):

```json
{
  "status": "healthy",
  "model": "all-MiniLM-L6-v2"
}
```
Response (Unhealthy):

```json
{
  "status": "unhealthy",
  "reason": "Model not loaded"
}
```

Returns HTTP 503 when unhealthy.
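If you want a scriptable probe (for example, as a container health check), a small sketch that exits non-zero when the service reports unhealthy; the script name is hypothetical:

```python
# healthcheck.py - exits 0 when healthy, 1 otherwise.
import sys

import requests

try:
    resp = requests.get("http://localhost:8000/healthz", timeout=2)
    sys.exit(0 if resp.status_code == 200 else 1)
except requests.RequestException:
    sys.exit(1)
```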
`GET /metrics`

Prometheus metrics endpoint for monitoring and observability.

Response: Prometheus-formatted metrics including:

- `vectorizer_requests_total` - Total request count by method, endpoint, and status
- `vectorizer_request_duration_seconds` - Request latency histogram
- `vectorizer_encode_duration_seconds` - Time spent in model encoding
- `vectorizer_active_requests` - Current number of active requests
- `vectorizer_model_loaded` - Model status (1=loaded, 0=not loaded)
- `vectorizer_text_length_chars` - Input text length distribution
Example Usage:

```bash
# Error log
curl -X POST "http://localhost:8000/vectorize" \
  -H "Content-Type: application/json" \
  -d '{"text": "2024-10-10 14:32:15 ERROR Database connection timeout"}'

# Info log
curl -X POST "http://localhost:8000/vectorize" \
  -H "Content-Type: application/json" \
  -d '{"text": "2024-10-10 14:33:01 INFO User login successful"}'

# Health check
curl http://localhost:8000/healthz

# Prometheus metrics
curl http://localhost:8000/metrics
```

Performance benchmarks, based on load testing with Apache Bench on an Intel i7-1165G7 (4 cores, 8 threads) with 32GB RAM:
Short text (typical log messages):

| Concurrency | RPS | Avg Response Time | 95th Percentile | Test Size | Notes |
|---|---|---|---|---|---|
| 10 | 94 | 106ms | 124ms | 1,000 req | Optimal for short text |
| 20 | 102 | 195ms | 276ms | 5,000 req | Sustained performance |
Long text:

| Concurrency | RPS | Avg Response Time | 95th Percentile | Text Length | Notes |
|---|---|---|---|---|---|
| 20 | 32 | 634ms | 930ms | ~532 chars | Lorem paragraph test |
- Short logs (< 100 chars): ~90+ RPS sustained
- Medium logs (100-300 chars): ~50-60 RPS estimated
- Long logs (300+ chars): ~30 RPS sustained
- Optimal concurrency: 10-20 depending on text length
Key Insight: Latency grows sublinearly with text length; text 13x longer incurs only about a 6x latency increase, making the service well suited to typical log processing where most messages are short.
Test Environment:
- CPU: Intel i7-1165G7 (Tiger Lake, 4 cores, 8 threads, 2.8-4.7 GHz)
- RAM: 32GB
- Platform: Linux (Docker containerized)
Test the service performance:
```bash
# Install Apache Bench
sudo apt-get install apache2-utils

# Run load test
ab -n 1000 -c 10 -p payload.json -T application/json http://localhost:8000/vectorize
```

Sample payload.json:

```json
{"text": "2024-10-10 14:32:15 ERROR Database connection timeout after 30 seconds"}
```
Environment variables:

| Variable | Default | Description |
|---|---|---|
| `PORT` | `8000` | Service port |
| `HOST` | `0.0.0.0` | Service host |
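A sketch of how such variables would typically be consumed at startup; this is an assumption about the entrypoint, not a copy of main.py:

```python
import os

import uvicorn

if __name__ == "__main__":
    # Fall back to the documented defaults when the variables are unset.
    uvicorn.run(
        "main:app",
        host=os.getenv("HOST", "0.0.0.0"),
        port=int(os.getenv("PORT", "8000")),
    )
```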
The service is configured to:
- Run on port 8000 (mapped from host)
- Auto-restart on failure
- Use minimal resource footprint
To modify resource limits, uncomment the deploy section in docker-compose.yml:

```yaml
deploy:
  resources:
    limits:
      cpus: '2'
      memory: 8G
```

Model details:

- Model: `all-MiniLM-L6-v2`
- Embedding Dimension: 384
- Max Sequence Length: 256 tokens
- Performance: Optimized for speed while maintaining quality
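For reference, loading and using this model directly with the sentence-transformers library looks like the sketch below; this illustrates what the service presumably does internally, not its actual code:

```python
from sentence_transformers import SentenceTransformer

# Downloads the model from the Hugging Face hub on first use.
model = SentenceTransformer("all-MiniLM-L6-v2")

vector = model.encode("2024-10-10 14:32:15 ERROR Database connection timeout")
print(vector.shape)  # (384,) - matches the service's embedding dimension
```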
Perfect for:
- Elasticsearch Integration: Store vectors in dense_vector fields for fast similarity search
- Pattern Detection: Identify unusual log patterns that deviate from normal behavior
- Incident Response: Quickly find logs similar to known issues
- Baseline Establishment: Create vector baselines for normal system behavior
- Semantic Clustering: Group logs by meaning, not just keywords (see the sketch after this list)
- Cross-System Correlation: Find related issues across different applications
- Error Classification: Automatically categorize errors by semantic similarity
- Troubleshooting: Search for logs with similar context or meaning
- Feature Extraction: Use vectors as input for downstream ML models
- Time-Series Analysis: Track semantic drift in log patterns over time
- Root Cause Analysis: Identify common patterns in incident logs
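As an example of the clustering use case referenced above, a sketch that groups log vectors with scikit-learn's KMeans; scikit-learn is an assumed extra dependency and the messages are illustrative:

```python
import numpy as np
import requests
from sklearn.cluster import KMeans

logs = [
    "ERROR Database connection timeout",
    "ERROR Timed out connecting to database",
    "INFO User login successful",
    "INFO User authenticated",
]

# Vectorize each message via the service.
vectors = np.array([
    requests.post("http://localhost:8000/vectorize", json={"text": t}).json()["vector"]
    for t in logs
])

# Group semantically similar logs; two clusters for this toy example.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
for text, label in zip(logs, labels):
    print(label, text)
```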
Local development:

```bash
# Install dependencies
pip install -r requirements.txt

# Run with auto-reload
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

Basic functionality test:

```bash
python -c "
import requests
response = requests.post('http://localhost:8000/vectorize',
json={'text': 'test message'})
print(f'Status: {response.status_code}')
print(f'Vector length: {len(response.json()[\"vector\"])}')
"For higher throughput:
- Multiple Workers:

  ```bash
  uvicorn main:app --workers 4 --host 0.0.0.0 --port 8000
  ```

- Load Balancer: Use nginx or similar for distributing requests
- GPU Acceleration: Modify to use CUDA if GPUs are available (see the sketch after this list)
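For the GPU option, sentence-transformers accepts a `device` argument; the sketch below assumes a CUDA-capable PyTorch install, and the service's own code may wire this differently:

```python
import torch
from sentence_transformers import SentenceTransformer

# Use the GPU when available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer("all-MiniLM-L6-v2", device=device)

# Batched encoding amortizes GPU transfer overhead.
vectors = model.encode(["ERROR Database connection timeout"], batch_size=32)
```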
Built-in observability features:

- Health check endpoint (`/healthz`) - Ready for load balancers
- Prometheus metrics (`/metrics`) - Comprehensive performance monitoring
- Request logging - Structured logging with uvicorn
Key Metrics to Monitor:

- `vectorizer_requests_total` - Request volume and error rates
- `vectorizer_request_duration_seconds` - API latency percentiles
- `vectorizer_encode_duration_seconds` - Model performance
- `vectorizer_active_requests` - Concurrent load
- `vectorizer_text_length_chars` - Input size distribution
Sample Prometheus Queries:

```promql
# 95th percentile latency
histogram_quantile(0.95, rate(vectorizer_request_duration_seconds_bucket[5m]))

# Error rate
rate(vectorizer_requests_total{status!="200"}[5m]) / rate(vectorizer_requests_total[5m])

# Requests per second
rate(vectorizer_requests_total[5m])
```
Grafana Dashboard: Monitor throughput, latency, error rates, and model performance in real-time.
Troubleshooting:

Model loading fails:
- Ensure sufficient memory (>2GB recommended)
- Check internet connectivity for initial model download
Performance degradation:
- Monitor CPU usage
- Consider reducing concurrency
- Check for memory leaks in long-running deployments
Container startup issues:
- Verify port 8000 is available
- Check Docker daemon is running
- Review container logs:

  ```bash
  docker compose logs vectorizer
  ```
This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.
Contributing:

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request