AWS Lambda container image for generating embeddings using fastembed-rs. Supports text embeddings, image embeddings, sparse embeddings, and reranking models. Models are pre-loaded into Docker images for fast cold starts.
The following text embedding models have prebuilt Docker images available on Docker Hub; you can pull and use them directly. Other models supported by fastembed-rs do not yet have prebuilt images, but they can be built using the Building Your Own Image instructions below.
| Model | Model ID | Dimension | Description | Docker |
|---|---|---|---|---|
| Mxbai-Embed-Large-v1 | mixedbread-ai/mxbai-embed-large-v1 | 1024 | Quantized large English embedding model from Mixedbread.ai | johnnywalee/serverless-vectorizer:latest-mixedbread-ai/mxbai-embed-large-v1 |
| GTE-Base-EN-v1.5 | Alibaba-NLP/gte-base-en-v1.5 | 768 | Quantized base English embedding model from Alibaba | johnnywalee/serverless-vectorizer:latest-Alibaba-NLP/gte-base-en-v1.5 |
| Snowflake-Arctic-Embed-Xs | snowflake/snowflake-arctic-embed-xs | 384 | Quantized Snowflake Arctic embed model, xs | johnnywalee/serverless-vectorizer:latest-snowflake/snowflake-arctic-embed-xs |
| Paraphrase-Multilingual-MINILM-L12-v2 | Xenova/paraphrase-multilingual-MiniLM-L12-v2 | 384 | Multilingual model | johnnywalee/serverless-vectorizer:latest-Xenova/paraphrase-multilingual-MiniLM-L12-v2 |
| BGE-Large-EN-v1.5 | Xenova/bge-large-en-v1.5 | 1024 | v1.5 release of the large English model | johnnywalee/serverless-vectorizer:latest-Xenova/bge-large-en-v1.5 |
| Paraphrase-Multilingual-MPNET-Base-v2 | Xenova/paraphrase-multilingual-mpnet-base-v2 | 768 | Sentence-transformers model for tasks like clustering or semantic search | johnnywalee/serverless-vectorizer:latest-Xenova/paraphrase-multilingual-mpnet-base-v2 |
| Multilingual-E5-Small | intfloat/multilingual-e5-small | 384 | Small model of multilingual E5 Text Embeddings | johnnywalee/serverless-vectorizer:latest-intfloat/multilingual-e5-small |
| Nomic-Embed-Text-v1.5 | nomic-ai/nomic-embed-text-v1.5 | 768 | v1.5 release of the 8192-context-length English model | johnnywalee/serverless-vectorizer:latest-nomic-ai/nomic-embed-text-v1.5 |
| BGE-Small-EN-v1.5 | Xenova/bge-small-en-v1.5 | 384 | v1.5 release of the fast and default English model | johnnywalee/serverless-vectorizer:latest-Xenova/bge-small-en-v1.5 |
| Snowflake-Arctic-Embed-M | Snowflake/snowflake-arctic-embed-m | 768 | Quantized Snowflake Arctic embed model, medium | johnnywalee/serverless-vectorizer:latest-Snowflake/snowflake-arctic-embed-m |
| BGE-Small-ZH-v1.5 | Xenova/bge-small-zh-v1.5 | 512 | v1.5 release of the small Chinese model | johnnywalee/serverless-vectorizer:latest-Xenova/bge-small-zh-v1.5 |
| Snowflake-Arctic-Embed-M-Long | snowflake/snowflake-arctic-embed-m-long | 768 | Quantized Snowflake Arctic embed model, medium, with 2048 context | johnnywalee/serverless-vectorizer:latest-snowflake/snowflake-arctic-embed-m-long |
| All-MINILM-L6-v2 | Xenova/all-MiniLM-L6-v2 | 384 | Quantized Sentence Transformer model, MiniLM-L6-v2 | johnnywalee/serverless-vectorizer:latest-Xenova/all-MiniLM-L6-v2 |
| Multilingual-E5-Large-Onnx | Qdrant/multilingual-e5-large-onnx | 1024 | Large model of multilingual E5 Text Embeddings | johnnywalee/serverless-vectorizer:latest-Qdrant/multilingual-e5-large-onnx |
| Mxbai-Embed-Large-v1 | mixedbread-ai/mxbai-embed-large-v1 | 1024 | Large English embedding model from Mixedbread.ai | johnnywalee/serverless-vectorizer:latest-mixedbread-ai/mxbai-embed-large-v1 |
| GTE-Large-EN-v1.5 | Alibaba-NLP/gte-large-en-v1.5 | 1024 | Quantized large English embedding model from Alibaba | johnnywalee/serverless-vectorizer:latest-Alibaba-NLP/gte-large-en-v1.5 |
| GTE-Large-EN-v1.5 | Alibaba-NLP/gte-large-en-v1.5 | 1024 | Large English embedding model from Alibaba | johnnywalee/serverless-vectorizer:latest-Alibaba-NLP/gte-large-en-v1.5 |
| Modernbert-Embed-Large | lightonai/modernbert-embed-large | 1024 | Large model of ModernBert Text Embeddings | johnnywalee/serverless-vectorizer:latest-lightonai/modernbert-embed-large |
| Clip-ViT-B-32-Text | Qdrant/clip-ViT-B-32-text | 512 | CLIP text encoder based on ViT-B/32 | johnnywalee/serverless-vectorizer:latest-Qdrant/clip-ViT-B-32-text |
| All-MINILM-L12-v2 | Xenova/all-MiniLM-L12-v2 | 384 | Quantized Sentence Transformer model, MiniLM-L12-v2 | johnnywalee/serverless-vectorizer:latest-Xenova/all-MiniLM-L12-v2 |
| BGE-Base-EN-v1.5 | Xenova/bge-base-en-v1.5 | 768 | v1.5 release of the base English model | johnnywalee/serverless-vectorizer:latest-Xenova/bge-base-en-v1.5 |
| BGE-Base-EN-v1.5-Onnx-Q | Qdrant/bge-base-en-v1.5-onnx-Q | 768 | Quantized v1.5 release of the base English model | johnnywalee/serverless-vectorizer:latest-Qdrant/bge-base-en-v1.5-onnx-Q |
| Nomic-Embed-Text-v1.5 | nomic-ai/nomic-embed-text-v1.5 | 768 | Quantized v1.5 release of the 8192-context-length English model | johnnywalee/serverless-vectorizer:latest-nomic-ai/nomic-embed-text-v1.5 |
| Snowflake-Arctic-Embed-S | snowflake/snowflake-arctic-embed-s | 384 | Snowflake Arctic embed model, small | johnnywalee/serverless-vectorizer:latest-snowflake/snowflake-arctic-embed-s |
| All-MINILM-L6-v2-Onnx | Qdrant/all-MiniLM-L6-v2-onnx | 384 | Sentence Transformer model, MiniLM-L6-v2 | johnnywalee/serverless-vectorizer:latest-Qdrant/all-MiniLM-L6-v2-onnx |
| BGE-Small-EN-v1.5-Onnx-Q | Qdrant/bge-small-en-v1.5-onnx-Q | 384 | Quantized v1.5 release of the fast and default English model | johnnywalee/serverless-vectorizer:latest-Qdrant/bge-small-en-v1.5-onnx-Q |
| Paraphrase-Multilingual-MINILM-L12-v2-Onnx-Q | Qdrant/paraphrase-multilingual-MiniLM-L12-v2-onnx-Q | 384 | Quantized multilingual model | johnnywalee/serverless-vectorizer:latest-Qdrant/paraphrase-multilingual-MiniLM-L12-v2-onnx-Q |
| Snowflake-Arctic-Embed-S | snowflake/snowflake-arctic-embed-s | 384 | Quantized Snowflake Arctic embed model, small | johnnywalee/serverless-vectorizer:latest-snowflake/snowflake-arctic-embed-s |
| Snowflake-Arctic-Embed-M | Snowflake/snowflake-arctic-embed-m | 768 | Snowflake Arctic embed model, medium | johnnywalee/serverless-vectorizer:latest-Snowflake/snowflake-arctic-embed-m |
| All-MPNET-Base-v2 | Xenova/all-mpnet-base-v2 | 768 | Sentence Transformer model, mpnet-base-v2 | johnnywalee/serverless-vectorizer:latest-Xenova/all-mpnet-base-v2 |
| Snowflake-Arctic-Embed-M-Long | snowflake/snowflake-arctic-embed-m-long | 768 | Snowflake Arctic embed model, medium, with 2048 context | johnnywalee/serverless-vectorizer:latest-snowflake/snowflake-arctic-embed-m-long |
| BGE-Large-ZH-v1.5 | Xenova/bge-large-zh-v1.5 | 1024 | v1.5 release of the large Chinese model | johnnywalee/serverless-vectorizer:latest-Xenova/bge-large-zh-v1.5 |
| Embeddinggemma-300m-ONNX | onnx-community/embeddinggemma-300m-ONNX | 768 | EmbeddingGemma, a 300M-parameter embedding model from Google | johnnywalee/serverless-vectorizer:latest-onnx-community/embeddinggemma-300m-ONNX |
| Snowflake-Arctic-Embed-Xs | snowflake/snowflake-arctic-embed-xs | 384 | Snowflake Arctic embed model, xs | johnnywalee/serverless-vectorizer:latest-snowflake/snowflake-arctic-embed-xs |
| Snowflake-Arctic-Embed-L | snowflake/snowflake-arctic-embed-l | 1024 | Snowflake Arctic embed model, large | johnnywalee/serverless-vectorizer:latest-snowflake/snowflake-arctic-embed-l |
| JINA-Embeddings-v2-Base-Code | jinaai/jina-embeddings-v2-base-code | 768 | Jina embeddings v2 base code model | johnnywalee/serverless-vectorizer:latest-jinaai/jina-embeddings-v2-base-code |
| Multilingual-E5-Base | intfloat/multilingual-e5-base | 768 | Base model of multilingual E5 Text Embeddings | johnnywalee/serverless-vectorizer:latest-intfloat/multilingual-e5-base |
| GTE-Base-EN-v1.5 | Alibaba-NLP/gte-base-en-v1.5 | 768 | Base English embedding model from Alibaba | johnnywalee/serverless-vectorizer:latest-Alibaba-NLP/gte-base-en-v1.5 |
| All-MINILM-L12-v2 | Xenova/all-MiniLM-L12-v2 | 384 | Sentence Transformer model, MiniLM-L12-v2 | johnnywalee/serverless-vectorizer:latest-Xenova/all-MiniLM-L12-v2 |
| Snowflake-Arctic-Embed-L | snowflake/snowflake-arctic-embed-l | 1024 | Quantized Snowflake Arctic embed model, large | johnnywalee/serverless-vectorizer:latest-snowflake/snowflake-arctic-embed-l |
| Nomic-Embed-Text-v1 | nomic-ai/nomic-embed-text-v1 | 768 | 8192-context-length English model | johnnywalee/serverless-vectorizer:latest-nomic-ai/nomic-embed-text-v1 |
| BGE-M3 | BAAI/bge-m3 | 1024 | Multilingual M3 model with 8192 context length, supports 100+ languages | johnnywalee/serverless-vectorizer:latest-BAAI/bge-m3 |
| BGE-Large-EN-v1.5-Onnx-Q | Qdrant/bge-large-en-v1.5-onnx-Q | 1024 | Quantized v1.5 release of the large English model | johnnywalee/serverless-vectorizer:latest-Qdrant/bge-large-en-v1.5-onnx-Q |
Image embedding models:

| Model | Model ID | Dimension | Description | Docker |
|---|---|---|---|---|
| Clip-ViT-B-32-Vision | Qdrant/clip-ViT-B-32-vision | 512 | CLIP vision encoder based on ViT-B/32 | johnnywalee/serverless-vectorizer:latest-Qdrant/clip-ViT-B-32-vision |
| Resnet50-Onnx | Qdrant/resnet50-onnx | 2048 | ResNet-50 from [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) | johnnywalee/serverless-vectorizer:latest-Qdrant/resnet50-onnx |
| Unicom-ViT-B-16 | Qdrant/Unicom-ViT-B-16 | 768 | Unicom ViT-B/16 from open-metric-learning | johnnywalee/serverless-vectorizer:latest-Qdrant/Unicom-ViT-B-16 |
| Unicom-ViT-B-32 | Qdrant/Unicom-ViT-B-32 | 512 | Unicom ViT-B/32 from open-metric-learning | johnnywalee/serverless-vectorizer:latest-Qdrant/Unicom-ViT-B-32 |
| Nomic-Embed-Vision-v1.5 | nomic-ai/nomic-embed-vision-v1.5 | 768 | Nomic Embed Vision v1.5 | johnnywalee/serverless-vectorizer:latest-nomic-ai/nomic-embed-vision-v1.5 |
Sparse embedding models:

| Model | Model ID | Dimension | Description | Docker |
|---|---|---|---|---|
| Splade_PP_en_v1 | Qdrant/Splade_PP_en_v1 | - | SPLADE sparse vector model for commercial use, v1 | johnnywalee/serverless-vectorizer:latest-Qdrant/Splade_PP_en_v1 |
| BGE-M3 | BAAI/bge-m3 | - | BGE-M3 sparse embedding model with 8192 context, supports 100+ languages | johnnywalee/serverless-vectorizer:latest-BAAI/bge-m3 |
Reranking models:

| Model | Model ID | Dimension | Description | Docker |
|---|---|---|---|---|
| BGE-Reranker-Base | BAAI/bge-reranker-base | - | Reranker model for English and Chinese | johnnywalee/serverless-vectorizer:latest-BAAI/bge-reranker-base |
| BGE-Reranker-v2-M3 | rozgo/bge-reranker-v2-m3 | - | Multilingual reranker model | johnnywalee/serverless-vectorizer:latest-rozgo/bge-reranker-v2-m3 |
| JINA-Reranker-v1-Turbo-EN | jinaai/jina-reranker-v1-turbo-en | - | Reranker model for English | johnnywalee/serverless-vectorizer:latest-jinaai/jina-reranker-v1-turbo-en |
| JINA-Reranker-v2-Base-Multilingual | jinaai/jina-reranker-v2-base-multilingual | - | Multilingual reranker model | johnnywalee/serverless-vectorizer:latest-jinaai/jina-reranker-v2-base-multilingual |
The build process uses a two-stage approach:
- Base image - Contains the Lambda runtime and embedding binaries
- Variant image - Extends the base image with a pre-loaded model for fast cold starts
Use the pre-built base image from Docker Hub to skip the base build step:
# Build a model variant using the pre-built base image
docker build \
--build-arg BASE_IMAGE=johnnywalee/serverless-vectorizer:base-latest \
--build-arg MODEL_ID=Xenova/all-MiniLM-L12-v2 \
-f Dockerfile.variant \
  -t my-vectorizer:minilm .

Alternatively, build the base image yourself:

docker build -t serverless-vectorizer:base .

Use Dockerfile.variant with the following build arguments:
- `BASE_IMAGE` - The base image to extend (your local build or `johnnywalee/serverless-vectorizer:base-latest`)
- `MODEL_ID` - The model ID from the Supported Models table above
docker build \
--build-arg BASE_IMAGE=serverless-vectorizer:base \
--build-arg MODEL_ID=<MODEL_ID> \
-f Dockerfile.variant \
  -t serverless-vectorizer:<your-tag> .

# BGE-Small (384 dimensions, English)
docker build \
--build-arg BASE_IMAGE=johnnywalee/serverless-vectorizer:base-latest \
--build-arg MODEL_ID=Xenova/bge-small-en-v1.5 \
-f Dockerfile.variant \
-t serverless-vectorizer:bge-small .
# BGE-M3 (1024 dimensions, Multilingual)
docker build \
--build-arg BASE_IMAGE=johnnywalee/serverless-vectorizer:base-latest \
--build-arg MODEL_ID=BAAI/bge-m3 \
-f Dockerfile.variant \
-t serverless-vectorizer:bge-m3 .
# Snowflake Arctic Embed Large (1024 dimensions)
docker build \
--build-arg BASE_IMAGE=johnnywalee/serverless-vectorizer:base-latest \
--build-arg MODEL_ID=snowflake/snowflake-arctic-embed-l \
-f Dockerfile.variant \
-t serverless-vectorizer:arctic-l .
# Multilingual E5 Large (1024 dimensions)
docker build \
--build-arg BASE_IMAGE=johnnywalee/serverless-vectorizer:base-latest \
--build-arg MODEL_ID=Qdrant/multilingual-e5-large-onnx \
-f Dockerfile.variant \
-t serverless-vectorizer:e5-large .
# All-MiniLM (384 dimensions, lightweight)
docker build \
--build-arg BASE_IMAGE=johnnywalee/serverless-vectorizer:base-latest \
--build-arg MODEL_ID=Xenova/all-MiniLM-L6-v2 \
-f Dockerfile.variant \
  -t serverless-vectorizer:minilm .

Use the included CLI tool to list all supported models:
# Build and run the list-models tool
cargo run --bin list-models
# Output as markdown table
cargo run --bin list-models -- -f markdown
# Output as JSON
cargo run --bin list-models -- -f json
# List all model categories (text, image, sparse, rerank)
cargo run --bin list-models -- -c all

The Lambda automatically detects the model type from the MODEL_ID environment variable and routes requests accordingly. Each model type has its own request/response format, as summarized in the table below.
| MODEL_ID Pattern | Model Type | Use Case |
|---|---|---|
| Text embedding models (default) | text | Semantic search, similarity |
| Qdrant/clip-ViT-B-32-vision, etc. | image | Image similarity, visual search |
| Qdrant/Splade_PP_en_v1, etc. | sparse | Hybrid search, keyword matching |
| BAAI/bge-reranker-*, etc. | rerank | Re-ranking search results |
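If a single client talks to several variants, it can be handy to infer the expected request shape from a model ID. The sketch below is client-side guesswork only; the helper name and pattern checks are illustrative assumptions, and the Lambda itself performs the authoritative routing from its MODEL_ID environment variable.

```python
# Hypothetical client-side helper mirroring the routing table above.
# The pattern checks are illustrative; the Lambda decides the real type
# from its MODEL_ID environment variable.
def guess_model_type(model_id: str) -> str:
    mid = model_id.lower()
    if "rerank" in mid:
        return "rerank"
    if "splade" in mid:
        return "sparse"
    if any(token in mid for token in ("vision", "resnet", "unicom")):
        return "image"
    # Note: some IDs (e.g., BAAI/bge-m3) appear in both the dense and
    # sparse tables, so a lookup table may be safer than substring checks.
    return "text"  # dense text embeddings are the default

assert guess_model_type("BAAI/bge-reranker-base") == "rerank"
assert guess_model_type("Qdrant/clip-ViT-B-32-vision") == "image"
```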
Generate dense vector embeddings for text. Default model type.
{
"messages": [
"Hello world",
"How are you?"
]
}

Or read from S3:
{
"s3_file": "my-bucket/path/to/texts.json"
}

Response:

{
"embeddings": [
[
0.123,
0.456,
-0.789,
...
],
[
0.321,
0.654,
-0.987,
...
]
],
"dimension": 384,
"model_type": "text",
"count": 2
}

Direct Lambda Invocation:
aws lambda invoke \
--function-name serverless-vectorizer \
--payload '{"messages": ["Hello world", "How are you?"]}' \
response.json

Local Docker Testing:
# Start the container
docker run -p 9000:8080 johnnywalee/serverless-vectorizer:latest-Xenova/bge-small-en-v1.5
# Send request
curl -X POST "http://localhost:9000/2015-03-31/functions/function/invocations" \
-H "Content-Type: application/json" \
-d '{"messages": ["Hello world", "How are you?"]}'API Gateway:
curl -X POST https://your-api.execute-api.region.amazonaws.com/embed \
-H "Content-Type: application/json" \
-d '{"messages": ["Hello world", "How are you?"]}'Generate dense vector embeddings for images. Requires an image embedding model (e.g., Qdrant/clip-ViT-B-32-vision).
Generate dense vector embeddings for images. Requires an image embedding model (e.g., Qdrant/clip-ViT-B-32-vision).

Images can be provided as base64-encoded data or S3 paths:
{
"images": [
{
"base64": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJ..."
},
{
"s3_path": "my-bucket/images/photo.jpg"
}
]
}

Or using `s3_images` for multiple S3 paths:
{
"s3_images": [
"my-bucket/images/photo1.jpg",
"my-bucket/images/photo2.png"
]
}

Response:

{
"embeddings": [
[
0.123,
0.456,
-0.789,
...
],
[
0.321,
0.654,
-0.987,
...
]
],
"dimension": 512,
"model_type": "image",
"count": 2
}

Build Image Embedding Container:
docker build \
--build-arg BASE_IMAGE=johnnywalee/serverless-vectorizer:base-latest \
--build-arg MODEL_ID=Qdrant/clip-ViT-B-32-vision \
-f Dockerfile.variant \
  -t serverless-vectorizer:clip .

Local Docker Testing:
# Start the container
docker run -p 9000:8080 serverless-vectorizer:clip
# Send request with base64 image
curl -X POST "http://localhost:9000/2015-03-31/functions/function/invocations" \
-H "Content-Type: application/json" \
-d '{
"images": [
{"base64": "'"$(base64 -w 0 image.png)"'"}
]
  }'

Lambda with S3 Images:
aws lambda invoke \
--function-name serverless-vectorizer-image \
--payload '{
"s3_images": ["my-bucket/images/photo1.jpg", "my-bucket/images/photo2.jpg"]
}' \
response.json
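The base64 request field can also be produced programmatically instead of with the shell. A small Python sketch, assuming boto3 credentials, a local image.png, and the function name used in the S3 example above:

```python
import base64
import json

import boto3

# Encode a local image for the "images" request format shown above.
with open("image.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

client = boto3.client("lambda")
resp = client.invoke(
    FunctionName="serverless-vectorizer-image",
    Payload=json.dumps({"images": [{"base64": encoded}]}),
)
body = json.loads(resp["Payload"].read())
print(body["dimension"], body["count"])
```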
Generate sparse vector embeddings for text using SPLADE models. Useful for hybrid search combining dense and sparse vectors.

{
"messages": [
"The quick brown fox jumps over the lazy dog"
]
}

Response:

{
"sparse_embeddings": [
{
"indices": [
102,
456,
789,
1234,
5678
],
"values": [
0.5,
0.3,
0.8,
0.2,
0.9
]
}
],
"model_type": "sparse",
"count": 1
}

The sparse embedding contains:
- `indices`: Token indices with non-zero weights
- `values`: Corresponding weights for each token
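For hybrid search, a sparse query vector is usually scored against sparse document vectors with a dot product over shared token indices. A minimal sketch over the response format above:

```python
# Dot product of two sparse embeddings in the {"indices", "values"} format.
def sparse_dot(a: dict, b: dict) -> float:
    b_weights = dict(zip(b["indices"], b["values"]))
    return sum(
        w * b_weights[i]
        for i, w in zip(a["indices"], a["values"])
        if i in b_weights
    )

query_vec = {"indices": [102, 456], "values": [0.5, 0.3]}
doc_vec = {"indices": [456, 789], "values": [0.4, 0.9]}
print(sparse_dot(query_vec, doc_vec))  # shared index 456: 0.3 * 0.4 = 0.12
```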
Build Sparse Embedding Container:
docker build \
--build-arg BASE_IMAGE=johnnywalee/serverless-vectorizer:base-latest \
--build-arg MODEL_ID=Qdrant/Splade_PP_en_v1 \
-f Dockerfile.variant \
  -t serverless-vectorizer:splade .

Local Docker Testing:
# Start the container
docker run -p 9000:8080 serverless-vectorizer:splade
# Send request
curl -X POST "http://localhost:9000/2015-03-31/functions/function/invocations" \
-H "Content-Type: application/json" \
-d '{"messages": ["Machine learning is a subset of artificial intelligence"]}'Lambda Invocation:
aws lambda invoke \
--function-name serverless-vectorizer-sparse \
--payload '{"messages": ["Machine learning is a subset of artificial intelligence"]}' \
response.json

Re-rank documents based on relevance to a query. Useful for improving search results.
{
"query": "What is machine learning?",
"documents": [
"Machine learning is a subset of AI that enables computers to learn from data.",
"The weather today is sunny and warm.",
"Deep learning uses neural networks with many layers."
],
"top_k": 2,
"return_documents": true
}

Parameters:
- `query`: The search query
- `documents`: Array of documents to rank
- `top_k` (optional): Return only the top K results
- `return_documents` (optional, default: true): Include document text in the response

Response:
{
"rankings": [
{
"index": 0,
"score": 0.95,
"document": "Machine learning is a subset of AI that enables computers to learn from data."
},
{
"index": 2,
"score": 0.82,
"document": "Deep learning uses neural networks with many layers."
}
],
"model_type": "rerank",
"count": 2
}

Results are sorted by score in descending order.
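When `return_documents` is false, each ranking carries only `index` and `score`, so the caller maps indices back onto the documents array it sent. A short sketch using the example request above:

```python
documents = [
    "Machine learning is a subset of AI that enables computers to learn from data.",
    "The weather today is sunny and warm.",
    "Deep learning uses neural networks with many layers.",
]
# Parsed Lambda response (rankings already sorted by descending score).
response = {"rankings": [{"index": 0, "score": 0.95}, {"index": 2, "score": 0.82}]}

for r in response["rankings"]:
    print(f'{r["score"]:.2f}  {documents[r["index"]]}')
```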
Build Reranking Container:
docker build \
--build-arg BASE_IMAGE=johnnywalee/serverless-vectorizer:base-latest \
--build-arg MODEL_ID=BAAI/bge-reranker-base \
-f Dockerfile.variant \
  -t serverless-vectorizer:reranker .

Local Docker Testing:
# Start the container
docker run -p 9000:8080 serverless-vectorizer:reranker
# Send request
curl -X POST "http://localhost:9000/2015-03-31/functions/function/invocations" \
-H "Content-Type: application/json" \
-d '{
"query": "What is machine learning?",
"documents": [
"Machine learning is a subset of AI that enables computers to learn from data.",
"The weather today is sunny and warm.",
"Deep learning uses neural networks with many layers."
],
"top_k": 2
  }'

Lambda Invocation:
aws lambda invoke \
--function-name serverless-vectorizer-rerank \
--payload '{
"query": "Capital cities of Europe",
"documents": [
"Paris is the capital of France.",
"Pizza is a popular Italian food.",
"London is the capital of England.",
"Berlin is the capital of Germany."
],
"top_k": 3
}' \
response.json

All model types support reading from and writing to S3.
Text Embeddings:
aws lambda invoke \
--function-name serverless-vectorizer \
--payload '{"s3_file": "my-bucket/path/to/texts.json"}' \
response.json

The S3 file can contain either of the following (a short upload sketch follows this list):

- Plain text (embedded as a single document)
- JSON array of strings (each string embedded separately)
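One way to produce such an input file is to upload a JSON array of strings; a minimal boto3 sketch, assuming configured credentials and an existing bucket:

```python
import json

import boto3

texts = ["First document to embed", "Second document to embed"]

# A JSON array of strings: each element is embedded separately.
boto3.client("s3").put_object(
    Bucket="my-bucket",
    Key="path/to/texts.json",
    Body=json.dumps(texts).encode("utf-8"),
)
```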
Image Embeddings:
aws lambda invoke \
--function-name serverless-vectorizer-image \
--payload '{"s3_images": ["my-bucket/images/photo1.jpg", "my-bucket/images/photo2.png"]}' \
response.json

Save embeddings directly to S3 (text and image models only):
aws lambda invoke \
--function-name serverless-vectorizer \
--payload '{
"messages": ["Hello world"],
"save_to_s3": {
"bucket": "my-output-bucket",
"key": "embeddings/output.json"
}
}' \
response.json

Response includes S3 location:
{
"embeddings": [
[
0.123,
0.456,
...
]
],
"dimension": 384,
"model_type": "text",
"count": 1,
"s3_location": "s3://my-output-bucket/embeddings/output.json"
}
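The saved object can then be fetched like any other S3 file. A sketch, assuming boto3 credentials and that the stored JSON mirrors the inline response shape shown above:

```python
import json

import boto3

obj = boto3.client("s3").get_object(
    Bucket="my-output-bucket", Key="embeddings/output.json"
)
result = json.loads(obj["Body"].read())
print(result["dimension"], result["count"])  # assumes the inline response shape
```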
Read from S3 and save results to S3:

aws lambda invoke \
--function-name serverless-vectorizer \
--payload '{
"s3_file": "input-bucket/documents.json",
"save_to_s3": {
"bucket": "output-bucket",
"key": "embeddings/result.json"
}
}' \
response.json

Full request format (all fields):

{
  // === Text Embedding Input ===
  "messages": ["text1", "text2"],               // Direct text input
  "s3_file": "bucket/key",                      // OR read text from S3

  // === Image Embedding Input ===
  "images": [                                   // Image input array
    {"base64": "..."},                          // Base64-encoded image
    {"s3_path": "bucket/key"}                   // OR S3 path to image
  ],
  "s3_images": ["bucket/key1", "bucket/key2"],  // OR S3 paths array

  // === Reranking Input ===
  "query": "search query",                      // Query for reranking
  "documents": ["doc1", "doc2"],                // Documents to rank
  "top_k": 5,                                   // Return top K results (optional)
  "return_documents": true,                     // Include docs in response (optional)

  // === Output Options ===
  "save_to_s3": {                               // Save results to S3 (optional)
    "bucket": "bucket-name",
    "key": "path/to/output.json"
  }
}

Text/Image Embeddings:
{
  "embeddings": [[...], [...]],   // Dense embedding vectors
  "dimension": 384,               // Vector dimension
  "model_type": "text",           // "text" or "image"
  "count": 2,                     // Number of embeddings
  "s3_location": "s3://..."       // If save_to_s3 was used
}

Sparse Embeddings:
{
  "sparse_embeddings": [
    {"indices": [...], "values": [...]}
  ],
  "model_type": "sparse",
  "count": 1
}

Reranking:
{
  "rankings": [
    {"index": 0, "score": 0.95, "document": "..."}
  ],
  "model_type": "rerank",
  "count": 2
}

Prerequisites:

- Rust 1.70+
- Docker & Docker Compose
- AWS CLI (for local testing)
# Run unit tests
cargo test --test unit_tests --features aws
# Start LocalStack for integration tests
docker-compose up -d
# Run all integration tests (can run in parallel - each has its own MODEL_ID)
cargo test --features aws --test integration_text_tests &
cargo test --features aws --test integration_image_tests &
cargo test --features aws --test integration_sparse_tests &
cargo test --features aws --test integration_rerank_tests &
wait
# Or run specific model type tests
cargo test --features aws --test integration_text_tests # Text embeddings
cargo test --features aws --test integration_image_tests # Image embeddings
cargo test --features aws --test integration_sparse_tests # Sparse embeddings
cargo test --features aws --test integration_rerank_tests # Reranking
# Stop LocalStack
docker-compose down

Project structure:

.
├── src/
│ ├── main.rs # Lambda entry point
│ ├── lambda.rs # Lambda handler and AWS integration
│ ├── lib.rs # Library exports
│ └── core/
│ ├── model.rs # Model definitions and registry
│ ├── embeddings.rs # Embedding services (text, image, sparse, rerank)
│ ├── image_utils.rs # Image loading utilities
│ └── ...
├── tests/
│ ├── unit_tests.rs # Unit tests
│ ├── integration_text_tests.rs # Text embedding integration tests
│ ├── integration_image_tests.rs # Image embedding integration tests
│ ├── integration_sparse_tests.rs # Sparse embedding integration tests
│ └── integration_rerank_tests.rs # Reranking integration tests
├── Dockerfile # Base image
├── Dockerfile.variant # Model-specific variant builder
└── docker-compose.yaml # LocalStack for testing
Push to ECR and create Lambda function:
# Log in to ECR, then tag and push
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789.dkr.ecr.us-east-1.amazonaws.com
docker tag serverless-vectorizer:bge-small 123456789.dkr.ecr.us-east-1.amazonaws.com/serverless-vectorizer:bge-small
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/serverless-vectorizer:bge-small
# Create Lambda (first time)
aws lambda create-function \
--function-name serverless-vectorizer \
--package-type Image \
--code ImageUri=123456789.dkr.ecr.us-east-1.amazonaws.com/serverless-vectorizer:bge-small \
--role arn:aws:iam::123456789:role/lambda-execution-role \
--memory-size 1024 \
--timeout 30
# Update Lambda (subsequent deploys)
aws lambda update-function-code \
--function-name serverless-vectorizer \
  --image-uri 123456789.dkr.ecr.us-east-1.amazonaws.com/serverless-vectorizer:bge-small

This project is powered by fastembed-rs, a Rust library for fast, lightweight embedding generation. fastembed-rs supports:
- Text Embeddings - Dense vector representations for semantic search and similarity
- Image Embeddings - Vision encoders like CLIP and ResNet for image similarity
- Sparse Text Embeddings - SPLADE models for hybrid search
- Reranking Models - Cross-encoder models for result reranking
MIT