feat: add GPU backends, quantization, and search optimizations by cluster2600 · Pull Request #166 · alibaba/zvec

cluster2600 · 2026-02-24T16:41:42Z

Summary

Add C++ Metal GPU backend for vector operations on Apple Silicon
Add FAISS GPU/CPU backends with unified accelerate module
Add Product Quantization (PQ), Optimized PQ (OPQ), and Scalar Quantization
Add pure Python HNSW index with FAISS fallback
Add optimized search functions (ADC, batched search, reranking)
Add Apple Silicon MPS backend via PyTorch

Changes

C++ Metal Backend

src/ailego/gpu/metal/zvec_metal.h — C API header
src/ailego/gpu/metal/zvec_metal.cc — Objective-C++ implementation
src/ailego/gpu/metal/zvec_metal.metal — Metal shaders (L2, IP, cosine, normalize, matmul, top-k)
src/ailego/gpu/metal/CMakeLists.txt — Metal compilation
tests/test_metal.cc — Google Test suite

Python Backends

python/zvec/accelerate.py — Unified accelerator interface
python/zvec/backends/gpu.py — FAISS GPU backend
python/zvec/backends/detect.py — Hardware detection
python/zvec/backends/quantization.py — PQ encoder
python/zvec/backends/opq.py — OPQ encoder + Scalar Quantizer
python/zvec/backends/hnsw.py — Pure Python HNSW with FAISS fallback
python/zvec/backends/search.py — ADC, batch search, reranking
python/zvec/backends/apple_silicon.py — Apple Silicon MPS backend
python/zvec/backends/benchmark.py — Backend performance benchmarks
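hnsw.py is listed as a pure Python HNSW index. The core routine of any HNSW search is best-first search over one graph layer, keeping a bounded result set of size ef. The sketch below is a minimal version of that routine. It assumes an adjacency-dict layout per layer, and the names (`search_layer`, `neighbors`, `ef`) are hypothetical, not taken from hnsw.py.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_layer(query, entry, vectors, neighbors, ef):
    """Best-first search on a single HNSW layer (hypothetical layout).

    vectors:   list of vectors indexed by node id
    neighbors: dict mapping node id -> list of neighbor ids on this layer
    Returns up to `ef` (distance, node) pairs, nearest first.
    """
    d0 = l2(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node on top
    results = [(-d0, entry)]     # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break                # no unexpanded node can improve the results
        for nb in neighbors.get(node, []):
            if nb in visited:
                continue
            visited.add(nb)
            dn = l2(query, vectors[nb])
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current worst result
    return sorted((-nd, n) for nd, n in results)
```

Upper layers are typically searched with ef=1 to find an entry point for the next layer down; only the bottom layer uses the full ef.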

Configuration

pyproject.toml — accelerate/gpu optional dependencies, per-file-ignores for backends

Docs

docs/METAL_CPP.md — Metal backend documentation

Context

Split from #157. Aligns with cluster2600#2 content.

Test plan

ruff lint and format pass
clang-format passes on all C++ and Metal files
CI builds succeed on all platforms
Metal tests pass on macOS (skip on Linux)

Add Metal Shading Language kernels for GPU-accelerated vector operations on Apple Silicon, including L2 distance, inner product, cosine similarity, vector normalization, matrix multiplication, and top-k selection. Includes a C API wrapper, a CMakeLists.txt for Metal compilation, and a comprehensive Google Test suite.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
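A CPU reference of the same operations is a common way to check GPU kernel output in tests. The sketch below is a hypothetical pure Python version (the function names are illustrative, not the C API in zvec_metal.h). It assumes squared L2 distance and ascending-distance top-k, which may not match the kernels' exact conventions.

```python
import heapq
import math


def l2_sq(a, b):
    # Squared Euclidean distance (GPU kernels often skip the sqrt).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v, eps=1e-12):
    # Scale to unit L2 norm; eps guards against dividing by zero.
    n = math.sqrt(inner_product(v, v))
    return [x / max(n, eps) for x in v]


def cosine(a, b):
    # Cosine similarity is the inner product of the normalized vectors.
    return inner_product(normalize(a), normalize(b))


def top_k(query, database, k):
    # Indices of the k database vectors closest to the query (squared L2).
    return heapq.nsmallest(k, range(len(database)),
                           key=lambda i: l2_sq(query, database[i]))
```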

Add a unified acceleration module supporting FAISS CPU and GPU backends with automatic hardware detection. Includes a backend benchmark suite for performance comparison and realistic dataset benchmarks.

New files:
- python/zvec/accelerate.py: Unified accelerator interface
- python/zvec/backends/gpu.py: FAISS GPU backend
- python/zvec/backends/detect.py: Hardware detection
- python/zvec/backends/benchmark.py: Performance benchmarks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
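Automatic detection behind a unified interface usually reduces to a preference-ordered fallback. The sketch below shows only that selection logic. The backend names, the preference order, and both function names are hypothetical and not taken from accelerate.py or detect.py; a real GPU check would also need a runtime probe (for example `faiss.get_num_gpus()` or `torch.backends.mps.is_available()`), not just an installed package.

```python
import importlib.util


def module_available(name):
    # True if a top-level module can be imported, without importing it.
    return importlib.util.find_spec(name) is not None


def pick_backend(available,
                 preference=("faiss-gpu", "mps", "faiss-cpu", "python")):
    """Return the first backend in `preference` that appears in `available`.

    `available` is a set of backend names; the order is an assumption.
    """
    for name in preference:
        if name in available:
            return name
    raise RuntimeError("no vector backend available")
```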

Add a Product Quantization (PQ) encoder, Optimized Product Quantization (OPQ) with rotation learning, and Scalar Quantization (8/16-bit) for efficient vector compression and approximate nearest neighbor search.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
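In the 8-bit case, scalar quantization stores one byte per dimension by mapping each component linearly onto 0..255. A minimal sketch follows, assuming per-dimension min/max ranges learned from training data (the commit does not say which range estimator is used); the function names are hypothetical. The 16-bit variant would be identical with 65535 levels and two bytes per code.

```python
def sq8_train(vectors):
    # Per-dimension min and max over the training set.
    dims = range(len(vectors[0]))
    lo = [min(v[d] for v in vectors) for d in dims]
    hi = [max(v[d] for v in vectors) for d in dims]
    return lo, hi


def sq8_encode(v, lo, hi):
    # Map each component onto 0..255 and clamp values outside the range.
    codes = []
    for x, a, b in zip(v, lo, hi):
        scale = (b - a) or 1.0          # avoid dividing by zero on constant dims
        q = round((x - a) / scale * 255)
        codes.append(min(255, max(0, q)))
    return bytes(codes)


def sq8_decode(codes, lo, hi):
    # Inverse mapping; error is at most half a quantization step per dim.
    return [a + c / 255 * ((b - a) or 1.0) for c, a, b in zip(codes, lo, hi)]
```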

Add a pure Python HNSW index with FAISS fallback, optimized search functions (ADC, batched search, reranking), and an Apple Silicon MPS backend using PyTorch for GPU-accelerated vector operations on macOS. Update pyproject.toml with accelerate/gpu optional dependencies and per-file-ignores for backends.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
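Asymmetric distance computation (ADC) keeps the query exact and compares it against PQ codes: one table per sub-space holds the distance from the query sub-vector to every centroid, so scoring a code costs m lookups. Reranking then re-scores the shortlist with exact distances. A minimal sketch, assuming squared L2 and pre-trained codebooks given as nested lists; the names are hypothetical, not the functions in search.py. Batched search would apply `adc_search` per query, building each query's tables once.

```python
import heapq


def split(v, m):
    # Cut a vector into m equal-length sub-vectors (len(v) divisible by m).
    s = len(v) // m
    return [v[i * s:(i + 1) * s] for i in range(m)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(v, codebooks):
    # codebooks[j] lists the centroids of sub-space j; keep the nearest id.
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(split(v, len(codebooks)), codebooks)]


def adc_search(query, codes, codebooks, k):
    # tables[j][c] = ||q_j - centroid_{j,c}||^2, computed once per query.
    tables = [[sq_dist(sub, c) for c in cb]
              for sub, cb in zip(split(query, len(codebooks)), codebooks)]
    dists = ((sum(t[c] for t, c in zip(tables, code)), i)
             for i, code in enumerate(codes))
    return [i for _, i in heapq.nsmallest(k, dists)]


def rerank(query, candidates, vectors, k):
    # Re-score the ADC shortlist with exact distances on the raw vectors.
    return heapq.nsmallest(k, candidates,
                           key=lambda i: sq_dist(query, vectors[i]))
```

In the test data below, ADC ranks vector 0 ahead of vector 2 but exact reranking reverses them, which is the reason to keep raw vectors around for the final step.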

Add header-only C++ implementations of Product Quantization (PQ) and Optimized Product Quantization (OPQ), and upgrade the Python OPQ rotation from QR decomposition to SVD-based Orthogonal Procrustes.

C++ Product Quantizer (product_quantizer.h):
- k-means training with configurable m sub-quantizers and k centroids
- encode/decode with distortion measurement
- Header-only, depends only on <algorithm>, <cmath>, <vector>

C++ OPQ (opq.h):
- SVD-based Procrustes rotation: R = V * U^T from SVD(X^T * Y)
- Self-contained one-sided Jacobi SVD solver (no LAPACK dependency)
- Iterative refinement of rotation + PQ codebooks

Python OPQ (_learn_rotation):
- Replace simplified QR decomposition with SVD Procrustes
- M = X^T @ decoded; U, _, Vt = svd(M); R = Vt.T @ U.T
- Produces orthogonal rotations (orthogonality error ~4e-6)
- Benchmarked: ~1-10% reconstruction improvement over plain PQ

Follow-up to alibaba#166 ("Future Work: sophisticated OPQ optimization").

Tested on:
- macOS: clang++ C++17 compilation + runtime tests
- Linux (Blackwell GPU): Python OPQ + cuVS CAGRA integration

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
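The Procrustes update in this commit message (R = V * U^T from SVD(X^T * Y)) follows from a short trace argument. This derivation assumes the rows of X are the training vectors, the rows of Y are their PQ reconstructions (`decoded`), and the rotation is applied as x ↦ Rx; the commit does not state the convention, so treat it as a sketch. Since ||XR^T||_F = ||X||_F for orthogonal R, minimizing the error is the same as maximizing the trace term:

```latex
\begin{aligned}
\lVert X R^\top - Y \rVert_F^2
  &= \lVert X \rVert_F^2 + \lVert Y \rVert_F^2
     - 2\,\operatorname{tr}\!\left(R\,X^\top Y\right)
  && (R^\top R = I) \\
X^\top Y = U \Sigma V^\top
  &\;\Rightarrow\;
  \operatorname{tr}\!\left(R\,X^\top Y\right)
  = \operatorname{tr}\!\left(V^\top R\,U\,\Sigma\right)
  \le \operatorname{tr}(\Sigma) \\
\text{equality when } V^\top R\,U = I
  &\;\Longleftrightarrow\; R = V U^\top
\end{aligned}
```

This matches the Python form R = Vt.T @ U.T, since Vt.T is V.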

Add persistent vector storage backed by RocksDB for GPU pipeline integration, plus documentation for the Metal C++ backend.

VectorStorage (vector_storage.h):
- RocksDB column families: "vectors", "pq_codes", "metadata"
- Batch put/get for raw vectors and PQ codes
- load_all() streams vectors into a contiguous, GPU-ready float buffer
- Integrates with the existing RocksdbContext wrapper

Documentation (docs/METAL_CPP.md):
- Architecture overview: RocksDB → load_all() → Metal GPU buffers
- Complete kernel reference table (distance and utility kernels)
- Simdgroup optimization dispatch model
- C++ PQ/OPQ API examples
- RocksDB storage API examples

Follow-up to alibaba#166 ("Future Work: Integration with RocksDB storage").

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
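The point of load_all() is the buffer layout: every vector packed row-major into one contiguous float32 block, so it can be uploaded to the GPU in a single copy. The sketch below illustrates only that layout with Python's standard `array` module. It is hypothetical: `records` stands in for a scan of the "vectors" column family, and the function is not the C++ API in vector_storage.h.

```python
from array import array


def load_all(records, dim):
    """Pack (id, vector) pairs into one contiguous row-major float32 buffer.

    Returns (ids, buf); row i, i.e. buf[i*dim:(i+1)*dim], holds ids[i].
    """
    ids = []
    buf = array("f")   # 32-bit floats stored contiguously
    for key, vec in records:
        if len(vec) != dim:
            raise ValueError(f"vector {key!r} has {len(vec)} dims, expected {dim}")
        ids.append(key)
        buf.extend(vec)
    return ids, buf
```

Because `array` supports the buffer protocol, `memoryview(buf)` exposes the packed bytes without copying, which is the property a GPU upload path would rely on.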

egolearner · 2026-02-26T06:15:49Z

Hi there, thank you for the effort! We found that this PR covers a few different areas (C++ and Python backends) that might be better handled separately. Would you be open to splitting them up after proper discussion?
Regarding the Metal GPU backend, it seems we might need more integration logic to make it meaningful.

To save you some time and effort, it’s usually best to open an issue, discuss the approach and reach a consensus before you start (vibe) coding. We'd love to hear your thoughts there! Thanks.

cluster2600 · 2026-02-26T08:01:47Z

Hi @egolearner, thanks for the feedback! You're absolutely right — #166 was too broad.

I've already split it into focused, independent PRs:

feat: add simdgroup-optimized Metal kernels #172 — Metal SIMD kernels only (6 simdgroup-optimized compute kernels)
feat: add C++ product quantization and SVD Procrustes OPQ #173 — C++ Product Quantization + OPQ (header-only, no new deps)
feat: add GPU buffer loader for IndexProvider integration #175 — GPU buffer loader for IndexProvider integration
feat: GPU-accelerated indexing with Collection API integration #176 — Python GPU-accelerated indexing (UnifiedGpuIndex + Collection API bridge)

I'll open separate issues for each to discuss the approach before asking for review. Thanks for guiding the process!

cluster2600 and others added 4 commits February 24, 2026 17:35

cluster2600 mentioned this pull request Feb 24, 2026

feat: add Python 3.13 and 3.14 support #157 (Closed)

ci: retrigger CI (pre-existing hnsw_sparse_searcher flake) · f2c5946

feihongxu0824 requested a review from egolearner February 25, 2026 03:19

feihongxu0824 assigned egolearner Feb 25, 2026

Merge branch 'main' into feat/gpu-quantization-search-optimizations · df2d9ba

cluster2600 mentioned this pull request Feb 25, 2026

feat: add simdgroup-optimized Metal kernels #172 (Open)

cluster2600 mentioned this pull request Feb 25, 2026

feat: add C++ product quantization and SVD Procrustes OPQ #173 (Open)

This was referenced Feb 25, 2026

feat: add RocksDB vector storage for GPU pipeline #174 (Closed)

feat: add GPU buffer loader for IndexProvider integration #175 (Open)

egolearner closed this Feb 26, 2026

cluster2600 mentioned this pull request Feb 26, 2026

Proposal: simdgroup-optimized Metal compute kernels #177 (Open)

cluster2600 wants to merge 6 commits into alibaba:main from cluster2600:feat/gpu-quantization-search-optimizations
