[ATC'25 Artifact] PIM-ANNS: Turbocharge ANNS on Real Processing-in-Memory by Enabling Fine-Grained Per-PIM-Core Scheduling

Welcome to the artifact repository for the ATC'25 paper "PIM-ANNS: Turbocharge ANNS on Real Processing-in-Memory by Enabling Fine-Grained Per-PIM-Core Scheduling"!

If you have any questions, please contact the authors via HotCRP. The authors will respond to each question as soon as possible, and within 24 hours at the latest.

Main Claims

Major Claim 1: PIM-ANNS outperforms baselines in terms of query throughput and latency. (Exp #1,#2,#7,#8)

Major Claim 2: PIM-ANNS overcomes batch scheduling limitations via fine-grained per-PU scheduling. (Exp #3,#6a)

Major Claim 3: PIM-ANNS’s techniques synergistically improve performance. (Exp #4,#5,#6b)

Overview

Directory structure

Most of the source code is well documented to facilitate further research.

common/              # shared files for both DPU and host programs
dpu/                 # DPU kernel functions
host/                # host programs interacting with DPU kernels
third-party/
   ├── upmem-2024.2.0-Linux-x86_64/  # modified UPMEM-SDK
   └── faiss_upmem/                  # modified FAISS
AE/                  # test scripts and plotting scripts
main.cpp             # main program entry point

Overview of PIM-ANNS

The figure shows the workflow of an ANNS query in PIM-ANNS. In this example, PU_0 is overloaded, so per-PU query dispatching sends the query to the replica on PU_1 instead.
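The dispatching decision described above can be illustrated with a minimal sketch. It assumes that load is measured by the number of pending queries per PU and that the least-loaded replica holder is chosen; the `PU` class and `dispatch` function are hypothetical names for illustration and do not come from the artifact's source code.

```python
from dataclasses import dataclass, field


@dataclass
class PU:
    """Hypothetical processing unit with a queue of pending query IDs."""
    pu_id: int
    pending: list = field(default_factory=list)


def dispatch(query_id, replica_pus, pus):
    """Send the query to the replica holder with the fewest pending queries.

    Assumption: queue length is the load metric; the real system may use another.
    """
    target = min(replica_pus, key=lambda pid: len(pus[pid].pending))
    pus[target].pending.append(query_id)
    return target


# PU_0 is overloaded, PU_1 is idle; both hold a replica of the needed cluster.
pus = {0: PU(0, pending=[101, 102, 103]), 1: PU(1)}
chosen = dispatch(104, replica_pus=[0, 1], pus=pus)
print(f"query 104 -> PU_{chosen}")  # prints "query 104 -> PU_1"
```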

Environment Setup

To artifact reviewers: please skip this section and go to Evaluate the Artifact, since we have already set up the required environment on the provided platform.

Prerequisites

Hardware Requirements: To run this project, UPMEM hardware is required. https://www.upmem.com/

Software Requirements: The following software environment must also be set up.

UPMEM-SDK: This project uses a modified version of UPMEM-SDK based on the 2024.2 release. Original SDK: http://sdk-releases.upmem.com/2024.2.0/ubuntu_22.04/upmem-2024.2.0-Linux-x86_64.tar.gz. Installation script (Note: Modify the installation path in install.sh to your preferred location):

cd third-party/upmem-2024.2.0-Linux-x86_64
cd src/backends
bash ./install.sh

FAISS: The IVFPQ index algorithm reuses portions of the FAISS codebase. For better UPMEM compatibility, we provide a modified version of FAISS based on: https://github.com/facebookresearch/faiss. The installation procedure is the same as for upstream FAISS.

cd third-party/faiss_upmem
cmake -B build .
make -C build -j faiss
make -C build install

Boost Coroutine: This project uses Boost's coroutine library. Please install Boost with the following commands.

sudo apt-get update
sudo apt install libboost-all-dev

Evaluate the Artifact

Login to the server we provide

We have provided a pre-configured server for AE reviewers. The project directory is located at workspace/PIM-ANNS.

ssh -p 12853 wupuqing@44be1613b0de8118.natapp.cc

cd workspace/PIM-ANNS

Building PIM-ANNS from source

Build the project with CMake:

cmake -B build .
cd build
make -j

Hello-world example

To verify that everything is set up correctly, run the hello-world example, which exercises PIM-ANNS's basic functionality:

AE/hello_world.sh

It runs for approximately 1 minute and, on success, prints output similar to the following:

yyyy-mm-dd hh:mm:ss
json_path: /home/wupuqing/workspace/PIM-ANNS/config.json
query path is /mnt/optane/wpq/dataset/space/query10K.i8bin
query_num: 10000, dim: 100
searching SPACE1M, nprobe = 11
The command ./main 11 completed successfully.

If you can see this output, then everything is OK, and you can start running the artifact.
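If you prefer to check the result from a script, the success line can be matched programmatically. The sketch below assumes that success is always signalled by a line of the form shown in the sample output above; the helper name `hello_world_succeeded` is hypothetical and not part of the artifact.

```python
import re

# Assumption: the last line of a successful run looks like the sample output,
# i.e. "The command ./main <nprobe> completed successfully."
SUCCESS_LINE = re.compile(
    r"^The command \./main \d+ completed successfully\.$", re.MULTILINE
)


def hello_world_succeeded(output: str) -> bool:
    """Return True if the captured output contains the success line."""
    return SUCCESS_LINE.search(output) is not None


sample = """query_num: 10000, dim: 100
searching SPACE1M, nprobe = 11
The command ./main 11 completed successfully."""

print(hello_world_succeeded(sample))
print(hello_world_succeeded("searching SPACE1M, nprobe = 11"))
```

The first call reports success for the sample output; the second reports failure because the final line is missing.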

Run all experiments

We provide convenient scripts for running either all experiments collectively (#1) or individual experiments selectively (#2).

#1: Running the all-in-one script. We provide an all-in-one AE script that runs all experiments end to end:

AE/run_all.sh

This script will run for approximately 8 hours and store all results in the AE directory.

#2: Running specific experiments. If you wish to reproduce only specific experiments, we provide eight separate scripts, one per experiment. These scripts, located at AE/exps/expX.sh (where X = 1, 2, ..., 8), can be used to reproduce all the figures presented in our paper. The script names (i.e., expX.sh) match the experiment numbers used in the paper.

If you want to run individual experiments, please refer to these script files and the comments in them, which describe the relationship between experiments and figures/tables.

The estimated running time of each experiment is shown in the table below.

Experiment	Description	Duration (hours)
EXP1	Overall throughput	1.5
EXP2	End-to-end latency	1.5
EXP3	PIM utilization	1.5
EXP4	Coroutine-based bus ownership switching	1.0
EXP5	Effect of selective replication	0.5
EXP6	Contributions of individual techniques	1.0
EXP7	Comparison with Faiss-GPU	0.5
EXP8	Cost efficiency	0.2
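The per-experiment estimates add up to 7.7 hours, which is in line with the roughly 8 hours quoted for AE/run_all.sh. The short sketch below reproduces that sum and estimates the time for a subset of experiments; the durations are copied from the table, while the function name `estimated_hours` is only illustrative.

```python
# Estimated durations in hours, copied from the table above.
DURATION_HOURS = {
    "EXP1": 1.5,
    "EXP2": 1.5,
    "EXP3": 1.5,
    "EXP4": 1.0,
    "EXP5": 0.5,
    "EXP6": 1.0,
    "EXP7": 0.5,
    "EXP8": 0.2,
}


def estimated_hours(experiments=None):
    """Sum the estimated hours for the given experiments (all if None)."""
    names = DURATION_HOURS if experiments is None else experiments
    return sum(DURATION_HOURS[name] for name in names)


print(f"all experiments: {estimated_hours():.1f} h")
print(f"EXP1 + EXP2: {estimated_hours(['EXP1', 'EXP2']):.1f} h")
```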

Plot the figures & tables

We provide two alternative ways to visualize the experimental results.

#1: (Recommended) All-in-one Jupyter notebook for Visual Studio Code users.

Please install the Jupyter extension in VSCode. Then, please open AE/plot.ipynb.

Please activate the virtual environment (.venv/bin/python) by running the following command.

source .venv/bin/activate

Then, you can run each cell from top to bottom. Each cell will plot a figure or table like below. Titles of these figures and tables are consistent with those in the paper.

#2: Traditional python plotting scripts.

We provide a traditional plotter script. Please run it in the AE directory:

cd AE
python3 plot.py

The command above will plot all figures and tables by default, and the results will be stored in the AE/figures directory. So, please ensure that you have finished running the all-in-one AE script before running the plotter.

The plotter further allows users to specify particular figures or tables to generate by providing supplementary command-line arguments. For example:

python3 plot.py exp1 exp2

For the accepted arguments, please refer to plot.py or run:

python3 plot.py help

Detailed claims & Experimental result verification

Main Claim 1: PIM-ANNS Outperforms Existing ANNS Systems in Throughput, Latency, and Cost Efficiency

Sub-claims:

Throughput (Exp1 + Exp7): PIM-ANNS achieves 2.4–10.4× higher QPS than Faiss-CPU, PIM-ANNS-Batch, and Faiss-GPU on billion-scale datasets (SPACE-1B/SIFT-1B) at the same recall@10.
Latency (Exp2): PIM-ANNS reduces average latency by 32–43% and tail latency (P99) by 26–63% compared to Faiss-CPU, eliminating inter-batch stalls.
Cost Efficiency (Exp8): PIM-ANNS improves QPS/$ by 2.4× over CPU and 4.8× over GPU (Table 1), leveraging UPMEM’s high memory bandwidth and parallelism.

Verification:

Reproduce Figures 9 (throughput), 10 (latency), and Table 1 (cost) using provided scripts and datasets.
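When comparing latency numbers by hand, the average and P99 reductions can be computed as in the sketch below. It uses the nearest-rank definition of a percentile, which is an assumption (the artifact's plotting scripts may use a different definition), and the latency samples are synthetic placeholders rather than measured results.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100) in integers
    return ordered[rank - 1]


def reduction_percent(baseline, candidate):
    """Relative reduction of candidate versus baseline, in percent."""
    return (baseline - candidate) / baseline * 100.0


# Synthetic placeholder latencies in milliseconds, NOT measured data.
baseline_ms = [float(v) for v in range(1, 101)]
candidate_ms = [v * 0.5 for v in baseline_ms]

avg_base = sum(baseline_ms) / len(baseline_ms)
avg_cand = sum(candidate_ms) / len(candidate_ms)
avg_reduction = reduction_percent(avg_base, avg_cand)
p99_reduction = reduction_percent(
    percentile(baseline_ms, 99), percentile(candidate_ms, 99)
)

print(f"average latency reduction: {avg_reduction:.1f}%")
print(f"P99 latency reduction: {p99_reduction:.1f}%")
```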

Main Claim 2: PIM-ANNS Overcomes Batch Scheduling Limitations via Fine-Grained Per-PU Scheduling

Sub-claims:

PU Utilization (Exp3): PIM-ANNS maintains ~80% PU utilization (vs. ~20% for PIM-ANNS-Batch) by eliminating idle gaps between batches (Figures 11–12).
Load Balancing: Per-PU dispatching ensures uniform task distribution across PUs (Figure 5b → Figure 14a).

Verification:

Profile active PU counts over time and compare utilization metrics (Figures 11–12).
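One simple way to turn a trace of active PU counts into a utilization figure is to average the busy fraction over equally spaced samples. The sketch below assumes that definition; the two traces are made-up toy data that only illustrate the shape of the comparison, not measurements from the artifact.

```python
def pu_utilization(active_counts, total_pus):
    """Mean fraction of busy PUs across equally spaced samples."""
    if not active_counts:
        raise ValueError("need at least one sample")
    return sum(active_counts) / (len(active_counts) * total_pus)


TOTAL_PUS = 4  # toy value for illustration

# Toy traces: number of active PUs at each sampling point (made up).
batch_like = [4, 4, 1, 0, 0, 0]      # PUs idle while waiting for the next batch
fine_grained = [4, 3, 4, 4, 3, 4]    # PUs keep receiving new queries

print(f"batch-like:   {pu_utilization(batch_like, TOTAL_PUS):.1%}")
print(f"fine-grained: {pu_utilization(fine_grained, TOTAL_PUS):.1%}")
```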

Main Claim 3: PIM-ANNS’s Techniques Synergistically Improve Performance

Sub-claims:

Coroutines (Exp4): Hiding bus-switching latency with coroutines boosts throughput 3× (Figure 13).
Selective Replication (Exp5): Throughput increases with memory budget (Figure 14b).
Combined Techniques (Exp6):
- Persistent PIM kernel alone improves throughput by 30–70%.
- Adding per-PU dispatching raises the throughput improvement to 88–112% (Figure 15).

Verification:

Run the ablation studies (Figures 13–15), using config flags to enable or disable individual techniques.
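As a toy illustration of how coroutines can hide a waiting period, the sketch below uses Python generators: each task suspends itself while its simulated bus switch is in flight, so another task can make progress in the meantime. This is not the artifact's implementation, which relies on Boost coroutines in C++; the names `bus_transfer` and `run_round_robin` are hypothetical.

```python
from collections import deque


def bus_transfer(name, log):
    """Toy task: request a bus switch, yield while it completes, then transfer."""
    log.append(f"{name}: request bus switch")
    yield  # suspend here instead of blocking while the switch completes
    log.append(f"{name}: transfer data")


def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished, do not reschedule it
        ready.append(task)


log = []
run_round_robin([bus_transfer("A", log), bus_transfer("B", log)])
for line in log:
    print(line)
```

Task B issues its request before task A resumes, so the two waiting periods overlap instead of running back to back.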
