This repository implements scripts to evaluate and benchmark projects that integrate Kubernetes and Slurm. This project was originally part of a master’s thesis at the Georg August University of Göttingen. The goal of the thesis was to investigate approaches to run Kubernetes workloads in a Slurm cluster. This repository evaluates the following projects:
- IBM/Bridge-Operator
- CARV-ICS-FORTH/HPK
- soerenmetje/KSI - Our original KSI implementation
- gwdg/KSI - Our improved KSI implementation
- Slurm (without any Kubernetes integration - serves as a reference point / baseline)
We are aware of further projects such as Sylabs/WLM-Operator, SchedMD/slurm-k8s-bridge, and kalenpeterson/kube-slurm. However, these projects are either strongly deprecated and did not pass our minimal functional test, or they aim at a different goal. Therefore, they are not included.
We performed two rounds of evaluations. The first round covers the CPU, memory, storage, startup time, and network performance of the different projects. The second round covers only the network performance of the improved KSI implementation. In our evaluation we used the following benchmark tools to evaluate certain metrics:
| Metric | Benchmark | Version |
|---|---|---|
| CPU performance | Sysbench CPU | 1.0.20 |
| Memory throughput | Stream | 5.10 |
| Storage throughput | fio (rnd / seq) | 3.35 |
| Network throughput | Iperf3 | 3.9 |
| Network latency | Netperf | 2.7.1 |
| Workload startup time | Our own approach | not versioned |
This first round of evaluations reflects the work presented in the following paper:
And the master’s thesis of the same name:
- Shell scripts in `src/benchmark/` to:
  - perform benchmarks on each project
  - write benchmark results into `.csv` files
- Python scripts in `src/plot/` to:
  - read the result files
  - create plots
- Jupyter notebook analysis.ipynb to:
  - read the result files
  - print details such as mean, std, and difference to slurm
- CSV result files in `data/`
- Log files in `logs/`
- Plot images in `plots/`
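The statistics that the notebook reports (mean, standard deviation, and difference to Slurm) can be illustrated with a minimal sketch. This is not the notebook's code: the column names `project` and `value` are assumptions chosen for illustration, and the actual layout of the files in `data/` may differ.

```python
import csv
import statistics
from collections import defaultdict


def load_values(csv_path, project_col="project", value_col="value"):
    """Group the measured values of one result file by project name.

    The column names are hypothetical; adapt them to the real CSV header.
    """
    values = defaultdict(list)
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            values[row[project_col]].append(float(row[value_col]))
    return dict(values)


def summarize(values, baseline="slurm"):
    """Return mean, sample std, and relative difference to the baseline mean."""
    base_mean = statistics.mean(values[baseline])
    summary = {}
    for project, samples in values.items():
        mean = statistics.mean(samples)
        # stdev needs at least two samples; report 0.0 for a single run.
        std = statistics.stdev(samples) if len(samples) > 1 else 0.0
        summary[project] = {
            "mean": mean,
            "std": std,
            "diff_to_baseline_pct": (mean - base_mean) / base_mean * 100.0,
        }
    return summary
```

Calling `summarize(load_values(path))` on a result file returns one entry per project, with the Slurm baseline at a difference of 0 %.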
To perform the evaluation, certain prerequisites have to be met:
- Slurm cluster up and running.
- Local machine (e.g. laptop) runs a Linux distribution. We tested this setup using Ubuntu 22.04.
- Local machine (e.g. laptop) can log in on the Slurm master node using SSH and the SSH key `.ssh/id_rsa`.
- Local machine has `bash`, `ssh`, and `python3` installed as well as the Python packages defined in requirements.txt.
- All prerequisites of all projects (KSI, HPK, Bridge-Operator) are met.
- For benchmarks on Slurm and Bridge-Operator, the benchmark tools have to be installed on the cluster nodes. KSI and HPK use container images and therefore do not rely on installed software.
- For benchmarks on Bridge-Operator, an additional machine that runs a Kubernetes cluster is required. The Bridge-Operator has to be up and running in this cluster. We describe the setup details in Setup-Bridge-Operator.md.
- For benchmarks on HPK, the HPK components have to be started and configured manually as described in Setup-HPK.md.
- To run the fio, iPerf3, and Netperf benchmarks, manual steps are also needed as described below.
The fio disk benchmarks heavily depend on the available RAM. If more RAM is available than the file size used in the benchmark, Linux usually caches these files. As a result, the benchmark measures higher throughputs than the storage device can actually deliver. For reference: typical SATA 3 SSDs supply 480 MB/s sequential read throughput.
One solution is to use direct I/O by adding the parameter --direct=1 to fio.
Another solution is to limit the available RAM during the benchmark by utilizing the tool mem-eater. Essentially, this tool allocates RAM until only a desired amount of RAM is left. This limits Linux's ability to cache the files during the benchmark. We provide the source code for mem-eater in src/benchmark/common/mem-eater.c. Start mem-eater manually before running the fio disk benchmarks. Regarding the desired RAM, a good rule of thumb is to choose a total benchmark file size that is at least 2 times the available RAM - e.g. 8 GiB files for 4 GiB RAM.
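The rule of thumb above can be turned into a small helper that derives the value to pass to mem-eater from the planned fio file size. This is only a sketch: the function names are hypothetical and not part of the repository, and the factor of 2 is the rule of thumb stated above.

```python
def desired_free_ram_mib(total_file_size_gib, factor=2.0):
    """Largest amount of free RAM (in MiB) for which the total file size
    is still at least `factor` times the available RAM."""
    if total_file_size_gib <= 0 or factor <= 0:
        raise ValueError("file size and factor must be positive")
    return int(total_file_size_gib * 1024 / factor)


def mem_available_mib(meminfo_text):
    """Extract MemAvailable from the content of /proc/meminfo.

    The kernel reports the value in kB (1024 bytes), so dividing by 1024
    yields MiB.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) // 1024
    raise ValueError("MemAvailable not found")
```

For 8 GiB of benchmark files, `desired_free_ram_mib(8)` yields 4096, which corresponds to the 4 GiB example above. On a Linux node, `mem_available_mib(open("/proc/meminfo").read())` shows how much RAM is currently available before mem-eater is started.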
# Compile
gcc -o mem-eater mem-eater.c
# Run ./mem-eater <desiredRamInMiB>
./mem-eater 4096
The tools iPerf3 and Netperf operate in a client-server model. Therefore, this setup requires the server component to be started manually on a second node in the Slurm cluster.
For iPerf3, the server can be started with the following command:
iperf3 -s -p 5003
For the Netperf server you can run:
netserver -D -p 16604
`-D` prevents netserver from daemonizing and `-p` sets the port.
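Before starting a network benchmark, it can be useful to confirm that both servers accept TCP connections. The following is a minimal sketch and not part of the repository: the function names are hypothetical, and the ports are the ones used in the commands above.

```python
import socket

# Ports taken from the server commands above.
DEFAULT_PORTS = {"iperf3": 5003, "netserver": 16604}


def server_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_servers(host, ports=None, timeout=2.0):
    """Check every named port on `host` and return a name -> bool mapping."""
    if ports is None:
        ports = DEFAULT_PORTS
    return {name: server_reachable(host, port, timeout)
            for name, port in ports.items()}
```

`check_servers("<server-node>")` returns, for example, `{"iperf3": True, "netserver": False}` if only the iPerf3 server is running. The check only opens and closes a connection, so the servers may log a short aborted session.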
This repository contains a script main.sh. This script is designed to be executed locally, e.g., on a laptop. It
- connects to the Slurm cluster (and Kubernetes cluster if needed)
- runs benchmarks
- copies the result file (`.csv`) as well as the log files from the cluster back to the local machine
The following command is an example of evaluating the ksi project using the stream-memory benchmark:
# /bin/bash src/benchmark/main.sh <project> <benchmark>
/bin/bash src/benchmark/main.sh ksi stream-memory
After execution, the result file can be found in data/ and the log files in logs/.
Available parameters - the project and the benchmark - can be determined by the directory and file names.
The directory names in src/benchmark/ are the available projects:
- `ksi`
- `hpk`
- `bridge-operator`
- `slurm`
The available benchmarks can be determined by the file names workload-*.sh inside the project directories:
- `sysbench-cpu`
- `stream-memory`
- `fio-diskrnd`
- `fio-diskseq`
- `netperf-latency-tcp`
- `iperf3-bandwidth`
- `startup-time`
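The naming convention above can also be queried programmatically. The sketch below uses hypothetical helper names and assumes that only project directories contain `workload-*.sh` files, so that helper directories such as `common` are skipped.

```python
from pathlib import Path

PREFIX, SUFFIX = "workload-", ".sh"


def available_projects(benchmark_root):
    """Directories below the benchmark root with at least one workload script."""
    root = Path(benchmark_root)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and any(entry.glob(PREFIX + "*" + SUFFIX))
    )


def available_benchmarks(benchmark_root, project):
    """Benchmark names derived from the workload-*.sh files of one project."""
    project_dir = Path(benchmark_root) / project
    return sorted(
        script.name[len(PREFIX):-len(SUFFIX)]
        for script in project_dir.glob(PREFIX + "*" + SUFFIX)
    )
```

Run from the repository root, `available_projects("src/benchmark")` lists the valid values for `<project>`, and `available_benchmarks("src/benchmark", "slurm")` lists the valid values for `<benchmark>`.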
The benchmarks fio-diskrnd, fio-diskseq, netperf-latency-tcp, and iperf3-bandwidth require manual actions on the Slurm cluster before they are executed. This is covered in the prerequisites section.
- For testing, we disabled write caching as described here: https://stackoverflow.com/questions/20215516/disabling-disk-cache-in-linux/20215603#20215603
- Nevertheless, Linux seems to heavily utilize file caching on read operations. To the best of our knowledge, this cannot be disabled. A workaround is to use a total file I/O size for read or write operations that exceeds the available memory.
- To benchmark the project bridge-operator, a Kubernetes cluster is needed. Theoretically, a Kind cluster is sufficient. We used a single-node Kubernetes cluster deployed in a cloud VM. To obtain accurate results in the startup-time benchmark, the system time on both the Slurm node and the VM has to be correct.
To add a new benchmark, perform the following actions. Replace <benchmark-name> with the actual name.
- Add a Bash script file to each project dir in `src/benchmark`. These files run the benchmark. Use the file name `workload-<benchmark-name>.sh`.
- Extend the Bash script src/benchmark/common/parse.sh in the functions `initResultFile` and `parseLogFile` to add parsing functionality.
- Add a Python script file named `plot-<benchmark-name>.py` to the directory `src/plot`.
- Test the process: `/bin/bash src/benchmark/main.sh slurm <benchmark-name>`.
To add a new project that should be evaluated, perform the following actions. Replace <project-name> with the actual project name.
- Add a new directory named `<project-name>` to the directory `src/benchmark`.
- Add a Bash script file for each benchmark to this directory. Use the file names `workload-<benchmark-name>.sh`. For parsing, the benchmark result is expected to be printed to stdout, as done in the existing workload Bash script files.
- Extend the Bash script src/benchmark/main.sh by adding a new case for the project in the if-elif-else construct marked by `# Start benchmarking`.
- Extend all Python script files in the directory `src/plot` to add `<project-name>` to the list of `project_dirs`.
- Extend the Python script src/plot/common.py by adding a human-readable project name to the dict `_mapNames`.
- Test the process: `/bin/bash src/benchmark/main.sh <project-name> stream-memory`.
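After adding a project, a quick consistency check can reveal workload scripts that are still missing. The following sketch uses a hypothetical helper name and assumes that the `slurm` directory holds a workload script for every benchmark, so it can serve as the reference.

```python
from pathlib import Path

PREFIX, SUFFIX = "workload-", ".sh"


def missing_workloads(benchmark_root, project, reference="slurm"):
    """Benchmarks that the reference project has but `project` lacks."""

    def benchmark_names(name):
        # Strip the prefix and suffix to get the bare benchmark name.
        directory = Path(benchmark_root) / name
        return {
            script.name[len(PREFIX):-len(SUFFIX)]
            for script in directory.glob(PREFIX + "*" + SUFFIX)
        }

    return sorted(benchmark_names(reference) - benchmark_names(project))
```

An empty result from `missing_workloads("src/benchmark", "<project-name>")` means that every benchmark available for Slurm also has a workload script in the new project directory.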
In the current state, we completed the following benchmarks on each project:
| | KSI | HPK | Bridge-Operator | Slurm |
|---|---|---|---|---|
| Sysbench CPU | ✅ | ✅ | ✅ | ✅ |
| Stream Memory | ✅ | ✅ | ✅ | ✅ |
| Fio Disk seq | ✅ | ✅ | ✅ | ✅ |
| Fio Disk rnd | ✅ | ✅ | ✅ | ✅ |
| 💀 time-based => can not read / write desired file size | 💀 time-based => can not read / write desired file size | |||
| 💀 time-based => can not read / write desired file size | 💀 time-based => can not read / write desired file size | |||
| 💀 bug: no seq read available | ||||
| iPerf3 Network Throughput | ✅ | ✅ | ✅ | ✅ |
| Netperf Network Latency (TCP) | ✅ | ✅ | ✅ | ✅ |
| Workload start up time | ✅ | ✅ | ✅ | ✅ |
✅ = successfully completed, 💀 = error occurred / completion not possible
This second round of evaluations reflects the work presented in the following paper:
- TBD
The focus of the second evaluation was to compare the network performance of various network drivers for rootless containers when used with KSI, including:
- bypass4netns with Nerdctl
- slirp4netns with Nerdctl
- pasta with Podman
- Baseline without containerization
- Shell scripts in `src2/benchmark/` to:
  - run all benchmarks
  - write benchmark results into `.csv` files
- Python scripts in `src2/plot/` to:
  - read the result files
  - create plots
- CSV result files in `data2/`
- Log files in `logs2/`
- Plot images in `plots2/`
To perform the evaluation, certain prerequisites have to be met:
- Compute node with improved KSI installed
- KSI should be available in the parent folder, e.g., `../ksi`
- Rootless Nerdctl and Podman 5.x installed
- Slirp4netns installed
- Bypass4netns set up for Nerdctl
- Recent version of Kind installed
- Slurm cluster is not required
- Separate compute node with fast network connection to the first compute node with netperf and iPerf3 installed
The tools iPerf3 and Netperf operate in a client-server model. Therefore, this setup requires the server component to be started manually on the second compute node.
For iPerf3, the server can be started with the following command:
iperf3 -s -p 5003
For the Netperf server you can run:
netserver -D -p 16604
`-D` prevents netserver from daemonizing and `-p` sets the port.
The IP address of the second compute node must be set in the main.sh script under TEST_SERVER.
This repository contains a script main.sh. This script is designed to be executed on the first compute node, which has KSI installed.
/bin/bash src2/benchmark/main.sh
After execution, the result file can be obtained in data2/ and the log files in logs2/.
The script automatically executes iperf3 and netperf benchmarks.
In the current state, we completed the following benchmarks for each network driver:
| | Bypass4netns | Slirp4netns | Pasta | No containerization |
|---|---|---|---|---|
| iPerf3 Network Throughput | ✅ | ✅ | ✅ | ✅ |
| Netperf Network Latency (TCP) | ✅ | ✅ | ✅ | ✅ |
✅ = successfully completed