Implementations of neuroimaging computation tasks for benchmarking performance and code demonstration across programming languages.
## Currently implementing

- `niimean` -- mean of a nii.gz dataset
- `voxcor` -- correlation within each ROI between two 3D images
## Use

- `make` to run the shootout
- `make check` to confirm implementations are correct
- `make depends` to run `setup.bash`; more in [#setup](#setup)
## Also see
- https://benchmarksgame-team.pages.debian.net/benchmarksgame/ -- run time benchmarks across many languages and tasks
- https://julialang.org/benchmarks/ -- run times for science specific tasks and environments
- https://rosettacode.org/ -- programming chrestomathy: same task in many programming languages
- Rust, Julia, Fortran, and Perl have Int16 overruns that were not obvious. The issue shows up when reading an int16 nii.gz and summing over the large image (to calculate the mean). Scaling and un-scaling is a fast workaround in Julia but does not work in Perl. Using Int16 is faster than double but is inaccurate (unless rewritten to calculate a running mean).
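The overrun is easy to reproduce without any nifti reader. A numpy sketch (synthetic data standing in for int16 voxels) shows the silent wraparound and both fixes mentioned above, a wider accumulator and a running mean:

```python
# Reproduce the Int16 overrun: summing int16 voxels in an int16
# accumulator silently wraps around. The synthetic array stands in
# for a real int16 nii.gz volume.
import numpy as np

voxels = np.full(100_000, 1000, dtype=np.int16)  # true sum 1e8 >> 32767

naive = voxels.sum(dtype=np.int16)    # wraps around silently: wrong
widened = voxels.sum(dtype=np.int64)  # accumulate in 64 bits: correct
print(int(naive), int(widened))       # naive is garbage, widened is 100000000

# A running mean never stores the full sum, so it also stays in range.
mean = 0.0
for i, v in enumerate(voxels, start=1):
    mean += (float(v) - mean) / i
print(mean)  # 1000.0
```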
- Runtime startups are especially slow for R and Julia (and MATLAB). See `within-env/`.
- Java/JVM is on par with golang and slower than python/numpy. How to implement SIMD optimization is not immediately obvious, and library/packaging is a pain without an IDE. See `jvm.md` for notes.
- I couldn't find a Fortran nifti library. For `niimean.f90`, I'm using an already-uncompressed nifti with fixed parameters (e.g. assumes datatype=real) and without shape (a 1D array a la javascript). When keeping the array `real` type, the mean is calculated in ~70ms but the results are off (Int16 overrun). Recasting the type to `real(16)` is much slower (~400ms) but accurate. `stdlib_stats`'s `mean` was slightly slower (80ms) and increased the binary size from 20K to 2.8M, but did not handle the overrun any better. Using a loop to accumulate the sum was slower than type recasting (500 vs 400ms), but both provide an accurate mean calculation.
- Perl library dependencies (namely `PDL`) are easy on Debian but difficult on Archlinux (AUR packages fail to compile).

## Missing implementations
- I couldn't find nifti libraries for elixir/erlang (BEAM), php, or ruby. A Foreign Function Interface wrapper could work, but FFI scares me and I haven't been able to figure it out.
- Common lisp is also missing a ready library, but an implementation with `lisp-binary` looks feasible. And using `april` for APL-style math is an interesting prospect.
- `patch/` adds gzip support to `PDL::IO::Nifti`.
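For anyone attempting one of these ports: an uncompressed NIfTI-1 file is just a fixed 348-byte header followed by a flat voxel array, so a minimal reader is small. A Python `struct` sketch (field offsets per the NIfTI-1 standard; the tiny 2x3x4 volume below is synthetic):

```python
# Sketch of a minimal NIfTI-1 reader: a 348-byte header with fixed
# field offsets, then a flat voxel array starting at vox_offset.
import struct

def parse_nifti1(buf: bytes):
    dim = struct.unpack_from("<8h", buf, 40)               # dim[0] = ndim
    datatype, bitpix = struct.unpack_from("<2h", buf, 70)
    vox_offset = int(struct.unpack_from("<f", buf, 108)[0])
    assert datatype == 4, "4 = NIFTI_TYPE_INT16"
    nvox = 1
    for d in dim[1:1 + dim[0]]:
        nvox *= d
    voxels = struct.unpack_from(f"<{nvox}h", buf, vox_offset)
    return dim, voxels

# Build a fake 2x3x4 int16 volume in memory to parse.
hdr = bytearray(348)
struct.pack_into("<i", hdr, 0, 348)                        # sizeof_hdr
struct.pack_into("<8h", hdr, 40, 3, 2, 3, 4, 1, 1, 1, 1)   # dim
struct.pack_into("<2h", hdr, 70, 4, 16)                    # datatype, bitpix
struct.pack_into("<f", hdr, 108, 352.0)                    # vox_offset
buf = bytes(hdr) + b"\x00" * 4 + struct.pack("<24h", *range(24))

dim, voxels = parse_nifti1(buf)
print(sum(voxels) / len(voxels))  # mean of 0..23 -> 11.5
```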
## Results

See hyperfine `out/*stats.csv`, collected by the Makefile across the files in `scripts/`.
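A sketch of how those CSVs can be summarized (column names follow hyperfine's `--export-csv` output; the inline CSV below is made-up example data, not real results):

```python
# Summarize a hyperfine CSV (like out/*stats.csv): sort commands by
# mean runtime.
import csv
import io

def by_mean(csv_text: str):
    rows = csv.DictReader(io.StringIO(csv_text))
    return sorted((float(r["mean"]), r["command"]) for r in rows)

example = """command,mean,stddev,median,user,system,min,max
niimean.py,1.72,0.05,1.71,1.50,0.20,1.65,1.80
niimean.rs,0.46,0.01,0.46,0.40,0.05,0.45,0.48
"""

for mean, cmd in by_mean(example):
    print(f"{cmd}\t{mean:.2f}s")
```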
### simple mean of 3D image (`wf-mp2rage-7t_2017087.nii.gz`)

| command | mean (seconds) |
| --- | ---: |
| fslstats | 0.353 |
| niimean.rs | 0.461 |
| 3dBrickStat | 0.519 |
| MeasureMinMaxMean | 0.522 |
| niimean.js | 0.932 |
| niimean.pl | 1.045 |
| niimean.jl | 1.631 |
| niimean.py | 1.719 |
| mris_calc | 2.096 |
| niimean.m | 2.104 |
| niimean.R | 2.621 |
| niimean.go | 2.829 |
### voxcor (manual loop)

| command | mean (seconds) |
| --- | ---: |
| voxcor.rs | 0.100 |
| voxcor.go | 0.260 |
| voxcor.py | 1.385 |
| voxcor.m | 1.690 |
| voxcor.R | 2.058 |
| voxcor.jl | 2.781 |
For a simple mean calc, javascript is fast and go is slow!

Julia and R especially pay for their interpreter/VM startup time. The results also demonstrate how much effort the community/library authors have put into optimizing hot paths (likely with compiled C code and SIMD optimizations): spm12 in octave, numpy in python, PDL in perl.
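The numpy/PDL point can be seen in miniature: the same reduction written as an interpreted loop versus one vectorized call gives identical answers at very different speeds (a sketch; timings vary by machine):

```python
# Same mean two ways: an interpreted per-element loop versus numpy's
# compiled (SIMD-friendly) reduction. Identical answers, very
# different speeds.
import time
import numpy as np

data = np.random.default_rng(0).random(1_000_000)

t0 = time.perf_counter()
acc = 0.0
for v in data:           # interpreted hot loop
    acc += v
loop_mean = acc / data.size
t1 = time.perf_counter()

vec_mean = data.mean()   # one call into compiled code
t2 = time.perf_counter()

print(f"loop {t1 - t0:.3f}s  numpy {t2 - t1:.5f}s")
```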
- The rust implementation should be built with `--release`; the debug build's performance is 10x worse!
- Julia's interpreter (1.9.3) startup time is reasonable! Its overall time is on par with python (numpy, not native python).
- SIMD "vectorized" operations in python (via numpy) are fast!
- R is slow to start.
- Javascript is painful to write. Both it and the golang version organize the nifti matrix data as a 1D vector.
- Processor makes a difference in the shootout.
## within-env

Cold-startup measures are useful for one-shot utilities but not representative of long-running interactive work within an interpreter, where the one-time start-up cost is irrelevant. `within-env/` benchmarks the same tasks with the environment already loaded.
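That split can be sketched inside a single process: time the one-off library load separately from the repeated computation (illustrative only; absolute numbers depend on the machine and whether the import is already cached):

```python
# Split the one-time environment cost (importing numpy) from the
# repeated work (the mean itself), mirroring what within-env/ isolates.
import time

t0 = time.perf_counter()
import numpy as np       # paid once per process
t1 = time.perf_counter()

data = np.arange(1_000_000, dtype=np.float64)
t2 = time.perf_counter()
m = data.mean()          # cheap once the environment is loaded
t3 = time.perf_counter()

print(f"import {t1 - t0:.4f}s  mean {t3 - t2:.5f}s -> {m}")
```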
Run `make` in a shell on a terminal. Optionally set `NRUN` (default: 100 runs for each):

```
make NRUN=10
```

Benchmarking uses [hyperfine](https://github.com/sharkdp/hyperfine).
## Setup

See `./setup.bash` for some library setup automation/hints (also `make depends`).
- c
  - `mris_calc` from freesurfer
- c++
  - `MeasureMinMaxMean` from ANTs
- rust
  - [rustup](https://rustup.rs/): `rustup update`
- octave
  - download spm12 and extract to `~/Downloads/spm12`; `scripts/niimean.m` hardcodes the addpath
  - compile: `cd src/ && make PLATFORM=octave install`
- julia
  - install packages in the repl like `] add NIfTI` (also needs `Statistics`)
- R
  - `install.packages('oro.nifti')`
- deno
  - `cargo install deno`
  - the first run will pull in the npm package `nifti-reader-js`
- perl
  - `PDL` and `patch/Niftigz.pm`
go (1.19 vs 1.21) and octave (7 vs 8.3) are out of date on debian stable. To use newer versions on rhea, update the path to include the compiled-from-recent-source bin dir:
```sh
export PATH="/opt/ni_tools/utils/go/bin:$PATH"
# source dl and extracted to /opt/ni_tools/octave-8.3.0-src
# ./configure --prefix=/opt/ni_tools/octave-8.3/ && make install
export PATH="/opt/ni_tools/octave-8.3/bin:$PATH"
```
NB: better to build a debian backport instead: https://wiki.debian.org/SimpleBackportCreation
## TODO

- get expert golang and rust advice/implementation (should be faster?)
- implement various styles (and more complex calculations)
- loop vs vector; expect python loop to be especially slow
- parallel processing
- containerize benchmarks
- other implementations
- does GraalVM native improve java benchmarks (faster startup)?
- clojure/babashka
- julia's APL implementation
- common lisp or guile/scheme version (ffi w/ niftilib)
- compile julia
