pyxis: add --container-cache persistent rootfs reuse + GC/LRU + tests #174
## Summary

This PR adds `--container-cache` to Pyxis to enable persistent reuse of the unpacked Enroot rootfs across jobs on the same node, reducing warm-start latency. It also introduces opportunistic GC/LRU eviction for cache entries to prevent the cache filesystem from filling up, and adds a Bats test suite for cache-mode behavior.
## User-facing behavior

- `srun --container-cache` (also `PYXIS_CONTAINER_CACHE=1`)
- `--container-image`
- `--container-writable` and `--container-save`
- (`ENROOT_ROOTFS_WRITABLE=n`)
- `pyxis_cache_u<uid>_<hash>`

## Configuration
Cache mode requires an explicit cache root configured via plugstack:

- `container_cache_data_path=/raid/containers/data` (required for cache mode)
- `container_cache_gc_high=85`
- `container_cache_gc_low=80`

Example (cluster-specific):

```
required /path/to/spank_pyxis.so container_cache_data_path=/raid/containers/data container_cache_gc_high=85 container_cache_gc_low=80
```

## Cache directory layout

- `<base>` = `container_cache_data_path`
- `<base>/<uid>` (mode `0700`, owned by `<uid>:<gid>`)
- `<base>/<uid>/pyxis_cache_u<uid>_<hash>`
- `ENROOT_DATA_PATH=<base>/<uid>` for cache mode.

## GC / LRU eviction

- `<base>/*/pyxis_cache_*` (global across users)
- `mtime` as LRU (Pyxis touches the dir on use).
- `pyxis_cache_lock`
- `<base>/pyxis-container-cache-gc.lock`

## Tests

- `bats tests/container_cache.bats` for:
  - `<base>/<uid>`
  - (`PYXIS_CONTAINER_CACHE=1`)

To run tests (cluster-specific; adjust paths as needed):

```shell
export SLURM_ROOT=/cm/local/apps/slurm/24.11
export PATH="$SLURM_ROOT/bin:$SLURM_ROOT/sbin:$PATH"
export LD_LIBRARY_PATH="$SLURM_ROOT/lib64:${LD_LIBRARY_PATH:-}"
export SLURM_CONF=/etc/slurm/slurm.conf
bats tests/container_cache.bats
```

(Optional if your squashfs image differs: `PYXIS_TEST_SQSH_IMAGE=/path/to/image.sqsh bats tests/container_cache.bats`)
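
For reviewers who want a concrete picture of the GC / LRU eviction described above, here is a minimal shell sketch. It is not the plugin's code. It assumes that `container_cache_gc_high` is the filesystem usage percentage that triggers eviction and that `container_cache_gc_low` is the level eviction tries to reach; the PR text gives only the option names and example values. The function names `cache_usage_pct`, `cache_entries_lru`, and `cache_gc` are hypothetical, `rm -rf` stands in for whatever removal the plugin actually performs, the locking mentioned in the PR is omitted, and `stat -c %Y` assumes GNU coreutils.

```shell
# Hypothetical helper: percentage of the filesystem holding "$1" that is in use.
# Uses POSIX `df -P`, whose fifth column is the capacity (e.g. "42%").
cache_usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: list <base>/*/pyxis_cache_* directories, oldest mtime first.
# mtime serves as the LRU timestamp because the directory is touched on use.
cache_entries_lru() {
    local base="$1" d
    for d in "$base"/*/pyxis_cache_*; do
        [ -d "$d" ] || continue
        # GNU stat: %Y is the modification time in seconds since the epoch.
        printf '%s\t%s\n' "$(stat -c %Y "$d")" "$d"
    done | sort -n | cut -f2-
}

# Assumed semantics: do nothing below the high watermark; otherwise remove
# least-recently-used entries until usage is no longer above the low watermark.
cache_gc() {
    local base="$1" high="$2" low="$3" entry
    [ "$(cache_usage_pct "$base")" -ge "$high" ] || return 0
    cache_entries_lru "$base" | while IFS= read -r entry; do
        [ "$(cache_usage_pct "$base")" -gt "$low" ] || break
        rm -rf -- "$entry"   # stand-in for the real removal step
    done
}
```

With the example values from the configuration section, this would be invoked as `cache_gc /raid/containers/data 85 80`. Using two thresholds rather than one avoids evicting a single entry on every job once the filesystem hovers near the trigger level.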