
Working with Kaya


Update: Migration to Kaya2

The information after this section may be outdated because of the migration to Kaya2. I'll update it all once everyone's been migrated over, and I have more information about the new system.

In migrating over to Kaya2, unless the documentation has significantly improved, you might need to investigate a few things yourself to adjust your scripts accordingly. I'll simply provide my own processes for figuring this out below.

Slurm Version Update

# Old Kaya
carrow@kaya[~]$ sinfo --version
slurm 23.02.0

# New Kaya
carrow@kaya01[~]$ sinfo --version
slurm 24.11.5

The only thing I've learned so far is that it's better to run interactive sessions with salloc rather than srun, so if you've set up aliases as shortcuts, update them accordingly.

Setting aliases

When a new bash shell opens, it runs the script

~/.bashrc

We can add the aliases below directly to that file. Alternatively, keep them in a separate shell script: I created a file at ~/.kaya_env.sh and added the lines there. (Since the file is sourced rather than executed, it doesn't need to be made executable.) Then, near the top of .bashrc, add the single line:

source ~/.kaya_env.sh
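If you share the same dotfiles across machines, a guarded version of that line is slightly safer; this is just a sketch of what goes in .bashrc:

```shell
# In ~/.bashrc: source the alias file only if it actually exists, so a
# missing file doesn't spam errors at shell startup on other machines.
if [ -f ~/.kaya_env.sh ]; then
    source ~/.kaya_env.sh
fi
```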

The main useful alias I've been running is something like the following:

# No gpu
alias quickrun='srun --job-name=Interactive_Session --partition=pophealth --export=ALL --nodes=1 --mem=16000 --ntasks=4 --time=5:00:00 --pty /bin/bash'

# For V100
alias quickrun-v1='srun --job-name=Interactive_Session --partition=pophealth --export=ALL --nodes=1 --mem=16000 --ntasks=4 --gres=gpu:v100:1 --time=5:00:00 --pty /bin/bash'

# For A100
alias quickrun-a1='srun --job-name=Interactive_Session --partition=pophealth --export=ALL --nodes=1 --mem=16000 --ntasks=4 --gres=gpu:a100:1 --time=5:00:00 --pty /bin/bash'

On the new system, we have to be a bit more specific with our partitioning.

  • For no GPU: --partition=work or --partition=medical
  • For V100: --partition=gpu
  • For H100: --partition=medical

The setup of shortcuts I have is the following.

# No gpu
alias quickrun='salloc --job-name=Interactive_Session --partition=medical --nodes=1 --mem=16000 --ntasks=4 --time=5:00:00'
alias quickrun-work='salloc --job-name=Interactive_Session --partition=work --nodes=1 --mem=16000 --ntasks=4 --time=5:00:00'

# For V100
alias quickrun-v1='salloc --job-name=Interactive_Session --partition=gpu --nodes=1 --mem=16000 --ntasks=4 --gres=gpu:v100:1 --time=5:00:00'
alias quickrun-v2='salloc --job-name=Interactive_Session --partition=gpu --nodes=1 --mem=16000 --ntasks=4 --gres=gpu:v100:2 --time=5:00:00'

# For H100
alias quickrun-h1='salloc --job-name=Interactive_Session --partition=medical --nodes=1 --mem=16000 --ntasks=4 --gres=gpu:h100:1 --time=5:00:00'
alias quickrun-h2='salloc --job-name=Interactive_Session --partition=medical --nodes=1 --mem=16000 --ntasks=4 --gres=gpu:h100:2 --time=5:00:00'
alias quickrun-h3='salloc --job-name=Interactive_Session --partition=medical --nodes=1 --mem=16000 --ntasks=4 --gres=gpu:h100:3 --time=5:00:00'
alias quickrun-h4='salloc --job-name=Interactive_Session --partition=medical --nodes=1 --mem=16000 --ntasks=4 --gres=gpu:h100:4 --time=5:00:00'

  • --ntasks is the number of tasks requested (effectively CPU cores here, with one CPU per task)
  • --mem is the RAM requested, in MB
  • --gres=gpu:<gputype>:<number> requests <number> GPUs of type <gputype>
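Once a session starts, you can sanity-check what Slurm actually granted by inspecting the SLURM_* environment variables it sets inside the allocation (a quick sketch; the fallback text just makes it safe to run outside a job):

```shell
# Print what the current allocation actually gives us. Outside a Slurm
# job these variables are unset, hence the fallbacks.
echo "Tasks:        ${SLURM_NTASKS:-not in a job}"
echo "Mem per node: ${SLURM_MEM_PER_NODE:-not in a job}"
echo "GPUs:         ${SLURM_JOB_GPUS:-none}"
```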

Basic Hardware

The HPC network basically consists of a login node and compute nodes. The login node is what you log into, and the compute nodes are where your Slurm jobs actually run (we'll get to Slurm later).

Do not try running interactive sessions on the login node – only on the compute nodes. Doing otherwise is very naughty.

Storage and Filesystem

The Kaya HPC currently has around 120TB of fast storage. The main storage locations on Kaya are:

  • /home - Linux home directories – cd $HOME
  • /group - project data – cd $MYGROUP
  • /scratch - fast temporary storage area for running jobs – cd $MYSCRATCH

Note: If these do not work, you will need to modify either ~/.bashrc or ~/.myenv by adding lines similar to export MYGROUP=/group/<yourgroup>/<yourusername>
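As a sketch, those lines might look like the following (the group and user names below are placeholders; substitute your own):

```shell
# Placeholder paths: replace "yourgroup" and "yourusername" with your
# actual project group and account name.
export MYGROUP=/group/yourgroup/yourusername
export MYSCRATCH=/scratch/yourgroup/yourusername
```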

Their intended use is as follows:

/home Provides relatively small storage for things like source code and shell scripts that you want to keep. This file system is not tuned for high performance for jobs. It can be referenced by the environment variable $HOME. Home directories are quota'd and intentionally small.

/group This is the main data directory where your project data lives. You should have access to a shared data directory for your project group (/group/projectid) as well as a personal data subdirectory (/group/projectid/myuserid). In general data related to your work should live in your personal data directory. There is a system environment variable set at login so you can always access this with “cd $MYGROUP”. If you have data that you need to share with other members of your project group, then you should put data in the /group/xxxxxx directory where it will be visible to others in your group. If you need to share data with others outside your group, please contact the HPC team and we can help with this requirement.

/scratch The scratch folder is used for intermediate results for running jobs. It's also used as a shared area for parallel jobs that run across multiple nodes. The data layout on the /scratch directory is the same as the /group directory. The scratch folder is limited to 30TB of storage, shared between all users/jobs. It is important to ensure that your jobs clear intermediate data out of the scratch folder and copy results back to your project under /group. The system will periodically sweep the scratch directories and delete aged data.
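To make that clean-up habit concrete, the tail of a job script might look like the sketch below. All paths are illustrative, and the /tmp fallbacks exist only so the sketch can run outside Kaya:

```shell
# Sketch: end of a job script. Copy results from scratch back under
# /group, then clear the scratch area. Paths are illustrative.
JOB_ID="${SLURM_JOB_ID:-demo}"
SCRATCH_DIR="${MYSCRATCH:-/tmp/scratch}/myjob_${JOB_ID}"
DEST_DIR="${MYGROUP:-/tmp/group}/results"

mkdir -p "$SCRATCH_DIR" "$DEST_DIR"
echo "final result" > "$SCRATCH_DIR/output.txt"   # stand-in for real job output

cp -r "$SCRATCH_DIR" "$DEST_DIR/"                 # copy results back to /group
rm -rf "$SCRATCH_DIR"                             # clear scratch for the sweep
```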

Logging in

There are two ways to log into Kaya:

  1. Directly in a terminal (Linux and Mac):
ssh <accountname>@kaya.hpc.uwa.edu.au
  2. In VS Code:
    • Install the "Remote - SSH" extension
    • The SSH icon is in the bottom left
    • Log in with <accountname>@kaya.hpc.uwa.edu.au
    • Then click "Connect Current Window to Host" or "Connect to Host" and log in with your password.
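Optionally, an entry in your local ~/.ssh/config lets you type ssh kaya instead of the full address, and the VS Code SSH extension picks it up too. The host alias and user name below are placeholders:

```shell
# ~/.ssh/config on your local machine ("kaya" is an arbitrary alias)
Host kaya
    HostName kaya.hpc.uwa.edu.au
    User yourusername
```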

Git

Initial Setup

Assuming you've never used this account before, run the following commands in a terminal:

  git config --global user.name "Coen Arrow"
  git config --global user.email coen.arrow@gmail.com

Optional Setups

Set the default editor to VS Code; otherwise the default editor is vim, and bugger that.

git config --global core.editor code

Turn on coloured output:

git config --global color.ui true

Remote SSH keys

Next let’s set up the remote SSH key for using git

ssh-keygen -t rsa -C your_github_email@example.com

This creates two files in ~/.ssh/. Open the id_rsa.pub file with:

code ~/.ssh/id_rsa.pub

then copy the contents of the file. Go to your GitHub Account Settings, click "SSH and GPG keys" on the left, then click "New SSH key". Add a label (like "Kaya") and paste the public key into the big text box. Once it's added, run the following to test; after you enter your passphrase it should just work:

ssh -T git@github.com

Output:

Hi username! You've successfully authenticated, but GitHub does
not provide shell access.

Copy a repo

Go to a private repo you have and copy the SSH link. Navigate to where you want to clone the repo, then enter the following:

git clone git@github.com:coenarrow/example.git

Sources:
https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup
https://kbroman.org/github_tutorial/pages/first_time.html

Transferring data

Transferring data to Kaya is done via SCP (there are other options too, but let's just go with this). On your local computer (not via VS Code), let's throw some data at Kaya.

For individual files

scp <input_filepath> <yourname>@kaya.hpc.uwa.edu.au:<output_filepath>

Note: include the file name (with extension) in input_filepath, but a directory is enough for output_filepath, as the file keeps the same name.

For folders, copy everything recursively with the -r flag, and force the transfer via the legacy SCP protocol with -O.

scp -r -O <input_folder_path> <yourname>@kaya.hpc.uwa.edu.au:<output_folder_path>

Transferring from Kaya back to your local machine is simply the reverse:

scp <yourname>@kaya.hpc.uwa.edu.au:<input_filepath> <local_output_filepath> 

Source: https://kb.iu.edu/d/agye

Modules

When you log in to kaya, modules are not automatically loaded unless you set that up to be so (I haven’t figured out how to do that yet, but I’ll work on it).

To use a module, just type something like:

module load Anaconda3/2021.05
conda init

choosing versions as per your requirements.

For the test file we'll do, we need to create a conda environment from a file, so we run the following (generic form first, then with ENVNAME as cits and ENV.yml as cits.yml):

conda env create -n ENVNAME --file ENV.yml
conda env create -n cits --file cits.yml

Check that the environment was created:

conda env list

Activate the environment to confirm it works (it should be cits):

conda activate ENVNAME
conda activate cits

Note: all of these commands are written in the .slurm file, so what we were doing here was simply confirming that they work.
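For reference, an environment file like cits.yml has roughly the shape below. The contents here are hypothetical – the real cits.yml lists the project's actual dependencies – but writing a minimal one with a heredoc shows the format:

```shell
# Create a minimal, hypothetical environment file named cits.yml.
# The real file will list the project's actual dependencies.
cat > cits.yml <<'EOF'
name: cits
channels:
  - conda-forge
dependencies:
  - python=3.10
  - numpy
EOF
```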

Sources: https://docs.hpc.uwa.edu.au/docs/user/modules/

SLURM

Running an interactive session

Note: I think there's a better way of doing this using tmux, but I can't exactly remember it, so here is just the basic command:

srun --job-name=Interactive_Session --partition=pophealth --export=ALL --nodes=1 --ntasks=4 --gres=gpu:a100:1 --time=00:30:00 --pty /bin/bash
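For non-interactive work, the same resource flags go into a batch script submitted with sbatch. Below is a hypothetical example; the partition, module version, environment and script name are placeholders following the examples above:

```shell
#!/bin/bash
#SBATCH --job-name=cits_test
#SBATCH --partition=medical
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem=16000
#SBATCH --time=5:00:00

# Load the tools and environment set up earlier, then run the work.
module load Anaconda3/2021.05
conda activate cits
python my_script.py
```

Submit it with sbatch my_job.slurm and check on it with squeue -u $USER.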

IRDS

It's super annoying to manually transfer data to and from your local machine, and it's infeasible when working with very large datasets. It's also bad practice: if you're working with sensitive data, you shouldn't be storing it on the HPC system at all. This is what IRDS is for.

There was a basic script for setting up a GIO mount for your personal IRDS project provided by the HPC team (though they do seem to struggle with getting internal information out).

Go here to look at the instructions on how to do this.

Scripts

Installing 3D Slicer

Run this to install 3D Slicer. By default, it installs into $MYGROUP/Programs (or the current working directory if $MYGROUP is not defined, but it should be defined for all Kaya accounts).

wget https://raw.githubusercontent.com/coenarrow/Working_with_kaya/main/scripts/install_3DSlicer.sh && \
chmod +x ./install_3DSlicer.sh && \
./install_3DSlicer.sh && \
rm ./install_3DSlicer.sh

Note that to run 3D Slicer, you need to be in an interactive session with a GUI.

KAYA2: Interactive session with GUI

About

Helper documents for the Advanced Clinical and Translational Cardiovascular Imaging group, working with the A100s
