BAZIS HPC
BAZIS is a compute cluster for research at the Vrije Universiteit Amsterdam. It sits between the general facilities at the SURF HPC center and the desktop. It is a heterogeneous system composed of clusters and servers from research departments and a general partition.
In this topic you will find information to get you started, plus best practices. Cluster computing can be a very powerful and useful skill to add to your toolbox and get more science done.
Also take a look at the SURF wiki about Snellius; it contains a lot of information that applies to BAZIS as well.
Information about BAZIS and how to get an account can be found on the VU Service Portal under IT > Research > HPC Cluster Computing.
BAZIS is maintained by IT for Research.
Connect
ssh
Use your favourite SSH client to log in at <your login>@bazis.labs.vu.nl. On Windows we recommend MobaXterm or MS Terminal; Mac users can use iTerm. Direct access is only possible from the campus or from SURF. From other network locations, first connect to the stepstone <vu net id>@ssh.data.vu.nl, or use eduVPN Institute Access (below).
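If your OpenSSH client supports the -J (ProxyJump) option, the hop through the stepstone and the login on BAZIS can be combined into one command. The following is a minimal sketch, assuming the placeholder account names your_vunetid and your_login (both hypothetical). It only prints the command so you can check it before running it yourself.

```shell
# Placeholder account names (hypothetical): replace with your own.
vunetid="your_vunetid"
login="your_login"

# -J (ProxyJump) tunnels the connection through the stepstone host.
cmd="ssh -J ${vunetid}@ssh.data.vu.nl ${login}@bazis.labs.vu.nl"

# Dry run: print the command; run it yourself once it looks right.
echo "$cmd"
```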
eduVPN
Install the eduVPN client. Start eduVPN and choose Institute Access to connect. You will need MFA to authenticate. Students can activate MFA at the service desk (kb-item 11809).
Data transfer
MobaXterm has an integrated SFTP file browser. Once you have logged in to the HPC system, you will see the file browser to the left of the terminal window, where it shows the contents of your home folder. You can browse through these folders, and drag-and-drop files and folders between this file browser and the Windows File Explorer. Alternatively, you can use the download/upload buttons at the top of the file browser window. A green refresh button is also located there to refresh the contents of the current folder. You can also open files in the file browser to edit them directly. Upon saving, you’ll be asked whether you want to change these files on the HPC system.
Other SFTP browsers
There are a large number of free SFTP browsers out there. Some examples are:
- Filezilla (Windows, MacOS, Linux)
- Cyberduck (Windows, MacOS)
- WinSCP (Windows)
Advanced: Using SSH keys
SSH keys are an alternative method of authentication for access to remote computing systems. They can also be used for authentication when transferring files or for accessing version control systems like GitHub.
The cluster uses ssh keys to manage batch jobs.
On your workstation, create an SSH key pair:
ssh-keygen -t ed25519 -a 100
- -a (default is 16): number of rounds of passphrase derivation; increase to slow down brute force attacks.
- -t (default is rsa): specify the cryptographic algorithm. ed25519 is faster and shorter than RSA for comparable strength.
- -f (default is /home/user/.ssh/id_algorithm): filename to store your keys. If you already have SSH keys, make sure you specify a different name: ssh-keygen will overwrite the default key if you don’t specify!
If ed25519 is not available, use the older (but strong and trusted) RSA cryptography: ssh-keygen -a 100 -t rsa -b 4096
When prompted, enter a strong passphrase that you will remember.
Note: on Windows you can use MobaKeyGen from MobaXterm; on Windows 11, PowerShell or Command Prompt works as well.
In your ~/.ssh directory you will find a public and private key. Make sure to keep the private key safe as anyone with the private key has access.
Now, when you add your public key to the ~/.ssh/authorized_keys file in a remote system, your key will be used to login.
You can either use copy-paste or the ssh-copy-id command:
$ ssh-copy-id user@remote-host
The authenticity of host 'remote-host (192.168.111.135)' can't be established.
ECDSA key fingerprint is SHA256:hXGpY0ALjXvDUDF1cDs2N8WRO9SuJZ/lfq+9q99BPV0.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 2 key(s) remain to be installed -- if you are prompted now it is to install the new keys
user@remote-host's password:
Number of key(s) added: 2
Now try logging into the machine, with: "ssh 'user@remote-host'" and check to make sure that only the key(s) you wanted were added.
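The copy-paste route mentioned above comes down to appending the public key to ~/.ssh/authorized_keys on the remote system with restrictive permissions. The following is a minimal sketch of that step; the function name add_pubkey is hypothetical, and it is meant to be run on the remote system after the .pub file has been copied there.

```shell
# Append a public key to an authorized_keys file unless it is already listed.
# Usage: add_pubkey <public-key-file> [authorized_keys-file]
add_pubkey() {
  local pubkey_file="$1"
  local auth_file="${2:-$HOME/.ssh/authorized_keys}"
  local ssh_dir
  ssh_dir="$(dirname "$auth_file")"

  # With its default StrictModes setting, sshd ignores keys when the
  # directory or file is writable by other users.
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
  touch "$auth_file" && chmod 600 "$auth_file"

  # -x: match whole lines, -F: fixed strings. Skip keys already present.
  if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
    cat "$pubkey_file" >> "$auth_file"
  fi
}
```

For example, add_pubkey ~/id_ed25519.pub adds the key once; calling it again leaves the file unchanged.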
Advanced: Forwarding X
Fix Warning: Remote Host Identification Has Changed
If you are sure that it is harmless and the remote host key has been changed in a legitimate way, you can skip the host key checking by sending the key to a null known_hosts file:
ssh -o "UserKnownHostsFile=/dev/null" -o "StrictHostKeyChecking=no" username@bazis.labs.vu.nl
For a permanent fix, remove the outdated host key from your ~/.ssh/known_hosts file, so that the new key is stored at your next login:
ssh-keygen -R "hostname"
References:
Using Slurm
Useful commands
See jobs in the queue for a given user
squeue -u username
Show available node features
sinfo -o "%20N %10c %10m %25f %10G "
Submit a job
sbatch script
Show the status of a currently running job
sstat -j jobID
Show the final status of a finished job
sacct -j jobID
Cancel a job
scancel jobID
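Most of the commands above need a job ID, which sbatch reports as "Submitted batch job <id>". The following is a small sketch for capturing that ID in a script; the helper name jobid_from_sbatch is hypothetical, and the example feeds it a canned message so it can be tried without a cluster. (sbatch also has a --parsable option that prints the ID without the surrounding text.)

```shell
# Extract the job ID from sbatch's "Submitted batch job <id>" message.
jobid_from_sbatch() {
  awk '/^Submitted batch job/ { print $4 }'
}

# Example with a canned message; on the cluster you would pipe sbatch itself:
#   jobid=$(sbatch script | jobid_from_sbatch)
#   squeue -j "$jobid"      # while the job is pending or running
#   sacct  -j "$jobid"      # after it has finished
jobid=$(echo "Submitted batch job 12345" | jobid_from_sbatch)
echo "$jobid"
```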
Best practices
Organise your input, output and temporary data. Make use of the fast scratch directory ($TMPDIR).
Don’t run large computations on the login nodes! It negatively impacts all cluster users. Grab a compute node with srun --pty bash instead.
Constraints
The Slurm constraint option allows further control over which nodes your job can be scheduled on in a particular partition/queue. You may require a specific processor family or network interconnect. The features that can be used with the sbatch constraint option are defined by the system administrator and thus vary among HPC sites.
Constraints available on BAZIS are CPU architecture and GPU. Example (single constraint):
#SBATCH --constraint=zen2
Example combining constraints (the | operator means "or", so either zen2 or haswell nodes qualify):
#SBATCH --constraint="zen2|haswell"
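To see which feature names can be used as constraints, the per-node feature lists printed by sinfo can be condensed into one unique list. The following is a rough sketch, assuming a hypothetical helper called unique_features; the canned input stands in for real sinfo output, and the feature name gpu in it is an assumption.

```shell
# Turn sinfo's per-node feature lists (comma-separated) into one sorted,
# unique list with one feature per line.
unique_features() {
  tr ',' '\n' | sed '/^$/d' | sort -u
}

# On the cluster: sinfo -h -o "%f" | unique_features
# Canned input (the feature name "gpu" is hypothetical):
printf 'zen2,gpu\nhaswell\nzen2\n' | unique_features
```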
Computer architecture
The parts of a modern computer that matter when running jobs are listed here. (Note: this is heavily simplified and intended to give a basic overview for the purposes of understanding how to request resources from Slurm; there are many resources out there to dig deeper into computer architecture.)
Board
A physical motherboard which contains one or more of each of Socket, Memory bus and PCI bus.
Socket
A physical socket on a motherboard which accepts a physical CPU part.
CPU
A physical part that is plugged into a socket.
Core
A physical CPU core, one of many possible cores, that are part of a CPU.
HyperThread
A virtual CPU thread, associated with a specific Core. This can be enabled or disabled on a system. On BAZIS hyperthreading is typically enabled. Compute-intensive workloads may benefit from disabling hyperthreading.
Memory Bus
A communication bus between system memory and a Socket/CPU.
PCI Bus
A communication bus between a Socket/CPU and I/O controllers (disks, networking, graphics,…) in the server.
Slurm complicates this, however, by using the terms core and cpu interchangeably depending on the context and Slurm command. --cpus-per-task=, for example, actually specifies the number of cores per task.
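As an illustration of requesting cores for one multithreaded task, the sketch below asks for four cores and passes that number on to the program through OMP_NUM_THREADS. The core count is an arbitrary example, and the use of OpenMP is an assumption about your application.

```shell
#!/bin/bash -l
#SBATCH -N 1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4        # four cores for one multithreaded task

# Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is set;
# fall back to 1 so the script also runs outside a job.
threads="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS="$threads"   # honoured by OpenMP-based programs
echo "Running with ${OMP_NUM_THREADS} thread(s)"
```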
Slurm example jobs
simple job
#!/bin/bash -l
#SBATCH -J MyTestJob
#SBATCH -N 1
#SBATCH -p defq
echo "== Starting run at $(date)"
echo "== Job ID: ${SLURM_JOBID}"
echo "== Node list: ${SLURM_NODELIST}"
echo "== Submit dir. : ${SLURM_SUBMIT_DIR}"
echo "== Scratch dir. : ${TMPDIR}"
cd $TMPDIR
# Your more useful application can be started below!
hostname
Workspace
scratch tmpdir
Each Slurm job has a fast scratch directory allocated on the nodes, which is deleted when the job finishes. Use the $TMPDIR variable to reach this space, for example to store intermediate results or to work on many files.
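A common pattern is to work inside the scratch directory and copy only the final results back before the job ends. The following is a minimal sketch of that pattern; the results folder name and the stand-in computation are hypothetical, and the fallbacks let the script run outside Slurm for testing.

```shell
#!/bin/bash -l
#SBATCH -J ScratchDemo
#SBATCH -N 1
#SBATCH -p defq

# Outside a Slurm job these variables may be unset, so fall back to defaults.
submit_dir="${SLURM_SUBMIT_DIR:-$PWD}"
scratch="${TMPDIR:-$(mktemp -d)}"
results="${submit_dir}/results_${SLURM_JOBID:-local}"   # hypothetical output folder

cd "$scratch" || exit 1

# Stand-in for a real computation that writes many intermediate files.
for i in 1 2 3; do
  echo "chunk $i" > "part_${i}.tmp"
done
cat part_1.tmp part_2.tmp part_3.tmp > combined.txt

# Copy only what you want to keep: scratch is removed when the job ends.
mkdir -p "$results"
cp combined.txt "$results/"
```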
Python virtual environments
Python has many powerful packages. In scientific computing many packages may be used in a single project. To manage many Python packages, a package manager such as conda is often used.
On an HPC system we prefer not to use conda, as it does not use optimised binaries and its cache can take up a lot of space, but we understand it is useful in some cases and will try to help.
Working with virtual environments also makes the Python environment easier to manage:
- A virtual environment is a named, isolated, working copy of Python that maintains its own files, directories, and paths so that you can work with specific versions of libraries or Python itself without affecting other Python projects.
- Virtual environments make it easy to cleanly separate different projects and avoid problems with different dependencies and version requirements across components.
In short:
- use virtualenv (preferred) or conda
- create an isolated environment
- Install packages
- Activate a virtual environment
- Deactivate a virtual environment
- Delete a virtual environment
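The steps listed above can be sketched as follows. This assumes python3 with the standard venv module is available; the virtualenv tool follows the same create/activate/deactivate pattern. The environment location is a throwaway temporary folder chosen for illustration; in practice pick a path inside your project.

```shell
# Hypothetical location; use a project folder in practice.
envdir="$(mktemp -d)/myenv"

python3 -m venv "$envdir"          # create an isolated environment
. "$envdir/bin/activate"           # activate it (sets VIRTUAL_ENV and PATH)
active_env="$VIRTUAL_ENV"
echo "Active environment: $active_env"

# Install packages, here only if a requirements file is present.
if [ -f requirements.txt ]; then
  python -m pip install -r requirements.txt
fi

deactivate                         # leave the environment
rm -rf "$envdir"                   # delete it: an environment is just a folder
```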
Adding a requirements file
Python requirements files are a great way to keep track of the Python modules a project needs. A requirements file is a simple text file that lists the modules and packages required by your project. By creating a requirements.txt file, you save yourself the hassle of having to track down and install all of the required modules manually.
A requirements file looks like this:
tensorflow==2.3.1
uvicorn==0.12.2
fastapi==0.63.0
Installing modules from a requirements file is easy:
pip install -r requirements.txt
A requirements file can also be generated with:
pip freeze > requirements.txt
See the article referenced below for more information.
Portable scripts
The first line in a script usually names the interpreter and is called the shebang. It is recommended to use /usr/bin/env, which looks up the interpreter in your $PATH. This makes scripts more portable than hard-coded paths.
#!/usr/local/bin/python
Will only run your script if python is installed in /usr/local/bin.
#!/usr/bin/env python
Will interpret your $PATH, and find python in any directory in your $PATH.
So your script is more portable, and will work without modification on systems where python is installed as /usr/bin/python, or /usr/local/bin/python, or even custom directories (that have been added to $PATH), like /opt/local/bin/python.
Further Reading
R environment
R has many powerful scientific packages and a strong community. Installing and maintaining packages for R can be hard. On BAZIS the Bioconductor suite is installed and can be loaded with the appropriate module environment.
When first running R on a cluster, some changes in the workflow are required to make the transition from working interactively in a terminal to running scripts in batch mode.
Errors
Errors happen almost all the time. They can be simple (e.g. a syntax error or an R/Python version error) or more complex (e.g. a problem in the data). In either case, read the error message carefully, because the solution is often in that message, or it is at least the starting point for debugging your code. If it is an error you have not seen before, search for it online. Often you will find a solution on websites like Stack Overflow.
Tips and Caveats
Matlab
Matlab has several features to work in batch mode on an HPC cluster. Assuming you know how to create Matlab scripts, we start simply by executing Matlab interactively on a compute node.
Interactive
Request resources (1 node, 1 cpu) in a partition
srun -N 1 -p defq --pty /bin/bash
module load matlab/R2023a
cd your/data/
Here is an example of a trivial MATLAB script (hello_world.m):
fprintf('Hello world.\n')
Run with matlab using only one computational thread.
$ matlab -nodisplay -singleCompThread -r hello_world
Hello world.
>>
Matlab waits at its prompt at the end of the script if there is no exit. In a compute job this would keep the job running until the wall-clock limit, so we add an exit at the end. The convenient -batch option combines these options.
-batch MATLAB_command - Start MATLAB and execute the MATLAB command(s) with no desktop
and certain interactive capabilities disabled. Terminates
upon successful completion of the command and returns exit
code 0. Upon failure, MATLAB terminates with a non-zero exit.
Cannot be combined with -r.
matlab -batch hello_world
Batch mode
Combining this in a slurm script we can queue matlab workloads.
#!/bin/bash -l
#SBATCH -J MyMatlab
#SBATCH -N 1
#SBATCH --cpus-per-task=1
#SBATCH -p defq
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<YOUR EMAIL>
# Note: for parallel operations increase cpus-per-task above
# Note 2: output and error logs can be given absolute paths
echo "== Starting run at $(date)"
echo "== Job ID: ${SLURM_JOBID}"
echo "== Node list: ${SLURM_NODELIST}"
echo "== Submit dir. : ${SLURM_SUBMIT_DIR}"
echo "== Scratch dir. : ${TMPDIR}"
# cd $TMPDIR
# or change to a project folder with the matlab file, e.g. hello_world.m
# cd your/data
# Load matlab module
module load 2022 matlab/R2023a
# execute
matlab -batch hello_world
Parpool
Reading
Connect to VU Yoda/iRODS with icommands
iRODS/Yoda is a middleware and webinterface to store enriched data. On BAZIS the icommands are available to allow direct access.
To use the icommands, a configuration file in your home folder is required: https://yoda.vu.nl/site/getting-started/icommands.html#configuration
After configuration, use iinit to connect. Use a Data Access Password, similar to a WebDAV connection.
Find an overview of the icommands here: https://docs.irods.org/master/icommands/user/
Check the ipwd, ils, icd, iput, iget and irsync commands for navigation and data transfer.
Your project folder is located at /vu/home
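As an illustration of the data-transfer commands, the sketch below wraps iput and ils in a small helper that uploads a folder and then lists the target collection. The function name yoda_upload and the folder names in the usage comment are hypothetical; it assumes you have already run iinit.

```shell
# Upload a local folder to a Yoda/iRODS collection and list the result.
# Usage: yoda_upload <local-folder> <collection>
yoda_upload() {
  local src="$1" dest="$2"
  iput -r "$src" "$dest" || return 1   # -r: recurse into the folder
  ils "$dest"                          # list the collection to confirm the upload
}

# After running iinit once, for example (hypothetical folder names):
#   yoda_upload ./results /vu/home/research-myproject
```

For repeated transfers of the same folder, irsync may be the better fit, since it synchronises a local folder with a collection instead of uploading everything again.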