Compute!

Global view

Launching a computation on the platform means submitting a “job” to one of the available queues. This involves the following steps:

  1. Cluster connection
  2. Data transfer
  3. BATCH script creation
  4. Job submission
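
A minimal sketch of this workflow run from your own machine is shown below; the login name, hostname and paths are placeholders, not actual platform values (script creation and submission are detailed in the sections that follow).

ssh your_login@cluster.example.org                       # 1. connect to the cluster front-end
scp -r ./my_input_data your_login@cluster.example.org:   # 2. copy your data to your home directory
# 3. write a batch script on the cluster (see the examples below)
sbatch monocore.slurm                                    # 4. submit the job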

Nodes

Waiting queues (Partitions)

Commands for managing your “jobs”

Single-core job example: monocore.slurm

Requesting one compute core on one node and 5 MB of memory for 10 minutes, with an e-mail sent at each stage of the job’s life.

Create an sbatch file named monocore.slurm:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:10:00
#SBATCH --mail-type=ALL
#SBATCH --job-name=my_serial_job
#SBATCH --output=job_seq-%j.out
#SBATCH --mail-user=your.email@your.domain
#SBATCH --mem=5M
time sleep 30
hostname

Job submission

The command “sbatch monocore.slurm” will put the job in the default queue because no queue is specified in the file. The job will run as soon as the resources are available.
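
For example (the job ID below is illustrative; SLURM prints the real one at submission time):

sbatch monocore.slurm
# Submitted batch job 12345
squeue -u $USER        # check the job's state in the queue
scancel 12345          # cancel the job if needed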

Sbatch options

  • #SBATCH --partition=<partition name>
  • #SBATCH --job-name=<job name>
  • #SBATCH --output=<file in which the standard output is saved>
  • #SBATCH --error=<file in which the standard error is saved>
  • #SBATCH --input=<file used as standard input>
  • #SBATCH --open-mode=append to append to an existing file, --open-mode=truncate to overwrite it
  • #SBATCH --mail-user=your@mail
  • #SBATCH --mail-type=<BEGIN,END,FAIL,TIME_LIMIT,TIME_LIMIT_50,...> events that trigger an e-mail
  • #SBATCH --sockets-per-node=1 or 2
  • #SBATCH --threads-per-core=<threads per core>; not usable on the MatriCS platform because the nodes are not multithreaded (ask us if you need it)
  • #SBATCH --cores-per-socket=<cores per socket>
  • #SBATCH --cpus-per-task=<CPUs for each task>
  • #SBATCH --ntasks=<number of tasks>
  • #SBATCH --mem-per-cpu=<RAM per core>
  • #SBATCH --ntasks-per-node=<number of tasks per node>
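
As an illustration, a header combining several of these options might look like the sketch below; the partition and file names are placeholders, not actual platform values.

#!/bin/bash
#SBATCH --partition=normal            # placeholder partition name
#SBATCH --job-name=example
#SBATCH --output=example-%j.out       # %j is replaced by the job ID
#SBATCH --error=example-%j.err
#SBATCH --open-mode=truncate
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=100M
#SBATCH --time=00:05:00
srun hostname                         # runs once per allocated task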

SBATCH environment variables

  • SLURM_JOB_ID : job ID
  • SLURM_JOB_NAME : job name
  • SLURM_JOB_NODELIST : list of allocated nodes
  • SLURM_SUBMIT_HOST : server from which the job was submitted
  • SLURM_SUBMIT_DIR : directory from which the job was submitted
  • SLURM_JOB_NUM_NODES : number of requested nodes
  • SLURM_NTASKS_PER_NODE : number of cores requested per node
  • SLURM_JOB_CPUS_PER_NODE : number of threads per node
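
A minimal sketch showing how these variables can be used inside a batch script, for instance to log where and how the job ran:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:01:00

# Print the job's context into the output file
echo "Job ${SLURM_JOB_ID} (${SLURM_JOB_NAME}) submitted from ${SLURM_SUBMIT_HOST}"
echo "Running in ${SLURM_SUBMIT_DIR} on nodes: ${SLURM_JOB_NODELIST}"
echo "Nodes: ${SLURM_JOB_NUM_NODES}, tasks per node: ${SLURM_NTASKS_PER_NODE}"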

MPI job example: jobMPI.slurm

Requesting 2 nodes with 16 cores each and 8 MB of memory on each node, for 10 minutes.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:10:00
#SBATCH --job-name=my_mpi_job
#SBATCH --output=mpi_job-%j.out
#SBATCH --mem=8M
#SBATCH --mail-type=ALL
#SBATCH --mail-user=laurent.renault@u-picardie.fr
module load openmpi/gcc/64/4.1.2
mpiexec time sleep 30
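
In a real job, the last line would launch your own MPI executable. A sketch, assuming a C source file my_mpi_program.c built with the same OpenMPI module (the file names are hypothetical):

module load openmpi/gcc/64/4.1.2
mpicc -o my_mpi_program my_mpi_program.c   # compile once on the login node
# then, in jobMPI.slurm, replace the last line with:
mpiexec ./my_mpi_program                   # one MPI rank per requested task (32 here)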

OpenMP job example: job_openMP.slurm

Requesting 1 node with 8 CPUs for a single task and 96 MB of memory, for 4 hours.

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --time=04:00:00
#SBATCH --job-name=my_openmp_job
#SBATCH --mem=96M

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_program
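
Here ./my_program stands for your own OpenMP executable; a sketch of building and submitting it (the file names are hypothetical):

gcc -fopenmp -o my_program my_program.c   # build with OpenMP support
sbatch job_openMP.slurm                   # the script sets OMP_NUM_THREADS to 8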

GPU usage

To use GPUs, specify the parameter --gres=gpu:X, where X is the number of GPUs.
Here is an sbatch script “mon_script.sh” that requests 2 GPUs and 28 cores (bigpu partition).

#!/bin/sh
#SBATCH --job-name=tensor 
#SBATCH --partition=bigpu 
#SBATCH --gres=gpu:2 
#SBATCH --time=0:10:00 
#SBATCH --mail-type=ALL 
#SBATCH --output=job-%j.out 
#SBATCH --mem=60G 
#SBATCH --nodes=1 
#SBATCH --ntasks-per-node=28

hostname
python hello.py
  • To submit the job, use the following command:
sbatch mon_script.sh
  • Interactive example:
    srun --ntasks=1 --mem=4G --gres=gpu:1 --time=1:00:00 --partition=bigpu --pty /bin/bash
  • The nvidia-smi command shows GPU usage.
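
For instance, once the interactive session starts you can check which GPU was allocated; whether SLURM sets CUDA_VISIBLE_DEVICES depends on the GRES configuration, so treat that variable as an assumption to verify on the platform.

nvidia-smi                      # list the GPU(s) visible in this session
echo $CUDA_VISIBLE_DEVICES      # GPU index(es) allocated by SLURM, if the GRES plugin sets it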