Compute!

Global view

Launching a computation on the platform means submitting a “job” to one of the available queues. This involves the following steps:

  1. Cluster connection
  2. Data transfer
  3. BATCH script creation
  4. Job submission
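
A minimal sketch of this workflow run from your own machine is shown below; the login name, hostname and paths are placeholders, not actual platform values (script creation and submission are detailed in the sections that follow).

ssh your_login@cluster.example.org                       # 1. connect to the cluster front-end
scp -r ./my_input_data your_login@cluster.example.org:   # 2. copy your data to your home directory
# 3. write a batch script on the cluster (see the examples below)
sbatch monocore.slurm                                    # 4. submit the job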

Nodes

Waiting queues (Partitions)

Commands for managing your “jobs”

Single-core job example: monocore.slurm

Requesting one compute core on one node and 5 MB of memory for 10 minutes, with an e-mail sent at each stage of the job’s life.

Create an sbatch file named monocore.slurm:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:10:00
#SBATCH --mail-type=ALL
#SBATCH --job-name=my_serial_job
#SBATCH --output=job_seq-%j.out
#SBATCH --mail-user=your.email@your.domain
#SBATCH --mem=5M
time sleep 30
hostname

Job submission

The command “sbatch monocore.slurm” will put the job in the default queue because no queue is specified in the file. The job will run as soon as the resources are available.
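
For example (the job ID below is illustrative; SLURM prints the real one at submission time):

sbatch monocore.slurm
# Submitted batch job 12345
squeue -u $USER        # check the job's state in the queue
scancel 12345          # cancel the job if needed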

Sbatch options

  • #SBATCH --partition=<partition name>
  • #SBATCH --job-name=<job name>
  • #SBATCH --output=<file in which the standard output is saved>
  • #SBATCH --error=<file in which the standard error is saved>
  • #SBATCH --input=<file used as standard input>
  • #SBATCH --open-mode=append to append to an existing file, --open-mode=truncate to overwrite it
  • #SBATCH --mail-user=your@mail
  • #SBATCH --mail-type=<BEGIN,END,FAIL,TIME_LIMIT,TIME_LIMIT_50,...> events that trigger an e-mail
  • #SBATCH --sockets-per-node=1 or 2
  • #SBATCH --threads-per-core=<threads per core>; not usable on the MatriCS platform because the nodes are not multithreaded (ask us if you need it)
  • #SBATCH --cores-per-socket=<cores per socket>
  • #SBATCH --cpus-per-task=<CPUs for each task>
  • #SBATCH --ntasks=<number of tasks>
  • #SBATCH --mem-per-cpu=<RAM per core>
  • #SBATCH --ntasks-per-node=<number of tasks per node>
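
As an illustration, a header combining several of these options might look like the sketch below; the partition and file names are placeholders, not actual platform values.

#!/bin/bash
#SBATCH --partition=normal            # placeholder partition name
#SBATCH --job-name=example
#SBATCH --output=example-%j.out       # %j is replaced by the job ID
#SBATCH --error=example-%j.err
#SBATCH --open-mode=truncate
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=100M
#SBATCH --time=00:05:00
srun hostname                         # runs once per allocated task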

SBATCH environment variables

  • SLURM_JOB_ID : job ID
  • SLURM_JOB_NAME : job name
  • SLURM_JOB_NODELIST : list of allocated nodes
  • SLURM_SUBMIT_HOST : server from which the job was submitted
  • SLURM_SUBMIT_DIR : directory from which the job was submitted
  • SLURM_JOB_NUM_NODES : number of requested nodes
  • SLURM_NTASKS_PER_NODE : number of cores requested per node
  • SLURM_JOB_CPUS_PER_NODE : number of threads per node
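
A minimal sketch showing how these variables can be used inside a batch script, for instance to log where and how the job ran:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:01:00

# Print the job's context into the output file
echo "Job ${SLURM_JOB_ID} (${SLURM_JOB_NAME}) submitted from ${SLURM_SUBMIT_HOST}"
echo "Running in ${SLURM_SUBMIT_DIR} on nodes: ${SLURM_JOB_NODELIST}"
echo "Nodes: ${SLURM_JOB_NUM_NODES}, tasks per node: ${SLURM_NTASKS_PER_NODE}"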

MPI job example: jobMPI.slurm

Requesting 2 nodes with 16 cores each and 8 MB of memory on each node, for 10 minutes.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:10:00
#SBATCH --job-name=my_mpi_job
#SBATCH --output=mpi_job-%j.out
#SBATCH --mem=8M
#SBATCH --mail-type=ALL
#SBATCH --mail-user=laurent.renault@u-picardie.fr
module load openmpi/gcc/64/4.1.2
mpiexec time sleep 30
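
In a real job, the last line would launch your own MPI executable. A sketch, assuming a C source file my_mpi_program.c built with the same OpenMPI module (the file names are hypothetical):

module load openmpi/gcc/64/4.1.2
mpicc -o my_mpi_program my_mpi_program.c   # compile once on the login node
# then, in jobMPI.slurm, replace the last line with:
mpiexec ./my_mpi_program                   # one MPI rank per requested task (32 here)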

OpenMP job example: job_openMP.slurm

Requesting 1 node with 8 CPUs for a single task and 96 MB of memory, for 4 hours.

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --time=04:00:00
#SBATCH --job-name=my_openmp_job
#SBATCH --mem=96M

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_program
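
Here ./my_program stands for your own OpenMP executable; a sketch of building and submitting it (the file names are hypothetical):

gcc -fopenmp -o my_program my_program.c   # build with OpenMP support
sbatch job_openMP.slurm                   # the script sets OMP_NUM_THREADS to 8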

GPU usage

To use GPUs, specify the parameter --gres=gpu:X, where X is the number of GPUs.
Here is an sbatch script “mon_script.sh” that requests 2 GPUs and 28 cores (bigpu partition).

#!/bin/sh
#SBATCH --job-name=tensor 
#SBATCH --partition=bigpu 
#SBATCH --gres=gpu:2 
#SBATCH --time=0:10:00 
#SBATCH --mail-type=ALL 
#SBATCH --output=job-%j.out 
#SBATCH --mem=60G 
#SBATCH --nodes=1 
#SBATCH --ntasks-per-node=28

hostname
python hello.py
  • To submit the job, use the following command:
sbatch mon_script.sh
  • Interactive example:
    srun --ntasks=1 --mem=4G --gres=gpu:1 --time=1:00:00 --partition=bigpu --pty /bin/bash
  • The nvidia-smi command shows GPU usage.
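
For instance, once the interactive session starts you can check which GPU was allocated; whether SLURM sets CUDA_VISIBLE_DEVICES depends on the GRES configuration, so treat that variable as an assumption to verify on the platform.

nvidia-smi                      # list the GPU(s) visible in this session
echo $CUDA_VISIBLE_DEVICES      # GPU index(es) allocated by SLURM, if the GRES plugin sets it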