Commands for managing your “jobs”: Memo
Information about a job
sacct reports accounting information about active or completed jobs.
sacct -j job-id
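As an illustration, the sketch below asks sacct for a few selected fields in a machine-readable form and extracts the state of the job. The job ID 123456 and the helper name first_job_state are hypothetical; --parsable2, --noheader and --format are standard sacct options.

```shell
# Hypothetical helper: read pipe-delimited sacct output on stdin and
# print the State field (second column) of the first record.
first_job_state() {
    awk -F'|' 'NR == 1 { print $2 }'
}

# Hypothetical job ID; replace it with the ID printed by sbatch.
job_id=123456

# Query Slurm only where sacct is installed.
if command -v sacct >/dev/null 2>&1; then
    sacct -j "$job_id" --parsable2 --noheader \
        --format=JobID,State,Elapsed,ExitCode | first_job_state
fi
```

The first record is the job itself; the following records, if any, describe its individual steps.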
To submit a job
The script will typically contain one or more srun commands to launch parallel tasks.
sbatch script.slurm
sbatch -x node037 my_script.sh -> submits the job while excluding compute node node037
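A minimal sketch of what such a script might look like is given below. The file name example_job.slurm and all resource values are assumptions chosen for illustration; the right values depend on your cluster.

```shell
# Write a minimal batch script (hypothetical file name and values).
cat > example_job.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=example        # name shown by squeue and sacct
#SBATCH --ntasks=4                # number of parallel tasks
#SBATCH --time=00:10:00           # wall-clock limit (HH:MM:SS)
#SBATCH --output=example_%j.out   # %j is replaced by the job ID

# srun launches the command once per task inside the allocation
srun hostname
EOF

# Submit only where Slurm is installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch example_job.slurm
fi
```

On success, sbatch prints the ID of the new job, which can then be passed to sacct or scancel.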
To cancel a job
scancel job-id
Information about partitions and nodes
sinfo
To list free nodes
The output also shows the partition each node belongs to.
sinfo --states=idle
Node states
- mix : consumable resources partially allocated
 - idle : available for consumable resource requests
 - drain : unavailable for use per system administrator request
 - drng : currently executing a job, but will not be allocated to additional jobs. The node will be changed to state DRAINED when the last job on it completes
 - alloc : consumable resources fully allocated
 - down : unavailable for use. Slurm can automatically place nodes in this state if some failure occurs.
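To get a quick overview of how many nodes are in each of these states, something along the following lines may help. The helper name count_node_states is hypothetical; -N, -h and -o are standard sinfo options, and %t prints the compact state used in the list above.

```shell
# Hypothetical helper: read "node state" pairs on stdin and print the
# number of distinct nodes in each state, sorted by state name.
count_node_states() {
    sort -u | awk '{ n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

if command -v sinfo >/dev/null 2>&1; then
    # -N: one line per node, -h: no header, -o: output format
    # %N = node name, %t = node state in compact form
    sinfo -N -h -o "%N %t" | count_node_states
fi
```

A node that belongs to several partitions is listed once per partition by sinfo -N, which is why duplicates are removed with sort -u before counting.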
 
State of your jobs
squeue --me
Job states
- BF BOOT_FAIL Job terminated due to launch failure.
 - CA CANCELLED Job was explicitly cancelled.
 - CD COMPLETED Job has terminated all processes on all nodes with an exit code of zero.
 - CF CONFIGURING Job has been allocated resources, but is waiting for them to become ready for use.
 - CG COMPLETING Job is in the process of completing.
 - F FAILED Job terminated with a non-zero exit code or other failure condition.
 - NF NODE_FAIL Job terminated due to failure of one or more allocated nodes.
 - OOM OUT_OF_MEMORY Job experienced out of memory error.
 - PD PENDING Job is awaiting resource allocation.
 - PR PREEMPTED Job terminated due to preemption.
 - R RUNNING Job currently has an allocation.
 - RD RESV_DEL_HOLD Job is being held after requested reservation was deleted.
 - RF REQUEUE_FED Job is being requeued by a federation.
 - RH REQUEUE_HOLD Held job is being requeued.
 - RQ REQUEUED Completing job is being requeued.
 - RS RESIZING Job is about to change size.
 - SI SIGNALING Job is being signaled.
 - SE SPECIAL_EXIT The job was requeued in a special state.
 - SO STAGE_OUT Job is staging out files.
 - ST STOPPED Job has an allocation, but execution has been stopped with SIGSTOP signal. CPUs have been retained by this job.
 - S SUSPENDED Job has an allocation, but execution has been suspended and CPUs have been released for other jobs.
 - TO TIMEOUT Job terminated upon reaching its time limit.
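In scripts it is often enough to know whether a job has ended or is still in progress. The sketch below groups the codes accordingly, treating as ended the states described above as terminated, cancelled or out of memory. This grouping and the helper name is_terminal_state are assumptions made for illustration; --noheader and --format are standard squeue options.

```shell
# Hypothetical helper: succeed (exit status 0) when the compact state
# code describes a job that has ended, according to the list above.
is_terminal_state() {
    case "$1" in
        BF|CA|CD|F|NF|OOM|PR|TO) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v squeue >/dev/null 2>&1; then
    # %i = job ID, %t = job state in compact form
    squeue --me --noheader --format="%i %t" | while read -r id state; do
        if is_terminal_state "$state"; then
            echo "$id has ended ($state)"
        else
            echo "$id is still active ($state)"
        fi
    done
fi
```

Jobs that ended some time ago no longer appear in squeue; sacct is the tool to use for those.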
 
Job in real time
srun submits a job for execution in real time. It accepts a wide variety of options.
srun [options] command [arguments]
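As a concrete case, the sketch below runs hostname as two parallel tasks with a five-minute time limit. The resource values are assumptions, and run_or_show is a hypothetical helper that only prints the command when srun is not installed, so the example can be tried on any machine.

```shell
# Hypothetical helper: run the command if its program is installed,
# otherwise print it instead of failing.
run_or_show() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Run hostname as 2 parallel tasks with a 5-minute time limit
# (assumed values; adjust them to your cluster).
run_or_show srun --ntasks=2 --time=00:05:00 hostname
```

For an interactive shell on a compute node, a common form is srun --pty bash.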