Memory used by a completed job

  • Checking a job's memory usage a posteriori (after it has finished)
$> sacct -o jobid,reqnodes,reqcpus,reqmem,maxrss,averss,elapsed -j JOBID
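#reqmem : RAM requested via sbatch (suffix n = per node, c = per CPU)
#maxrss : maximum RAM used (largest resident set size among the step's tasks)
#averss : average RAM used (mean resident set size over the step's tasks)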
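The same fields can also be read from a script. The sketch below assumes that Slurm's sacct is on the PATH and uses its --parsable2 option, which separates fields with "|" instead of aligning them in columns. The helper names parse_sacct and sacct_memory are hypothetical, not part of any library.

```python
import subprocess


def parse_sacct(text):
    """Split sacct --parsable2 output (header line + '|'-separated rows) into dicts."""
    lines = [line for line in text.strip().splitlines() if line]
    header = lines[0].split("|")
    return [dict(zip(header, line.split("|"))) for line in lines[1:]]


def sacct_memory(jobid):
    """Run sacct for one job (requires Slurm) and return one dict per job step."""
    fields = "jobid,reqnodes,reqcpus,reqmem,maxrss,averss,elapsed"
    out = subprocess.run(
        ["sacct", "--parsable2", "-o", fields, "-j", str(jobid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sacct(out)
```

On a cluster, sacct_memory(94079) would return one dictionary per line of the table shown below, with empty strings where sacct prints nothing (for example MaxRSS on the job's allocation line).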
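  • In the following example, the batch step of the job reached a MaxRSS of 105823148K, i.e. about 101 GiB of RAM (sacct's K is 1024 bytes), out of the 125G requested per node.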
$> sacct -o jobid,reqnodes,reqcpus,reqmem,maxrss,averss,elapsed -j 94079
       JobID ReqNodes  ReqCPUS     ReqMem     MaxRSS     AveRSS    Elapsed
------------ -------- -------- ---------- ---------- ---------- ----------
94079               1        1      125Gn                         00:10:20
94079.batch         1        1      125Gn 105823148K 105823148K   00:10:27
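sacct prints memory as a number followed by a unit suffix. A minimal sketch for converting these fields to bytes, assuming that K, M, G and T are powers of 1024 (as Slurm uses them) and that a trailing n or c on ReqMem only marks a per-node or per-CPU request. slurm_mem_to_bytes is a hypothetical helper name.

```python
import re

# Slurm size suffixes, assumed to be powers of 1024 (K = KiB, M = MiB, ...)
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def slurm_mem_to_bytes(value):
    """Convert a sacct memory field such as '105823148K' or '125Gn' to bytes.

    A trailing 'n' (per node) or 'c' (per CPU) on ReqMem is accepted and ignored.
    Returns None for an empty field (e.g. MaxRSS on the job's allocation line).
    """
    value = value.strip()
    if not value:
        return None
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]?)([nc]?)", value)
    if match is None:
        raise ValueError(f"unrecognised memory value: {value!r}")
    number, unit, _scope = match.groups()
    return int(float(number) * _UNITS[unit])


if __name__ == "__main__":
    peak = slurm_mem_to_bytes("105823148K")
    print(f"{peak / 1024**3:.1f} GiB")  # MaxRSS of job 94079.batch
```

For the job above this prints 100.9 GiB, which is where the figure of about 101 GiB comes from.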
  • However, this measurement is not always reliable, as the two following jobs show.
  • Here is a Python program that we ran as a batch job. It allocates 3 GiB of RAM.
  • It runs for about 1 minute (sleep(60)).
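import psutil
import time
import numpy as np
# 1024*1024*1024*3 elements of 1 byte = 3 GiB; np.ones writes every byte, so the memory is really used
arr = np.ones((1024, 1024, 1024, 3), dtype=np.uint8)
# Resident set size (RSS) of this process, converted from bytes to MiB
print(psutil.Process().memory_info().rss / (1024 * 1024))
# Keep the memory allocated for 60 seconds
time.sleep(60)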
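The 3 GiB figure follows from the shape and dtype of the array: one byte per np.uint8 element times 1024 x 1024 x 1024 x 3 elements. A quick check of the arithmetic with the standard library, without allocating anything:

```python
import math

shape = (1024, 1024, 1024, 3)   # shape of arr in the program above
itemsize = 1                    # np.uint8 uses 1 byte per element
nbytes = math.prod(shape) * itemsize

print(nbytes)                   # 3221225472 bytes
print(nbytes / 1024**3)         # 3.0 GiB
print(nbytes / (1024 * 1024))   # 3072.0 MiB, the unit the program prints
```

The value printed by the program is therefore expected to be slightly above 3072, since the RSS also includes the Python interpreter and the loaded modules.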
  • Here, sacct reports the memory usage correctly: MaxRSS = 3172208K, i.e. about 3.0 GiB.
$> sacct -o jobid,reqnodes,reqcpus,reqmem,maxrss,averss,elapsed -j 703201
       JobID ReqNodes  ReqCPUS     ReqMem     MaxRSS     AveRSS    Elapsed 
------------ -------- -------- ---------- ---------- ---------- ---------- 
703201              1        1       60Gn                         00:01:06 
703201.batch        1        1       60Gn   3172208K   3172208K   00:01:06 
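  • If the job runs for too short a time, sacct does not report the memory actually used. Slurm only samples a job's memory periodically (by default every 30 seconds, the JobAcctGatherFrequency setting), so a short job may end before its memory peak is ever sampled.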
  • The program is almost the same: it still uses 3 GiB of RAM, but runs for only 10 seconds (sleep(10)).
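import psutil
import time
import numpy as np
# 1024*1024*1024*3 elements of 1 byte = 3 GiB; np.ones writes every byte, so the memory is really used
arr = np.ones((1024, 1024, 1024, 3), dtype=np.uint8)
# Resident set size (RSS) of this process, converted from bytes to MiB
print(psutil.Process().memory_info().rss / (1024 * 1024))
# Keep the memory allocated for only 10 seconds
time.sleep(10)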
  • This time, the value reported by sacct is wrong: MaxRSS = 1484K (about 1.4 MiB) instead of about 3 GiB.
$> sacct -o jobid,reqnodes,reqcpus,reqmem,maxrss,averss,elapsed -j 703202
       JobID ReqNodes  ReqCPUS     ReqMem     MaxRSS     AveRSS    Elapsed 
------------ -------- -------- ---------- ---------- ---------- ---------- 
703202              1        1       60Gn                         00:00:11 
703202.batch        1        1       60Gn      1484K      1484K   00:00:11 
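Why the short job is under-reported can be illustrated with a simplified model of periodic sampling. This is an assumption for illustration, not Slurm's actual code: we suppose one sample at job start and then one every interval_s seconds (30 s, the default JobAcctGatherFrequency), with the 3 GiB array allocated 1 s after start (a hypothetical value). sampled_peak is a hypothetical name.

```python
def sampled_peak(duration_s, alloc_at_s, alloc_bytes, base_bytes, interval_s=30):
    """Peak memory seen by a periodic sampler (simplified model, not Slurm's code).

    The job uses base_bytes until alloc_at_s, then base_bytes + alloc_bytes until
    it ends at duration_s. Samples are taken at t = 0, interval_s, 2 * interval_s, ...
    while the job is still running; the reported peak is the largest sample.
    """
    peak = 0
    t = 0
    while t < duration_s:
        rss = base_bytes + (alloc_bytes if t >= alloc_at_s else 0)
        peak = max(peak, rss)
        t += interval_s
    return peak


if __name__ == "__main__":
    base = 1484 * 1024   # hypothetical baseline RSS (the value sacct gave for job 703202)
    alloc = 3 * 1024**3  # the 3 GiB NumPy array
    for elapsed in (66, 11):  # Elapsed of jobs 703201 and 703202, in seconds
        peak = sampled_peak(elapsed, alloc_at_s=1, alloc_bytes=alloc, base_bytes=base)
        print(f"{elapsed} s job: sampled peak {peak / 1024**2:.1f} MiB")
```

In this model, the 66 s job is sampled after its allocation (at 30 s and 60 s), while the 11 s job is only sampled at start; with interval_s=5 the 11 s job would be caught too. Where the site allows it, sbatch's --acctg-freq option (for example --acctg-freq=task=5) requests a shorter sampling interval for a job.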

Python

  • To check the memory used by the current process from Python, use the psutil module:
import psutil
# ...
psutil.Process().memory_info().rss / (1024*1024)
  • memory_info().rss is the resident set size in bytes: dividing by 1024*1024 gives MiB, and dividing by 1024**3 gives GiB.
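psutil measures the RSS at the moment it is called. If only the peak matters, the standard library's resource module (Unix only) reports the maximum RSS the process has reached so far, which does not depend on when it is queried. A minimal sketch; note that ru_maxrss is in kilobytes on Linux but in bytes on macOS, and peak_rss_bytes is a hypothetical name.

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of the current process so far, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in bytes on macOS and in kilobytes on Linux
    return peak if sys.platform == "darwin" else peak * 1024


if __name__ == "__main__":
    data = b"\x01" * (64 * 1024 * 1024)  # touch 64 MiB of memory
    print(f"peak RSS: {peak_rss_bytes() / 1024**2:.0f} MiB")
```

Printing this value at the end of a program gives its peak memory even if it ran for only a few seconds. For child processes that have terminated and been waited for, resource.RUSAGE_CHILDREN can be used instead; on Linux it then reports the RSS of the largest child.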
  • Here is a piece of Python code that monitors the memory used by a child process proc (a subprocess.Popen object) while it runs, keeping the highest value observed.
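import subprocess, psutil  # proc: an already started subprocess.Popen
peak_rss = 0  # highest RSS observed so far, in bytes
while proc.poll() is None:
    peak_rss = max(peak_rss, psutil.Process(proc.pid).memory_info().rss)
    try: proc.wait(timeout=1)  # returns early if proc ends, otherwise after 1 s
    except subprocess.TimeoutExpired: pass  # still running: sample again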
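A self-contained variant of this loop that runs on Linux without psutil: it reads VmRSS (current resident set size, in kB of 1024 bytes) from /proc/&lt;pid&gt;/status instead of calling psutil. The function names, the 100 MiB test child and the sampling interval are hypothetical choices for illustration. Like sacct, it only sees the memory at sampling times, so the interval should be short compared with the program's run time.

```python
import subprocess
import sys


def read_rss_bytes(pid):
    """Current resident set size of pid in bytes, read from /proc (Linux only)."""
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    # The line looks like "VmRSS:    102400 kB"; kB here means 1024 bytes
                    return int(line.split()[1]) * 1024
    except FileNotFoundError:
        pass  # the process no longer exists
    return 0  # zombie processes have no VmRSS line


def run_and_track_peak_rss(cmd, interval=0.1):
    """Run cmd, sample its RSS every interval seconds; return (exit code, peak RSS in bytes)."""
    proc = subprocess.Popen(cmd)
    peak = 0
    while proc.poll() is None:
        peak = max(peak, read_rss_bytes(proc.pid))
        try:
            proc.wait(timeout=interval)  # returns as soon as the child exits
        except subprocess.TimeoutExpired:
            pass  # still running: take another sample
    return proc.returncode, peak


if __name__ == "__main__":
    # Hypothetical child program: allocates about 100 MiB, then sleeps for 1 second
    child = [sys.executable, "-c",
             "import time; data = b'x' * (100 * 1024 * 1024); time.sleep(1)"]
    code, peak = run_and_track_peak_rss(child, interval=0.05)
    print(f"exit code {code}, peak RSS {peak / 1024**2:.0f} MiB")
```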