More on Queue Commands and Job Management

It is strongly advised that all heavy, time-consuming programs be run on the Metis compute nodes. Any such program executed on the login node will be terminated.

Metis uses a resource manager (PBS) to control batch jobs and available resources for distributed high-performance computing. The following are the most basic and important tools that users can use to manage their jobs:

 

qsub

qsub - submit a PBS job

Syntax:    qsub [options] pbs_script_file

This creates and submits a PBS job in batch mode: the task specified in the script file is executed in the background, using the resources specified in the script file or in the options given at submission time. The resource manager assigns an identification number (JobID) to the job, which the user can use to monitor and manage it. For example:

metis% qsub runmyjob.pbs

10125.cm

metis%

The JobID assigned to the 'runmyjob.pbs' job is the leading number on the second line, i.e. 10125. Because the job runs in the background, all program output to standard out is written to a log file. For the above example, the log file will be named 'runmyjob.pbs.o10125'. Messages printed to standard error may appear in the same log file or in a separate file named 'runmyjob.pbs.e10125'.
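For reference, a minimal sketch of what a script such as 'runmyjob.pbs' might contain (the job name, resource values, and program name are placeholders; adjust them to your task):

#!/bin/bash
#PBS -N runmyjob
#PBS -l select=1:ncpus=8:mpiprocs=8:mem=16gb
#PBS -l walltime=01:00:00

# run from the directory where qsub was invoked
cd $PBS_O_WORKDIR

# placeholder executable, here launched with 8 MPI ranks
mpirun -np 8 ./myprogram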

qstat

qstat - show status of pbs batch jobs

Syntax: qstat [options] [JobID]

Use this command to request the status of jobs, queues, or batch servers. If JobID is not specified, a list of all jobs currently running or waiting on the batch server is printed. Job status is shown in the 'S' column, using the following legend:

R - job is running

Q - job is in the queue and will start when resources are available

C - job is finished

metis% qstat

Job ID     Name            User          Time Use S Queue
---------- --------------- ------------- -------- - -------
6954.cm    6222_multipass  z1690179      22:52:55 R q64cpus
6957[].cm  magda.pbs       z1756868      0        R long
6960.cm    6282_5nm        z1690179      0        Q long
6974.cm    TiTiN_50        z1761164      0        Q long
6977.cm    7Ni3WCNTmttm    z1791832      191:12:3 R q16cpus
6986.cm    18Ni3WCNTmttm   z1791832      0        Q long
6987.cm    c10_im          rarichardson  65:51:47 R long
6988.cm    PHAD_PIC_GPU    hermans       0        Q long

metis%

qdel

qdel - delete a PBS batch job

Syntax:    qdel [options] JobID

Use this command to delete a job on the server. If the job is still running, it will be terminated; otherwise, it will be removed from the queue.
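For example, to terminate the job submitted in the qsub example above:

metis% qdel 10125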

Consult the man pages for qsub, qstat, and qdel for more details.

qhist

qhist - show the PBS records of completed jobs

Syntax:    qhist [options]

Use this command to obtain historical job records from the server. In particular, to obtain detailed information about a completed job with identifier JobID, one can run "qhist -D year --raw | grep JobID":

metis% qhist -D year --raw | grep 5466.cm

12/11/2023 13:27:05;E;5466.cm;user=xxxx group=yyy project=_pbs_project_default jobname=gm queue=q16cpus ctime=1702322495 qtime=1702322495 etime=1702322495 start=1702322495 exec_host=cn09/0*8+cn09/1*8 exec_vnode=(cn09:ncpus=8:mem=2097152kb)+(cn09:ncpus=8:mem=2097152kb) Resource_List.mem=4gb Resource_List.mpiprocs=16 Resource_List.ncpus=16 Resource_List.nodect=2 Resource_List.place=free Resource_List.pmem=768mb Resource_List.select=2:ncpus=8:mpiprocs=8 Resource_List.walltime=05:30:00 session=1024808 end=1702322825 Exit_status=0 resources_used.cpupercent=99 resources_used.cput=00:05:28 resources_used.mem=322884kb resources_used.ncpus=16 resources_used.vmem=582672kb resources_used.walltime=00:05:28 run_count=1
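Since the attributes in the raw record are space-separated, standard shell tools can pick out the values of interest, for example the measured resource usage of the same job:

metis% qhist -D year --raw | grep 5466.cm | tr ' ' '\n' | grep resources_used
resources_used.cpupercent=99
resources_used.cput=00:05:28
resources_used.mem=322884kb
resources_used.ncpus=16
resources_used.vmem=582672kb
resources_used.walltime=00:05:28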

Job monitoring tools on Metis

We provide extensions of the standard PBS monitoring tools, for both running and completed jobs.
Their output helps construct efficient resource request directives of the form
#PBS -l select=Nchunks:ncpus=Ncpus:mpiprocs=Np:mem=XXXgb

Each job depends on three critical numbers: the amount of memory reserved per requested chunk of resources, Ur = XXX GB; the number of running processes per chunk, Np; and the amount of memory Up < Ur/Np (GB) that each process can consume during the run. The first two numbers, which the user provides in the select directive, should be based on an estimate of Up. For example, if you know that one process consumes Up gigabytes of memory and you want to run Np such processes, you should reserve at least Ur = Np x Up gigabytes per chunk. If you cannot estimate Up in advance, you can measure it using the tools described below.
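For example (with hypothetical numbers): if each MPI process consumes about Up = 3 GB and you want Np = 8 processes per chunk, reserve at least Ur = 8 x 3 = 24 GB per chunk:

#PBS -l select=2:ncpus=8:mpiprocs=8:mem=24gb

This line requests two such chunks, for 16 MPI ranks in total.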

jobstat

jobstat - show the status and resource usage of running jobs

Syntax:    jobstat [options]

Use this command to obtain information about running jobs. For each running job, jobstat shows output as below, including the average load of the requested CPUs (Load(%)), the ratio and current values of used and requested memory (uM/rM(%), uM(gb), rM(gb)), and the percentage of the requested walltime used (uWTM(%)):

metis% jobstat

Job   User      Account  Queue  [S]  Load(%)  uM/rM(%)  uWTM(%)  nCHNKs  nCPUs  uM(gb)  rM(gb)  Start      Duration  Nodes
----  --------  -------  -----  ---  -------  --------  -------  ------  -----  ------  ------  ---------  --------  -------
5461  z1591116  climlab  long   [R]  100.31   64.26     23.91    8       1024   329     512     -05:44:18  24:00:00  8 nodes

jmtanl

jmtanl - show the PBS records of completed jobs matching a search pattern

Syntax:    jmtanl search_pattern [options]

Searches the qhist output for jobs whose records match search_pattern. For each matched job, jmtanl shows output as below, including the measured values and ratio of used and requested memory (uRAM(gb), rRAM(gb), u/r(%)) and the job exit status:

metis% jmtanl 5190.cm

Job      User      Group     CPUs  Nodes  Chunks  uRAM(gb)  rRAM(gb)  u/r(%)  WallTime-EndTime            Exit
-------  --------  --------  ----  -----  ------  --------  --------  ------  --------------------------  ----
5190.cm  zxxxxxxx  mdmech15  80    5      5       3.92      32.00     12.25   29:47:15-12/01/23-05:37:19  0

jmanl

jmanl - summarize the memory usage per chunk of completed jobs matching a search pattern

Syntax:    jmanl search_pattern [options]

Searches for completed jobs whose records match search_pattern. For each matched job, jmanl shows output as below, including the used and reserved memory per chunk and their ratio (Used, Reserved, U/R(%)):

metis% jmanl z1591116

Completed jobs last week
Job 5460.cm-z1591116-climlab (32 CPUs, 1 node(s), 1 chunk(s)): memory per chunk::Used=66.20 gb, Reserved=128.00 gb; U/R=51.71%

Job optimization and control

Multichunk jobs need careful construction to help the batch system run efficiently. When submitting such jobs:

  1. ensure the requested memory and the numbers of CPUs and GPUs are optimal for the task. We recommend running several short ~1-hour test jobs, increasing the number of requested processors until the execution time stops decreasing. For example, a task may run in 1 hour on 10 CPUs, 30 min on 20 CPUs, 20 min on 40 CPUs, and 15 min on 80 CPUs. Such results indicate that ~40-CPU jobs are optimal for this workflow, since doubling the CPUs again saves little time. Of course, in the case of perfect scaling, the more CPUs the better, if they are available;

  2. check the load of the nodes running your jobs using the "jobstat" command. For a well-balanced MPI job, the Load value will be close to 100%, indicating that all requested CPUs are busy with tasks. The lower this value, the fewer CPUs are actually in use and the less efficient the usage of the requested CPUs;

  3. accurately estimate the walltime requested for long jobs (#PBS -l walltime=hh:mm:ss). An accurate request can dramatically improve your job's start time and allows better scheduling of maintenance tasks. Proceed as follows (see the sketch after this list):
     - estimate the fraction of events, records, time steps, iterations, etc., that your application can process during a ~15-minute test job;
     - extrapolate to find the time needed to process the entire dataset. For example, if the measured time to process 100 records is 100 sec, then 1000 records can be expected to take 1000 sec. Multiply the result by a factor of two to cover the uncertainty;
     - if the result exceeds 24-48 hours, consider splitting the job: running several shorter jobs can decrease the waiting time in the PBS queue.
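A minimal sketch of the walltime estimate from step 3, with hypothetical numbers:

# test job: processed 100 of 10000 records in 100 sec
# full run: (10000 / 100) x 100 sec = 10000 sec, about 2.8 hours
# doubled to cover the uncertainty: about 5.6 hours, so request
#PBS -l walltime=06:00:00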

     

Prospective user?

Request an account.
