It is strongly advised that all heavy, time-consuming programs be executed on Metis compute nodes. Any such program executed on the login node will be terminated.
Metis uses a resource manager (PBS) to control batch jobs and available resources for distributed high-performance computing. The following are the most basic and important tools that users can use to manage their jobs:
qsub - submit a pbs job
Syntax: qsub [options] pbs_script_file
This creates and submits a pbs job in batch mode, wherein the task specified in the script file is executed in the background, using the resources specified in the script file or the options at time of job submission. The resource manager assigns an identification number (JobID) for the job, which the user can use to monitor and manage the job itself. For example:
user@metis% qsub runmyjob.pbs
10125.cm
user@metis%
The JobID assigned to the 'runmyjob.pbs' job is the leading number in the second line, which is 10125. Because the job is running in the background, all program output to standard out will be written to a log file. For the above example, the log file will be named 'runmyjob.pbs.o10125'. Messages printed to standard error can also appear in the same log file, or in a separate file named 'runmyjob.pbs.e10125'.
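A minimal pbs_script_file might look like the sketch below. The job name, resource amounts, and program path are illustrative placeholders, not site defaults; the resource directives themselves are explained later in this section.

```shell
#!/bin/bash
#PBS -N runmyjob                              # job name (placeholder)
#PBS -l select=1:ncpus=8:mpiprocs=8:mem=16gb  # one chunk: 8 CPUs, 16 GB
#PBS -l walltime=01:00:00                     # hh:mm:ss run-time limit

cd $PBS_O_WORKDIR    # start in the directory the job was submitted from
./myprogram          # placeholder for the actual executable
```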
qstat - show status of pbs batch jobs
Syntax: qstat [options] [JobID]
Use this command to request the status of jobs, queues or batch servers. If JobID is not specified, a list of all the jobs currently running or waiting on the batch server will be printed. Job status can be checked under the 'S' column using the following legend:
R job is running
Q job is in queue, will start when resources are available
C job is finished
metis% qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
6954.cm                   6222_multipass   z1690179        22:52:55 R q64cpus
6957[].cm                 magda.pbs        z1756868               0 R long
6960.cm                   6282_5nm         z1690179               0 Q long
6974.cm                   TiTiN_50         z1761164               0 Q long
6977.cm                   7Ni3WCNTmttm     z1791832        191:12:3 R q16cpus
6986.cm                   18Ni3WCNTmttm    z1791832               0 Q long
6987.cm                   c10_im           rarichardson    65:51:47 R long
6988.cm                   PHAD_PIC_GPU     hermans                0 Q long
metis%
qdel - delete pbs batch job
Syntax: qdel [options] JobID
Use this command to delete a job on the server. If the job is still running, it will be terminated; otherwise, it will be removed from the queue.
Consult the man pages for qsub, qstat, and qdel for more details.
qhist - show the PBS records of completed jobs
Syntax: qhist [options]
Use this command to obtain historical job records on the server. Specifically, to obtain detailed information about a completed job JobID, one can run "qhist -D year --raw | grep JobID":
metis% qhist -D year --raw | grep 5466.cm
12/11/2023 13:27:05;E;5466.cm;user=xxxx group=yyy project=_pbs_project_default jobname=gm queue=q16cpus ctime=1702322495 qtime=1702322495 etime=1702322495 start=1702322495 exec_host=cn09/0*8+cn09/1*8 exec_vnode=(cn09:ncpus=8:mem=2097152kb)+(cn09:ncpus=8:mem=2097152kb) Resource_List.mem=4gb Resource_List.mpiprocs=16 Resource_List.ncpus=16 Resource_List.nodect=2 Resource_List.place=free Resource_List.pmem=768mb Resource_List.select=2:ncpus=8:mpiprocs=8 Resource_List.walltime=05:30:00 session=1024808 end=1702322825 Exit_status=0 resources_used.cpupercent=99 resources_used.cput=00:05:28 resources_used.mem=322884kb resources_used.ncpus=16 resources_used.vmem=582672kb resources_used.walltime=00:05:28 run_count=1
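The raw record is a semicolon-delimited line whose last field holds space-separated key=value attributes. A small sketch of pulling out individual attributes, assuming that layout and using an abbreviated copy of the record above:

```python
# Abbreviated qhist raw record (same layout as the full example above).
record = ("12/11/2023 13:27:05;E;5466.cm;"
          "user=xxxx queue=q16cpus Resource_List.ncpus=16 "
          "Exit_status=0 resources_used.walltime=00:05:28")

# The fourth ';'-separated field carries the key=value attributes.
attrs = dict(tok.split("=", 1)
             for tok in record.split(";")[3].split() if "=" in tok)

print(attrs["Exit_status"])               # -> 0
print(attrs["resources_used.walltime"])   # -> 00:05:28
```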
We provide extensions of the standard PBS monitoring tools, both for running and completed jobs. Their output allows the construction of efficient resource request directives:
#PBS -l select=Nchunks:ncpus=Ncpus:mpiprocs=Np:mem=XXXgb
Each job depends on three critical numbers: the amount of reserved memory Ur = XXXgb per requested chunk of resources, the total number of running processes Np per requested chunk, and the amount of memory Up < Ur/Np (gb) that each process can consume during a job run. The first two numbers, which the user provides in the select directive, are based on an estimate of Up. For example, if you know that one process consumes Up gigabytes of memory and want to run Np such processes, you should reserve at least Ur = Np x Up gigabytes per chunk. If you cannot estimate Up, you can measure it using the tools described below.
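As a sketch of that arithmetic (the figures below are made-up placeholders, not site recommendations): with Up of about 2 gb per process and Np = 8 processes per chunk, each chunk needs at least Ur = 16 gb:

```python
# Hypothetical per-process footprint and process counts (placeholders).
Up_gb = 2           # measured memory per process, in gb
Np = 8              # MPI processes per chunk
Nchunks = 2         # number of chunks to request

Ur_gb = Np * Up_gb  # minimum memory to reserve per chunk: Ur = Np x Up

print(f"#PBS -l select={Nchunks}:ncpus={Np}:mpiprocs={Np}:mem={Ur_gb}gb")
# -> #PBS -l select=2:ncpus=8:mpiprocs=8:mem=16gb
```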
jobstat
Syntax: jobstat [options]
Use this command to obtain information about running jobs. For each running job, the jobstat command shows the output as below, including the average Load of the requested CPUs, the ratio and current values of used and requested memory (uM/rM(%), uM(gb), rM(gb)), and the percentage of the walltime used, uWTM:
metis% jobstat
Job User Account Queue [S] Load(%) uM/rM(%) uWTM(%) nCHNKs nCPUs uM(gb) rM(gb) Start Duration Nodes
---- -------- -------- ------- ---------------------------------------------------------------
5461 z1591116 climlab long [R] 100.31 64.26 23.91 8 1024 329 512 -05:44:18 24:00:00 8 nodes
jmtanl
Syntax: jmtanl search_pattern [options]
Searches the qhist output for jobs matching the search_pattern in their records. For each matched job, the jmtanl command shows the output as below, including the ratio and measured values of used and requested memory (uM/rM(%), uM(gb), rM(gb)), and the job exit status:
metis% jmtanl 5190.cm
Job User Group CPUs Nodes Chunks uRAM(gb) rRAM(gb) u/r(%) WallTime-EndTime Exit
--- ---- ----- ---- ----- ------ -------- ------- ------ ------------------------- ----
5190.cm zxxxxxxx mdmech15 80 5 5 3.92 32.00 12.25 29:47:15-12/01/23-05:37:19 0
jmanl
Syntax: jmanl search_pattern [options]
Searches for completed jobs matching the search_pattern in their records. For each matched job, the jmanl command shows the output as below, including the ratio and measured values of used and requested memory per chunk (uM/rM(%), uM(gb), rM(gb)):
metis% jmanl 5460.cm
Completed jobs last week
Job 5460.cm-z1591116-climlab (32 CPUs, 1 node(s), 1 chunk(s)): memory per chunk::Used=66.20 gb, Reserved=128.00 gb; U/R=51.71%
Multichunk jobs need careful construction to help the batch job system run efficiently. When submitting such jobs:
- ensure the requested memory and the numbers of CPUs and GPUs are optimal for the task. We recommend running several short ~1-hour jobs, increasing the number of requested processors until the execution time reaches a minimum. For example, a task may run 1 hour on 10 CPUs, 30 min on 20 CPUs, 20 min on 40, and 15 min on 80 CPUs. Such results indicate that ~40-CPU jobs are optimal for this workflow. Of course, in the case of perfect scaling, the more CPUs, the better, if they are available;
- check the load of nodes running your jobs using the "jobstat" command. For a well-balanced MPI job, the Load measurement will be close to 100%, indicating that all requested CPUs are busy with tasks. The lower this parameter, the fewer CPUs are in use and the less efficient the usage of requested CPUs;
- provide an accurate estimate of the required walltime (#PBS -l walltime=hh:mm:ss). It can dramatically affect your job starting time and will allow a better schedule of maintenance tasks. This can be done as follows:
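For instance, a job expected to finish in about two hours might be given a modest safety margin (the figures here are illustrative, not site policy), either in the script:

```shell
#PBS -l walltime=02:30:00   # expected ~2h run plus a 30-minute margin
```

or directly on the command line at submission time: qsub -l walltime=02:30:00 runmyjob.pbs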