It is strongly advised that all heavy, time-consuming programs be executed on Metis compute nodes. Any such program executed on the login node will be terminated.
Metis uses a resource manager (PBS) to control batch jobs and available resources for distributed high-performance computing. The following are the most basic and important tools that users can use to manage their jobs:
qsub - submit a pbs job
Syntax: qsub [options] pbs_script_file
This creates and submits a pbs job in batch mode, wherein the task specified in the script file is executed in the background, using the resources specified in the script file or the options at time of job submission. The resource manager assigns an identification number (JobID) for the job, which the user can use to monitor and manage the job itself. For example:
user@metis% qsub runmyjob.pbs
10125.cm
user@metis%
The JobID assigned to the 'runmyjob.pbs' job is the leading number in the second line, which is 10125. Because the job is running in the background, all program output to standard out will be written to a log file. For the above example, the log file will be named 'runmyjob.pbs.o10125'. Messages printed to standard error can also appear in the same log file, or in a separate file named 'runmyjob.pbs.e10125'.
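A minimal pbs_script_file might look like the sketch below. The job name, resource amounts, and program path are illustrative placeholders, not site defaults; the resource directives themselves are explained later in this section.

```shell
#!/bin/bash
#PBS -N runmyjob                              # job name (placeholder)
#PBS -l select=1:ncpus=8:mpiprocs=8:mem=16gb  # one chunk: 8 CPUs, 16 GB
#PBS -l walltime=01:00:00                     # hh:mm:ss run-time limit

cd $PBS_O_WORKDIR    # start in the directory the job was submitted from
./myprogram          # placeholder for the actual executable
```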
qstat - show status of pbs batch jobs
Syntax: qstat [options] [JobID]
Use this command to request the status of jobs, queues or batch servers. If JobID is not specified, a list of all the jobs currently running or waiting on the batch server will be printed. Job status can be checked under the 'S' column using the following legend:
R job is running
Q job is in queue, will start when resources are available
C job is finished
metis% qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
6954.cm                   6222_multipass   z1690179        22:52:55 R q64cpus
6957[].cm                 magda.pbs        z1756868               0 R long
6960.cm                   6282_5nm         z1690179               0 Q long
6974.cm                   TiTiN_50         z1761164               0 Q long
6977.cm                   7Ni3WCNTmttm     z1791832        191:12:3 R q16cpus
6986.cm                   18Ni3WCNTmttm    z1791832               0 Q long
6987.cm                   c10_im           rarichardson    65:51:47 R long
6988.cm                   PHAD_PIC_GPU     hermans                0 Q long
metis%
qdel - delete pbs batch job
Syntax: qdel [options] JobID
Use this command to delete a job on the server. If the job is still running, it will be terminated; otherwise, it will be removed from the queue.
Consult the man pages for qsub, qstat, and qdel for more details.
qhist - show the PBS records of completed jobs
Syntax: qhist [options]
Use this command to obtain historical job records on the server. Specifically, to obtain detailed information about a completed job JobID, one can run "qhist -D year --raw | grep JobID":
metis% qhist -D year --raw | grep 5466.cm
12/11/2023 13:27:05;E;5466.cm;user=xxxx group=yyy project=_pbs_project_default jobname=gm queue=q16cpus ctime=1702322495 qtime=1702322495 etime=1702322495 start=1702322495 exec_host=cn09/0*8+cn09/1*8 exec_vnode=(cn09:ncpus=8:mem=2097152kb)+(cn09:ncpus=8:mem=2097152kb) Resource_List.mem=4gb Resource_List.mpiprocs=16 Resource_List.ncpus=16 Resource_List.nodect=2 Resource_List.place=free Resource_List.pmem=768mb Resource_List.select=2:ncpus=8:mpiprocs=8 Resource_List.walltime=05:30:00 session=1024808 end=1702322825 Exit_status=0 resources_used.cpupercent=99 resources_used.cput=00:05:28 resources_used.mem=322884kb resources_used.ncpus=16 resources_used.vmem=582672kb resources_used.walltime=00:05:28 run_count=1
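The raw record is a semicolon-delimited line whose last field holds space-separated key=value attributes. A small sketch of pulling out individual attributes, assuming that layout and using an abbreviated copy of the record above:

```python
# Abbreviated qhist raw record (same layout as the full example above).
record = ("12/11/2023 13:27:05;E;5466.cm;"
          "user=xxxx queue=q16cpus Resource_List.ncpus=16 "
          "Exit_status=0 resources_used.walltime=00:05:28")

# The fourth ';'-separated field carries the key=value attributes.
attrs = dict(tok.split("=", 1)
             for tok in record.split(";")[3].split() if "=" in tok)

print(attrs["Exit_status"])               # -> 0
print(attrs["resources_used.walltime"])   # -> 00:05:28
```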
We provide extensions of the standard PBS monitoring tools, both for running and completed jobs. Their output allows the construction of efficient resource request directives:
#PBS -l select=Nchunks:ncpus=Ncpus:mpiprocs=Np:mem=XXXgb
Each job depends on three critical numbers: the amount of reserved memory Ur = XXXgb per requested chunk of resources, the total number of running processes Np per requested chunk, and the amount of memory Up < Ur/Np (gb) that each process can consume during a job run. The first two numbers, which the user provides in the select directive, are based on an estimate of Up. For example, if you know that one process consumes Up gigabytes of memory and want to run Np such processes, you should reserve at least Ur = Np x Up gigabytes per chunk. If you cannot estimate Up, you can measure it using the tools described below.
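As a sketch of that arithmetic (the figures below are made-up placeholders, not site recommendations): with Up of about 2 gb per process and Np = 8 processes per chunk, each chunk needs at least Ur = 16 gb:

```python
# Hypothetical per-process footprint and process counts (placeholders).
Up_gb = 2           # measured memory per process, in gb
Np = 8              # MPI processes per chunk
Nchunks = 2         # number of chunks to request

Ur_gb = Np * Up_gb  # minimum memory to reserve per chunk: Ur = Np x Up

print(f"#PBS -l select={Nchunks}:ncpus={Np}:mpiprocs={Np}:mem={Ur_gb}gb")
# -> #PBS -l select=2:ncpus=8:mpiprocs=8:mem=16gb
```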
jobstat
Syntax: jobstat [options]
Use this command to obtain information about running jobs. For each running job, the jobstat command shows the output as below, including the average Load of the requested CPUs, the ratio and current values of used and requested memory (uM/rM(%), uM(gb), rM(gb)), and the percentage of the walltime used, uWTM:
metis% jobstat
Job User Account Queue [S] Load(%) uM/rM(%) uWTM(%) nCHNKs nCPUs uM(gb) rM(gb) Start Duration Nodes
---- -------- -------- ------- ---------------------------------------------------------------
5461 z1591116 climlab long [R] 100.31 64.26 23.91 8 1024 329 512 -05:44:18 24:00:00 8 nodes
jmtanl
Syntax: jmtanl search_pattern [options]
Searches the qhist output for jobs matching the search_pattern in their records. For each matched job, the jmtanl command shows the output as below, including the ratio and measured values of used and requested memory (uM/rM(%), uM(gb), rM(gb)), and the job exit status:
metis% jmtanl 5190.cm
Job User Group CPUs Nodes Chunks uRAM(gb) rRAM(gb) u/r(%) WallTime-EndTime Exit
--- ---- ----- ---- ----- ------ -------- ------- ------ ------------------------- ----
5190.cm zxxxxxxx mdmech15 80 5 5 3.92 32.00 12.25 29:47:15-12/01/23-05:37:19 0
jmanl
Syntax: jmanl search_pattern [options]
Searches for completed jobs matching the search_pattern in their records. For each matched job, the jmanl command shows the output as below, including the ratio and measured values of used and requested memory per chunk (uM/rM(%), uM(gb), rM(gb)):
metis% jmanl 5460.cm
Completed jobs last week
Job 5460.cm-z1591116-climlab (32 CPUs, 1 node(s), 1 chunk(s)): memory per chunk::Used=66.20 gb, Reserved=128.00 gb; U/R=51.71%
Multichunk jobs need careful construction to help the batch job system run efficiently. When submitting such jobs:
- ensure the requested memory and the numbers of CPUs and GPUs are optimal for the task. We recommend running several short ~1-hour jobs, increasing the number of requested processors until the execution time reaches a minimum. For example, a task may run 1 hour on 10 CPUs, 30 min on 20 CPUs, 20 min on 40, and 15 min on 80 CPUs. Such results indicate that ~40-CPU jobs are optimal for this workflow. Of course, in the case of perfect scaling, the more CPUs, the better, if they are available;
- check the load of nodes running your jobs using the "jobstat" command. For a well-balanced MPI job, the Load measurement will be close to 100%, indicating that all requested CPUs are busy with tasks. The lower this parameter, the fewer CPUs are in use and the less efficient the usage of requested CPUs;
- provide an accurate estimate of the required walltime (#PBS -l walltime=hh:mm:ss). It can dramatically affect your job starting time and will allow a better schedule of maintenance tasks. This can be done as follows:
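For instance, a job expected to finish in about two hours might be given a modest safety margin (the figures here are illustrative, not site policy), either in the script:

```shell
#PBS -l walltime=02:30:00   # expected ~2h run plus a 30-minute margin
```

or directly on the command line at submission time: qsub -l walltime=02:30:00 runmyjob.pbs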