Once your job has been submitted to PBS via the qsub command, multiple tools are available to monitor and control your job.
1 – Getting general job information
PBS native tools
The qstat command gives information about the current jobs queued or running on MeSU computing servers (more information with man qstat).
Without arguments, qstat will display the current jobs for all users:
user1@mesu2:~> qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
293682.mesu2     run.sh           user1                   0 Q f96c_b
294059.mesu2     colmlge          user2            190:16:0 R f48c_a
294060.mesu2     ibdmlge          user2            187:28:4 R f48c_a
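When the queue is long, the default listing can be condensed into a count of jobs per state. The following is a minimal sketch, not a MeSU-provided tool: the function name count_jobs_by_state is hypothetical, and it assumes the default qstat layout shown above (two header lines, state in the second-to-last column).

```shell
# Count jobs per state from default `qstat` output read on stdin.
# Assumption: two header lines, then one job per line, with the
# state letter (Q, R, ...) in the second-to-last column.
count_jobs_by_state() {
    awk 'NR > 2 && NF >= 6 { n[$(NF-1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

It would typically be used as `qstat | count_jobs_by_state`.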
If you know your job identifier (printed by the qsub command, or found in the qstat output), you can get information about that specific job with the qstat -f command (use qstat -fx if your job has already ended):
user1@mesu2:~> qstat -fx 294059.mesu2
Job Id: 294059.mesu2
    Job_Name = job
    Job_Owner = user1@mesu1.ib0.xa.dsi.upmc.fr
    resources_used.cpupercent = 949
    resources_used.cput = 190:24:04
    resources_used.mem = 147434480kb
    resources_used.ncpus = 24
    resources_used.vmem = 164753884kb
    resources_used.walltime = 47:13:50
    job_state = R
    queue = f48c_a
    server = mesu2
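The full output is a list of "attribute = value" lines, so a single attribute can be pulled out with a small filter. This is a sketch under stated assumptions: job_field is a hypothetical helper name, and it does not handle long values that qstat wraps onto continuation lines.

```shell
# Print the value of one attribute from `qstat -f` output read on stdin.
# Usage: qstat -f <job_id> | job_field <attribute>
# Assumption: the attribute and its value sit on a single "name = value" line.
job_field() {
    awk -v key="$1" '
        {
            pos = index($0, " = ")            # first " = " separates name and value
            if (pos == 0) next                # skip lines such as "Job Id: ..."
            name = substr($0, 1, pos - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name) # trim the indentation
            if (name == key) { print substr($0, pos + 3); exit }
        }'
}
```

For instance, `qstat -f 294059.mesu2 | job_field resources_used.walltime` would print only the elapsed wall-clock time.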
If you want to display detailed status of job arrays, use qstat -tr.
MeSU-specific tools
In addition to qstat, MeSU offers its own tools to help you monitor your jobs.
These tools provide a similar user experience to the tools provided by Slurm (a resource management system like PBS) which is available on many supercomputers.
The qqueue tool displays a summary of the jobs currently running or pending on MeSU:
user1@mesu2:~> qqueue
JOBID   PARTITION  NAME   USER   ST  TIME        NODES  CPUS  NODELIST(REASON)
150794  beta       m0p50  user4  R   1:17:00     8      192   r1i0n[0-7]
150850  beta       GA8_v  user1  R   2-12:10:59  4      96    r1i0n[22-25]
151044  beta       V3E    user1  PD              4      96    (Resources)
The qinfo tool displays a summary of the status of MeSU resources:
user1@mesu2:~> qinfo
PARTITION  STATE  NODES  NODELIST
beta       alloc  59     r1i0n[0-7,13-14,18-21,26-35],r1i3n[0-17,35],r1i2n[18-20,27-35],r1i1n[13-16]
gamma      mixed  1      mesu3
beta       idle   85     r1i1n[0-12,17-35],r1i2n[0-17,21-26],r1i3n[18-34],r1i0n[8-12,15-17,22-25]
gamma      idle   1      mesu4
2 – Deleting or stopping a job
At any time, you can delete a queued job or stop a running one with the qdel command:
user1@mesu2:~> qdel 298109.mesu2
If your job is running, PBS will take care of killing its processes.
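When several jobs have to be deleted, their identifiers can first be collected from the qstat listing and reviewed before being passed to qdel. The sketch below is illustrative only: jobs_of_user is a hypothetical helper, and it assumes the default qstat layout (two header lines, user name in the third column, not truncated).

```shell
# Print the identifiers of the jobs owned by one user,
# from default `qstat` output read on stdin.
# Assumption: two header lines, job id in column 1, user name in column 3.
jobs_of_user() {
    awk -v user="$1" 'NR > 2 && $3 == user { print $1 }'
}
```

Running `qstat | jobs_of_user "$USER"` lists your own job identifiers; once checked, each one can be given to qdel as shown above.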
3 – Connecting to a computing node for more details
Once your job is running (status R in qstat or qqueue), you can connect to one of its nodes with ssh, for instance to inspect the status of the node or to interact with your job while it runs. Identify the nodes your job has been dispatched to with qqueue or qstat -f, then connect to one of them by its hostname:
# To connect to node 17 of MeSU-beta
ssh r1i0n17
Once connected to a node, you can run the same “classical” system monitoring tools (top, pstree, watch…) that you would use on a desktop computer.
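qqueue reports nodes in a compact form such as r1i0n[22-25], whereas ssh needs one hostname at a time. The following sketch expands such an entry; expand_nodes is a hypothetical helper name, it assumes a POSIX shell (sh or bash), and it handles only a single "prefix[ranges]" group, not the comma-separated lists of several groups printed by qinfo.

```shell
# Expand a compact node list such as "r1i0n[22-25]" or "r1i0n[0-1,13]"
# into one hostname per line. A plain hostname is printed unchanged.
expand_nodes() {
    case "$1" in
        *\[*\])
            prefix=${1%%\[*}        # text before the first "["
            ranges=${1#*\[}         # text after the first "["
            ranges=${ranges%\]}     # drop the closing "]"
            old_ifs=$IFS
            IFS=,
            for r in $ranges; do    # split the ranges on commas
                first=${r%-*}       # "22-25" -> 22 ; "13" -> 13
                last=${r#*-}        # "22-25" -> 25 ; "13" -> 13
                i=$first
                while [ "$i" -le "$last" ]; do
                    printf '%s%s\n' "$prefix" "$i"
                    i=$((i + 1))
                done
            done
            IFS=$old_ifs
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}
```

For example, `expand_nodes 'r1i0n[22-25]'` prints the four hostnames r1i0n22 to r1i0n25, any of which can then be passed to ssh.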