Once your job has been submitted to PBS via the qsub command, multiple tools are available to monitor and control your job.

1 – Getting general job information

PBS native tools

The qstat command displays information about the jobs currently queued or running on the MeSU computing servers (see man qstat for more details).

Without arguments, qstat will display the current jobs for all users:

user1@mesu2:~> qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
293682.mesu2      run.sh           user1                    0 Q f96c_b
294059.mesu2      colmlge          user2             190:16:0 R f48c_a
294060.mesu2      ibdmlge          user2             187:28:4 R f48c_a
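
For a quick overview, the default listing above can be post-processed with standard text tools. The bash function below is a minimal sketch, not a PBS or MeSU command (count_jobs_by_state is a hypothetical name). It counts one user's jobs per state, assuming the column layout shown above, where the state (S) is the fifth whitespace-separated field. PBS itself can also restrict the listing to a single user with qstat -u <user>.

```shell
#!/usr/bin/env bash
# Count one user's jobs per state from the default `qstat` listing (stdin).
# Assumption: columns are Job id, Name, User, Time Use, S, Queue, so the
# user is field 3 and the state is field 5. The two header lines never
# match a real user name in field 3, so they are skipped automatically.
count_jobs_by_state() {
  local user="$1"
  awk -v u="$user" '$3 == u { n[$5]++ }
       END { for (s in n) print s, n[s] }' | sort
}

# Example use on the login node:
#   qstat | count_jobs_by_state "$(id -un)"
```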

If you know your job identifier (printed by qsub when the job is submitted, or found in the qstat output), you can get detailed information about that specific job with qstat -f (use qstat -fx if the job has already finished):

user1@mesu2:~> qstat -fx 294059.mesu2
Job Id: 294059.mesu2
    Job_Name = job
    Job_Owner = user1@mesu1.ib0.xa.dsi.upmc.fr
    resources_used.cpupercent = 949
    resources_used.cput = 190:24:04
    resources_used.mem = 147434480kb
    resources_used.ncpus = 24
    resources_used.vmem = 164753884kb
    resources_used.walltime = 47:13:50
    job_state = R
    queue = f48c_a
    server = mesu2
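
Since the qstat -f output is a list of "key = value" lines, a single attribute is easy to extract in scripts. The function below is a sketch (job_field is a hypothetical helper name) that reads that output on stdin and prints the value of one attribute. It assumes the value fits on one line: qstat may wrap very long attributes onto continuation lines, which this sketch does not reassemble.

```shell
#!/usr/bin/env bash
# Print the value of one attribute from `qstat -f` / `qstat -fx` output (stdin).
# Attribute lines look like "    key = value"; only the first line of the
# value is returned (wrapped continuation lines are ignored).
job_field() {
  local key="$1"
  awk -v k="$key" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)          # drop leading indentation
      if (index(line, k " = ") == 1) {  # line starts with exactly "key = "
        print substr(line, length(k) + 4)
        exit
      }
    }'
}

# Example use (qsub prints the job ID on submission):
#   jobid=$(qsub run.sh)
#   qstat -fx "$jobid" | job_field resources_used.walltime
```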

To display the detailed status of job arrays, including their individual subjobs, use qstat -tr.

MeSU-specific tools

In addition to qstat, MeSU provides its own tools to help you monitor your jobs.
They offer a user experience similar to that of Slurm, a resource manager comparable to PBS that is available on many supercomputers: their output resembles that of Slurm's squeue and sinfo commands.

The qqueue tool displays a summary of the jobs currently running or pending on MeSU:

user1@mesu2:~> qqueue
JOBID      PARTITION  NAME      USER   ST    TIME     NODES  CPUS  NODELIST(REASON)
150794     beta      m0p50     user4    R    1:17:00     8     192  r1i0n[0-7]
150850     beta      GA8_v     user1    R  2-12:10:59    4     96   r1i0n[22-25]
151044     beta      V3E       user1    PD               4     96  (Resources)

The qinfo tool displays a summary of the status of MeSU resources (partitions and nodes):

user1@mesu2:~> qinfo
PARTITION  STATE     NODES  NODELIST
beta       alloc       59   r1i0n[0-7,13-14,18-21,26-35],r1i3n[0-17,35],r1i2n[18-20,27-35],r1i1n[13-16]
gamma      mixed       1    mesu3
beta       idle        85   r1i1n[0-12,17-35],r1i2n[0-17,21-26],r1i3n[18-34],r1i0n[8-12,15-17,22-25]
gamma      idle        1    mesu4

2 – Deleting or stopping a job

At any time, you can delete a queued job, or stop a running one, with the qdel command:

user1@mesu2:~> qdel 298109.mesu2

If your job is running, PBS takes care of killing its processes.
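
To remove several jobs at once, you can first select their IDs and pass them to a single qdel call. The sketch below assumes the standard PBS qselect command is available on MeSU (it is not described in this guide): qselect -u <user> -s Q prints the IDs of that user's queued jobs, one per line. Running jobs are left alone, and the function name qdel_my_queued_jobs is hypothetical.

```shell
#!/usr/bin/env bash
# Delete every job of the current user that is still queued (state Q).
# Assumes PBS `qselect` is installed; it prints one job ID per line.
qdel_my_queued_jobs() {
  local ids
  ids=$(qselect -u "$(id -un)" -s Q)
  if [ -z "$ids" ]; then
    echo "no queued jobs" >&2
    return 0
  fi
  # Intentional word splitting: each newline-separated ID becomes one argument.
  # shellcheck disable=SC2086
  qdel $ids
}
```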

3 – Connecting to a computing node for more details

Once your job is running (state R in qstat or qqueue), you can connect with ssh to any of the nodes allocated to it, for instance to inspect the status of the node or to interact with your job while it runs. Identify the nodes your job has been dispatched to with qqueue (NODELIST column) or qstat -f, then connect to one of them:

# Connect to node r1i0n17 of MeSU-beta
ssh r1i0n17
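
In the qstat -f output, the hosts used by a running job appear in the exec_host attribute. The sketch below turns that value into a list of unique node names you can pass to ssh. It assumes the PBS Pro format, where entries of the form host/index*ncpus are joined by "+" (for example r1i0n17/0*24+r1i0n18/0*24); the function name exec_hosts_to_nodes is hypothetical.

```shell
#!/usr/bin/env bash
# Convert a PBS exec_host value (stdin) into unique node names, one per line.
# Assumed format: "hostA/0*24+hostB/0*24" (host/index*ncpus joined by '+').
exec_hosts_to_nodes() {
  tr '+' '\n' |                 # one host/index*ncpus entry per line
    cut -d/ -f1 |               # keep only the host name
    awk 'NF && !seen[$0]++'     # drop empty lines and duplicates, keep order
}

# Example use on the login node (assumes exec_host is not wrapped):
#   qstat -f 294059.mesu2 | sed -n 's/^ *exec_host = //p' | exec_hosts_to_nodes
```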

Once connected to a node, you can use the same standard system monitoring tools (top, pstree, watch, etc.) as on a desktop computer.