The MeSU Supercomputer uses a queuing system to match users' jobs with the available computing resources.
Users submit their programs to the job scheduler (PBS), which maintains a queue of jobs and distributes them across the compute nodes according to the server status, the scheduling policies and the job parameters (number of compute nodes / cores, estimated execution time, required memory, etc.).
The interface with PBS is a text file (the PBS script) created by the user, which defines the job requirements and execution steps. This file is mainly composed of two sections:
- the header in which you specify the job requirements (execution time, number of CPU cores to use, memory requirements…) in the form of PBS directives.
- the body in which you will write the commands to load specific software, define environment variables, and run your job.
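Put together, a PBS script therefore has the following overall shape; this is only a minimal sketch in which the directive values and the ./my_program executable are placeholders, and both sections are detailed below.

#PBS -S /bin/bash
#PBS -N myJob
#PBS -l select=1:ncpus=24
#PBS -l walltime=01:00:00

# Body of the script: shell commands executed on the compute node(s)
cd $PBS_O_WORKDIR
./my_program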
PBS script: header
The header is a succession of PBS directives, whose syntax is:
#PBS [option]
The most relevant options on MeSU Supercomputer are:
- #PBS -S /bin/bash Shell to use
- #PBS -q beta Queue (or partition) to use, possible values are beta and gamma (beta is the default)
- #PBS -l select=2:ncpus=24 Resource requirements (CPU cores, MPI processes, memory, etc.)
- #PBS -l walltime=03:20:10 Maximal execution time of the job (for example here, 3 hours 20 minutes 10 seconds)
- #PBS -N jobName Name of the job
- #PBS -o stdout.txt Standard output file
- #PBS -e stderr.txt Standard error file
- #PBS -j oe Redirects stderr to stdout
Here is, for instance, a typical header for an MPI job spanning 2 compute nodes with 12 MPI processes on each node (24 in total), requesting a maximum memory of 64 GB and a maximum execution time of 20 minutes:
#PBS -S /bin/bash
#PBS -N myMPIjob
#PBS -o output.txt
#PBS -j oe
#PBS -q beta
#PBS -l walltime=00:20:00
#PBS -l select=2:ncpus=12:mpiprocs=12
#PBS -l mem=64GB
PBS script: script body
After defining your resource needs in the header, you have to define the execution steps of your job.
This includes (but is not limited to):
- Setting up environment variables
- Copying data files and program executables
- Specifying the output directories
- Specifying the execution command
- Cleaning up
A PBS script is a shell script, thus you can use the same constructs and commands as in a regular Linux shell script.
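As an illustration, here is a minimal sketch of a script body following these steps; the module name, executable name and file layout are placeholders to adapt to the software actually installed for your project:

# Move to the directory from which the job was submitted
cd $PBS_O_WORKDIR

# Load your software environment (module name purely illustrative,
# assuming your software is provided through environment modules)
module load my_mpi_library

# Prepare an output directory for this run
OUTDIR=$PBS_O_WORKDIR/results_$PBS_JOBID
mkdir -p $OUTDIR

# Run the program (the exact launch line depends on your MPI library)
mpirun ./my_program > $OUTDIR/run.log 2>&1

# Clean up temporary files produced by the run
rm -f core.*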
Job Environment Variables
Several environment variables are accessible within a PBS job script.
They allow you to interact with the PBS resource allocation, temporary directories and various pieces of job information (a short usage sketch follows this list):
- $PBS_O_WORKDIR Directory where the qsub command was executed. Useful with the cd (change directory) command to change your current directory to your working directory.
- $TMPDIR Local temporary disk storage unique to each node and each job. This directory is automatically created at the beginning of the job and deleted at the end of the job.
- $USER User Name (NetID). Useful if you would like to dynamically generate a directory on some scratch space.
- $HOSTNAME Name of the computer currently running the script. This should be one of the nodes listed in the file $PBS_NODEFILE.
- $HOST Same as $HOSTNAME.
- $PBS_JOBID Job ID number given to this job. This number is used by many of the job monitoring programs such as qstat, showstart, and dque.
- $PBS_JOBNAME Name of the job. This can be set using the -N option in the PBS script (or from the command line). The default job name is the name of the PBS script.
- $PBS_NODEFILE Name of the file that contains a list of the HOSTS provided for the job.
- $PBS_ARRAYID Array ID numbers for jobs submitted with the -t flag. For example a job submitted with #PBS -t 1-8 will run eight identical copies of the shell script. The value of $PBS_ARRAYID will be an integer between 1 and 8.
- $PBS_VNODENUM Used with pbsdsh to determine the task number of each processor. For more information see http://www.ep.ph.bham.ac.uk/general/support/torquepbsdsh.html.
- $PBS_O_PATH Original PBS path. Used with pbsdsh.
- $PBS_NUM_PPN Number of cores requested (per node).
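As an illustration, a few of these variables could be combined in a script body as follows; the file names and the job-array usage are hypothetical examples:

cd $PBS_O_WORKDIR                 # work from the directory where qsub was run
NPROCS=$(wc -l < $PBS_NODEFILE)   # count the host entries allocated to the job
echo "Job $PBS_JOBID ($PBS_JOBNAME) has $NPROCS entries in its node file"
cp input.dat $TMPDIR/             # stage a (hypothetical) data file on node-local scratch

# For a job array submitted with "#PBS -t 1-8", each copy can pick its own input file:
INPUT=input_${PBS_ARRAYID}.dat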
Submit your script/job
Once you have written the corresponding PBS script file, submitting your job to the PBS scheduler is simply done with the qsub command:
qsub [path/to/your/script]
For instance, if you saved your script file as myScript.sh, just run the following command in a terminal:
qsub myScript.sh
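qsub then prints the identifier assigned to your job (the value below is only illustrative), which you can use with monitoring tools such as qstat:

qsub myScript.sh
# 123456.mesu     <- job identifier returned by PBS (illustrative value)
qstat -u $USER    # list your jobs and their current status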
Note that you can also specify PBS directives in the command line as qsub options, which will override the ones defined in the script.
For instance, to override the CPU requirements and target queue of a PBS script, run the command:
qsub -q beta -l select=3:ncpus=24:mpiprocs=24 myScript.sh
The more common approach, however, is to specify your CPU requirements in the header of the script file and to pass the target machine you wish your job to run on as a qsub option:
#PBS -l select=2:ncpus=12:mpiprocs=12
And at the command line:
# submission on beta compute nodes
qsub -q beta myScript.sh
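Similarly, since gamma is the other available queue (see the directive list above), the same script can be sent to the gamma compute nodes without editing it:

# submission on gamma compute nodes
qsub -q gamma myScript.sh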
On MeSU-beta, all nodes have 24 cores each.
Thus, you should use the first argument of select to request a number of nodes, and the ncpus argument to request the number of cores per node.
Beware: if you set ncpus to a value greater than 24, your job will never start, since no single node can provide that many cores.
select=2:ncpus=16:mpiprocs=16 # Requests 2 nodes providing 16 cores each (32 cores total)
select=4:ncpus=24:mpiprocs=24 # Requests 4 nodes providing 24 cores each (96 cores total)