The High Performance Compute Cluster (HPCC) uses several software packages to monitor and manage its processes. This page explains the common commands used to interact with the HPCC and to manage running jobs.

How Jobs on the HPCC Work

To run a job on the HPCC, you must create and submit a shell script file. The file contains all information needed to control and manage the processes you will be running. Go to this website to learn about the format and commands contained in the script file. The page for the software you are using will contain commands specific to that software.
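As a rough illustration of what such a script can look like, here is a minimal sketch. It assumes the HPCC accepts Torque/PBS-style `#PBS` directives, which is typical for clusters that use qsub but is not confirmed on this page. The job name, resource request, and time limit are placeholder values, so check the linked page for the settings your software needs.

```shell
#!/bin/bash
# Assumed Torque/PBS-style directives; all values below are placeholders.
#PBS -N testJob
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00

# PBS_O_WORKDIR is set by the scheduler to the directory qsub was run from.
# Fall back to the current directory when the script is run by hand.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

# Replace this line with the commands for your software.
echo "Job started in $workdir on $(uname -n)"
```

Because the directives are ordinary shell comments, the script can also be run directly for a quick check before it is submitted.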

Submitting a Job

Once you have created a script file (using the website linked here), you must submit that file to the job manager on the HPCC. To do that, we use the qsub command:

qsub <script_file_location>

For instance, if I had a script named testScript.s in the folder testJob on my H: drive, I would type:

qsub ~/testJob/testScript.s
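If you submit jobs often, a small wrapper can catch a mistyped path before it reaches the job manager and can record the job ID for later commands. The sketch below assumes qsub prints the new job's ID on standard output, as Torque-style versions do. The function name submit_job is made up for this example.

```shell
# Hypothetical helper: check that the script exists, then submit it.
submit_job() {
    script="$1"
    if [ ! -f "$script" ]; then
        echo "submit_job: no such script: $script" >&2
        return 1
    fi
    # Assumption: qsub prints the new job's ID on standard output.
    job_id=$(qsub "$script") || return 1
    echo "Submitted $script as job $job_id"
}
```

After defining the function, run it as: submit_job ~/testJob/testScript.s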

Viewing Jobs

The command to view jobs on the HPCC is showq. It lists every job the job manager currently knows about, one job per line of output. To view only the jobs that you have created, use the -u flag. This flag takes your username as an argument, like this:

showq -u <AU_ID>

Important columns in showq's output are:

JOBNAME - this column shows the ID of each job. This is how you will identify the job when using other commands to manipulate it.
USERNAME - this column shows the user that created the job. You are only allowed to manipulate jobs you have created.
PROC - this column shows the number of processors currently used by the job.
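These columns can also be read by a script. The sketch below adds up the PROC column for one user's running jobs. It assumes that USERNAME is the second whitespace-separated field, that the job state is the third and reads "Running" for active jobs, and that PROC is the fourth. Check the header line of your own showq output before relying on it. The function name procs_for_user is made up for this example.

```shell
# Hypothetical helper: sum the PROC column for one user's running jobs.
# Reads showq output on standard input.
# Assumed field order: job ID, USERNAME, state, PROC.
procs_for_user() {
    awk -v user="$1" '
        $2 == user && $3 == "Running" && $4 ~ /^[0-9]+$/ { total += $4 }
        END { print total + 0 }
    '
}
```

After defining the function, run it as: showq | procs_for_user <AU_ID>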

IDLE JOBS are jobs that have been paused for some reason. To unpause the job, use the runjob command:

runjob <job_ID>

BLOCKED JOBS are jobs that are currently blocked due to resource restrictions. This will happen if your job requires more processors than the HPCC currently has available. Once enough processors are free, your job will start.

For idle or blocked jobs, you can check when the job is scheduled to start with the showstart command:

showstart <job_ID>
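If several of your jobs are waiting, you can check them all at once with a short loop. This sketch only calls showstart once per job ID, exactly as shown above. The function name show_starts is made up for this example.

```shell
# Hypothetical helper: print the scheduled start for each job ID given.
show_starts() {
    for job_id in "$@"; do
        echo "== job $job_id =="
        showstart "$job_id"
    done
}
```

After defining the function, run it as: show_starts <job_ID> <job_ID>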

Viewing Job Specifics

To view the resource usage specific to your job, use the checkjob command with your job's unique ID:

checkjob <job_ID>

The -v flag asks the command to supply verbose output, telling you more about processor usage over the lifetime of your job:

checkjob -v <job_ID>

Canceling Jobs

To cancel your job, use the canceljob command:

canceljob <job_ID>

If you cancel a job and resubmit it, the job will go to the end of the queue and will have to wait for processors to be reallocated, so be careful when using this command.
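One way to guard against canceling a job by accident is to ask for confirmation first. The sketch below wraps canceljob in a yes/no prompt. The function name confirm_cancel is made up for this example, and the wrapper is not part of the HPCC software.

```shell
# Hypothetical helper: ask before canceling, since a resubmitted job
# goes to the end of the queue.
confirm_cancel() {
    job_id="$1"
    printf 'Cancel job %s? It will lose its place in the queue. [y/N] ' "$job_id"
    read -r answer
    case "$answer" in
        y|Y) canceljob "$job_id" ;;
        *)   echo "Job $job_id was not canceled." ;;
    esac
}
```

After defining the function, run it as: confirm_cancel <job_ID>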