
You will probably also want to place options in your template script for notifying the user by e-mail when something happens with the job, such as the start of execution, an abort, or completion. You can configure the list of e-mail addresses used and the events that generate a mail message. The sample scripts mail a single user at the beginning and end of job execution. For PBS, one directive sets up the e-mail address list (line 9), and a second directive sets the notification options (line 11). The be stands for beginning and end; you can use b, e, or a (abort), or any combination of the three. LSF uses separate directives for each type of notification (see lines 8, 9, and 10): -u sets the user e-mail list, -B sends mail at the beginning of the job, and -N notifies you upon completion. The completion report also contains some statistics about the job, such as how long it actually executed.
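If you don't have Listing One in front of you, the two PBS mail directives take this general form (the address is a placeholder; substitute your own):

#PBS -M user@example.com
#PBS -m be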

Finally, once you've finished all of your directives, your script needs to actually run your job. You can simply place in the script the commands you would normally use to run the job from the command line. You can also run any other sequence of shell commands, and the output will be captured in your standard output file. Using the echo command to place some labels in your output file is typically helpful here. In this case, in line 14 of Listing One and line 12 of Listing Two, the directory the job was submitted from will be placed in the file (note the different environment variables for PBS vs. LSF).

 1  #!/bin/csh
 2  #BSUB -n 32
 3  #BSUB -J samplejob
 4  #BSUB -o %J.out
 5  #BSUB -e %J.err
 6  #BSUB -q workq
 7  #BSUB -W 0:15
 8  #BSUB -u user@example.com
 9  #BSUB -B
10  #BSUB -N
11  echo "Job Starting"
12  echo "Submitted from Directory: $LS_SUBCWD"
13  ./my_job my_args
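
For comparison, the corresponding lines in a PBS script use the PBS_O_WORKDIR environment variable, which PBS sets to the directory qsub was run from. A minimal sketch of the equivalent lines:

echo "Job Starting"
echo "Submitted from Directory: $PBS_O_WORKDIR"
./my_job my_args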

Submitting Scripts

Once you have completed your job script, you are ready to submit it to the resource manager. This task can be done in PBS with the qsub command:

$ qsub pbs_sample_script.sh

or, for LSF, with the bsub command:

$ bsub lsf_sample_script.sh

All of the options used in the job scripts can also be passed directly on the command line to either the qsub or bsub command. This capability shouldn't be used as a substitute for creating job scripts, but it can be useful in certain cases. For instance, if you wanted to vary the number of nodes a job runs on to measure its performance, and you didn't want to change your script for each run, you could simply remove the nodes directive from the script and submit commands to the queue such as:

$ qsub -l nodes=4 pbs_sample_script.sh
$ qsub -l nodes=8 pbs_sample_script.sh

You could even place these commands inside another script (and probably should).
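
A minimal sketch of such a wrapper, assuming the PBS setup above (the node counts are arbitrary):

#!/bin/sh
# Submit the same job script at several node counts for a simple scaling study.
for nodes in 2 4 8 16
do
    qsub -l nodes=$nodes pbs_sample_script.sh
done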

Parting Thoughts

With the distribution of a few sample scripts, you can save your users a lot of time and effort. The scripts here provide a starting point, but you should probably provide a sequence of steadily more sophisticated scripts. The next step would be to add directives that define dependencies; for instance, submitting jobs that won't start until other jobs finish (a quick sketch follows). This feature is particularly useful if you have jobs producing files that are input to other jobs. There are plenty more options, but we're out of space for this month. Don't worry about mastering them all; the simple set provided here will get you through a lot of jobs. Happy batching!
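For that dependency sketch, the script names here are placeholders: PBS/Torque prints the ID of a submitted job, which can be fed to -W depend=afterok, while LSF's bsub takes a -w dependency expression.

# PBS/Torque: second_job.sh starts only after the first job exits successfully
FIRST=`qsub first_job.sh`
qsub -W depend=afterok:$FIRST second_job.sh

# LSF: second_job.sh starts only after the job named samplejob completes
bsub -w "done(samplejob)" second_job.sh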

Finally, an astute reader pointed out that we missed a resource manager in the last issue. SLURM is a production resource manager used and developed at Lawrence Livermore National Laboratory. It is now widely available under the GNU General Public License. Like PBS and LSF, it allows for integration with Maui and other schedulers. One of the strengths of SLURM is its ability to tolerate node failures and continue functioning. SLURM is already in use on clusters of 1,000 nodes.

Thanks to Karl Schulz at the Texas Advanced Computing Center for access to scripts from their production LSF environment.

Sidebar One: Resources
Portable Batch System (PBS/Torque)

Load Sharing Facility (LSF)

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.

Dan Stanzione is currently the Director of High Performance Computing for the Ira A. Fulton School of Engineering at Arizona State University. He previously held appointments in the Parallel Architecture Research Lab at Clemson University and at the National Science Foundation.


©2005-2023 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.