Run Jobs On FNALU Batch

The FNALU cluster has a small batch system managed by the Condor queueing system. NOvA collaborators are welcome to use this queue for running jobs. Jobs must be submitted to the queue from flxi09.fnal.gov.

Condor does not forward tokens when jobs are submitted to the queue. If you intend to run a job that accesses AFS space, you therefore need to submit the job under kcron. Before doing so, you must run

% kcroninit

on each machine in the queue. Those machines are

flxi09.fnal.gov
flxi10.fnal.gov
flxb31.fnal.gov
flxb32.fnal.gov
flxb33.fnal.gov
flxb34.fnal.gov
flxb36.fnal.gov

Once you have completed that step, you are ready to put together a command file that tells Condor how to run your job. An example command file is below. The comments in parentheses indicate the meaning of each line; remove them before putting these entries into a command file.

UNIVERSE = vanilla (type of environment in which jobs will run, vanilla is the default)
WHEN_TO_TRANSFER_OUTPUT = ON_EXIT (when to transfer the output of the log, output and error files)
TRANSFER_OUTPUT = true (flag to turn on/off transfer of output)
TRANSFER_ERROR = true (flag to turn on/off transfer of error)
TRANSFER_EXECUTABLE = false (flag to turn on/off transfer of executable)
NOTIFICATION = Always (when to send email about the job; Always is the most verbose setting and includes job completion)
notify_user = user@fnal.gov (email address for notifications - change to your address)
REQUIREMENTS = FNALUBatchServer (use the FNALU server)
REMOTE_INITIALDIR = /path/to/your/dir (directory where the jobs will start)
EXECUTABLE = /usr/krb5/bin/kcron (executable to run; must be kcron for jobs that use AFS files)
ARGUMENTS = script arg1 arg2 arg3 (arguments passed to the executable: script is the job you want to run, and arg1, arg2, and arg3 are its arguments)
LOG = log.arg1.arg2.arg3 (name of log file)
OUTPUT = out.arg1.arg2.arg3 (name of output file)
ERROR = err.arg1.arg2.arg3 (name of error file)
QUEUE
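With the parenthetical comments removed, the example reads as shown in the sketch below, which writes it to a file named cmdfile from the shell. The path, email address, script name, and arguments are still the placeholders from the example and need to be replaced with your own values.

```shell
#!/bin/sh
# Write the example command file without the parenthetical comments.
# All values are the placeholders from the example above, not real
# settings: edit the email address, directory, script, and arguments.
# The quoted 'EOF' keeps the shell from expanding anything in the body.
cat > cmdfile <<'EOF'
UNIVERSE = vanilla
WHEN_TO_TRANSFER_OUTPUT = ON_EXIT
TRANSFER_OUTPUT = true
TRANSFER_ERROR = true
TRANSFER_EXECUTABLE = false
NOTIFICATION = Always
notify_user = user@fnal.gov
REQUIREMENTS = FNALUBatchServer
REMOTE_INITIALDIR = /path/to/your/dir
EXECUTABLE = /usr/krb5/bin/kcron
ARGUMENTS = script arg1 arg2 arg3
LOG = log.arg1.arg2.arg3
OUTPUT = out.arg1.arg2.arg3
ERROR = err.arg1.arg2.arg3
QUEUE
EOF
```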
Assuming that you have put these lines into a file called "cmdfile", submit your job with

% condor_submit cmdfile

Condor provides several commands for interacting with the queue. The most useful are:

% condor_q (lists all jobs in the queue, detailing owner, executable, time spent running, and state)
% condor_rm XXX (removes job XXX from the queue)
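Because the queue only accepts jobs submitted from flxi09.fnal.gov, a small wrapper can catch a submission attempted from the wrong machine. The following is a minimal sketch, not part of the FNALU setup: the function names is_submit_host and submit_job are hypothetical, and it assumes that hostname reports either the short or the fully qualified name of the machine.

```shell
#!/bin/sh
# Hypothetical helper functions; only the host name comes from this page.
SUBMIT_HOST="flxi09.fnal.gov"

# Succeed if $1 names the submit host, as a short or fully qualified name.
is_submit_host() {
    case "$1" in
        flxi09|"$SUBMIT_HOST") return 0 ;;
        *) return 1 ;;
    esac
}

# Submit the command file $1, but only when running on the submit host.
submit_job() {
    if is_submit_host "$(hostname)"; then
        condor_submit "$1"
    else
        echo "Jobs must be submitted from $SUBMIT_HOST" >&2
        return 1
    fi
}
```

After sourcing these definitions, the job would be submitted with submit_job cmdfile in place of calling condor_submit directly.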