HowToUseCondor (31 Jul 2008, brodbd)

How do I use it? A quick Condor tutorial

How do I set up a job?

Creating a submit description file

You need to create a submit description file (sometimes called a submit script) that tells Condor how to run your program. For example, say we have a program called foobar that reads input on stdin, writes output to stdout, and accepts a few command-line arguments. To run this program normally, you might type something like this:

foobar -a -n <foobar.in >foobar.out

Here's a sample Condor submit file (let's call it foobar.cmd) that does the same thing:

Executable = foobar
Universe   = vanilla
getenv     = true
input      = foobar.in
output     = foobar.out
error      = foobar.error
Log        = /tmp/brodbd/foobar.log
arguments  = "-a -n"
transfer_executable = false
Queue

A few of these lines require explanation.

  • The Executable line tells Condor what program we want to run.
    • The default path is the current directory. If the executable is somewhere else, you need to supply the full path; Condor will not search for it the way the shell does.
    • If you need to know the full path to a program that's in your default path, use the which command at a shell prompt. For example: which lexparser.csh
  • Universe = vanilla indicates that this is an ordinary program that does not support checkpointing. Other possibilities include the standard universe, for programs that are linked with the Condor libraries and support checkpointing and restarting, and the java universe, for running Java programs directly. See the Condor manual for more information about these universes. The PVM universe is not currently supported, but see the PVMOnPatas Wiki page for information on how to run PVM directly.
  • getenv = true transfers all the environment variables that are set in the submitter's shell. This is what you want most of the time; much of our software depends on environment variables to locate binaries and libraries.
  • Log indicates where the Condor log file for this job should go. Condor complains if this is located on an NFS filesystem, so putting it in a subdirectory of /tmp is a good idea. Input, output, and error files can go to your home directory.
  • transfer_executable = false tells Condor it does not need to copy the executable file to the compute node. This is usually the case, since the cluster nodes share a common filesystem.
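Since Condor will not search your PATH, you may find it handy to resolve the full path before writing the Executable line. The following is a minimal Python sketch, not a Condor tool: the helper name executable_line is hypothetical, and it relies on Python's shutil.which, which searches PATH much like the shell's which command. If the program is not found on PATH, the sketch falls back to the current directory, which is Condor's default location.

```python
import os
import shutil


def executable_line(program: str) -> str:
    """Return an 'Executable = ...' submit line with an absolute path.

    Hypothetical helper: Condor does not search PATH, so we resolve
    the program here the way the shell's `which` would.
    """
    if os.sep in program:
        # Already a relative or absolute path: just make it absolute.
        path = os.path.abspath(program)
    else:
        path = shutil.which(program)
        if path is None:
            # Not on PATH: fall back to the current directory,
            # which is where Condor looks by default.
            path = os.path.abspath(program)
    return f"Executable = {path}"


if __name__ == "__main__":
    print(executable_line("sh"))
```

Paste the printed line into your submit file in place of a bare program name.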

The /condor/examples directory contains some sample jobs. You may want to examine some of the submit description files there to get a better feel for how this works in different situations.

Note: If your email address is not of the form username@u.washington.edu, or if your cluster login and your University NetID don't match, you should add a notify_user line to the submit description file to tell Condor where to send mail. For example:

notify_user = jdoe@example.com

Submitting the job

Now that you have a description file, submitting it is as simple as:

condor_submit foobar.cmd

The job will be queued and run on the first available machine. You will receive an email message when it completes, either at your UW address or at the one you specified in the notify_user line of the submit file.

Managing jobs

The easiest way to track the progress of your job is to check its logfile. The following commands are also helpful:
  • condor_status lists available nodes and their status.
  • condor_q lists the job queue.
  • condor_hold and condor_rm put a job on hold and delete it from the queue, respectively.
All of these commands have manual pages that may be displayed with the man command.
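If you check logfiles often, a short script can summarize them. The Python sketch below assumes the classic text format of the Condor job log, in which each event starts with a three-digit event code followed by the job ID in parentheses (for example "005 (42.000.000) ...") and ends with a line of three dots. The event codes listed are the common ones as we understand them; treat the table as an assumption and check the Condor manual for the full list. The function name last_event_per_job is hypothetical.

```python
import re

# Common event codes in a Condor job log (assumed; see the manual
# for the complete list).
EVENT_NAMES = {
    "000": "submitted",
    "001": "executing",
    "005": "terminated",
    "009": "aborted",
    "012": "held",
}

# An event header looks like: 005 (42.000.000) 07/31 23:05:00 Job terminated.
EVENT_RE = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\)")


def last_event_per_job(log_text: str) -> dict:
    """Map 'cluster.proc' to the name of the most recent event seen."""
    latest = {}
    for line in log_text.splitlines():
        match = EVENT_RE.match(line)
        if match:  # Detail lines and "..." separators are skipped.
            code, cluster, proc = match.groups()
            job = f"{int(cluster)}.{int(proc)}"
            latest[job] = EVENT_NAMES.get(code, f"event {code}")
    return latest


if __name__ == "__main__":
    with open("/tmp/brodbd/foobar.log") as log:
        for job, state in last_event_per_job(log.read()).items():
            print(job, state)
```

The log path is the one from the sample submit file above; substitute your own.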

Additionally, CondorView provides status graphs, updated every 15 minutes.

Advanced options

It's possible to submit multiple jobs with one submit file, using multiple Queue lines. Each submission can have different parameters. See /condor/examples/loop.cmd for a good, well-documented example of this.
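A submit file like this can be generated rather than written by hand. The hypothetical Python helper below (multi_queue_submit is our name, not part of Condor) writes the common settings once, then sets arguments and output before each Queue line, so each queued job picks up the values assigned just above it. The argument sets and the foobar.out.N naming are illustrative only.

```python
def multi_queue_submit(executable, arg_sets, log="/tmp/brodbd/foobar.log"):
    """Build a submit description with one Queue line per argument set.

    Hypothetical helper: shared settings come first; before each Queue
    we override 'arguments' and 'output' so every job differs.
    """
    lines = [
        f"Executable = {executable}",
        "Universe   = vanilla",
        "getenv     = true",
        "input      = foobar.in",
        f"Log        = {log}",
        "transfer_executable = false",
    ]
    for i, args in enumerate(arg_sets):
        lines.append(f'arguments  = "{args}"')
        lines.append(f"output     = foobar.out.{i}")  # one output per job
        lines.append("Queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("foobar-multi.cmd", "w") as f:
        f.write(multi_queue_submit("foobar", ["-a -n", "-a", "-n"]))
```

The resulting file is then submitted with condor_submit as usual.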

Multiple submissions can also be automated; for example, if we wanted to run the above job three times, with input files named "foobar.in0" through "foobar.in2", we could do the following:

Executable = foobar
Universe   = vanilla
getenv     = true
input      = foobar.in$(Process)
output     = foobar.out$(Process)
error      = foobar.error$(Process)
Log        = /tmp/brodbd/foobar.log
arguments  = "-a -n"
Queue 3

$(Process) is a variable substitution: it is replaced by the process number of each queued job, counting from 0, so Queue 3 produces jobs 0, 1, and 2. Consult the condor_submit manual page (man condor_submit) for more details.
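To see which filenames a Queue N line will use, and to catch missing input files before submitting, you can mimic the substitution. This Python sketch handles only $(Process); condor_submit itself understands many more macros. The function names are hypothetical.

```python
import os


def expand_process(template: str, count: int) -> list:
    """Fill in $(Process) the way 'Queue <count>' numbers jobs (0, 1, ...).

    Simplified sketch: only the $(Process) macro is handled here.
    """
    return [template.replace("$(Process)", str(proc)) for proc in range(count)]


def missing_inputs(template: str, count: int, directory: str = ".") -> list:
    """List the expanded input files that do not exist in directory."""
    return [
        name
        for name in expand_process(template, count)
        if not os.path.exists(os.path.join(directory, name))
    ]


if __name__ == "__main__":
    print(expand_process("foobar.in$(Process)", 3))
    missing = missing_inputs("foobar.in$(Process)", 3)
    if missing:
        print("Missing input files:", ", ".join(missing))
```

For the example above, expand_process("foobar.in$(Process)", 3) returns ['foobar.in0', 'foobar.in1', 'foobar.in2'].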

Things to keep in mind

Because the job will actually be run on a compute node, not on the system you're logged into, it's important to make sure that it will be able to access all the files it needs. Home directories, /opt, /projects, /NLP_TOOLS, and /corpora are shared; however, /tmp is not. Make sure everything your job needs is located on one of the shared filesystems.

If you want to put input, output, or error files on a non-shared filesystem such as /tmp, you can add stream_input = true, stream_output = true, and/or stream_error = true to your submit file. This tells Condor to stream the data between the compute node and the original submitting system instead of reading or creating the file on the node.
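Before submitting, you could check which of your files fall outside the shared filesystems and would therefore need streaming. The Python sketch below uses the shared locations listed above; the /home prefix for home directories is an assumption, so verify where yours actually live. It does not resolve symlinks, and the function names are hypothetical.

```python
import os

# Shared filesystems listed on this page. ASSUMPTION: home directories
# live under /home; adjust if yours are elsewhere.
SHARED_PREFIXES = ("/home", "/opt", "/projects", "/NLP_TOOLS", "/corpora")

STREAM_FLAGS = {
    "input": "stream_input",
    "output": "stream_output",
    "error": "stream_error",
}


def is_shared(path: str) -> bool:
    """True if path (made absolute, symlinks not resolved) is on a shared prefix."""
    path = os.path.abspath(path)
    return any(path == p or path.startswith(p + "/") for p in SHARED_PREFIXES)


def stream_suggestions(files: dict) -> list:
    """Given {'input': path, ...}, suggest stream_* lines for non-shared paths."""
    return [
        f"{STREAM_FLAGS[kind]} = true"
        for kind, path in files.items()
        if kind in STREAM_FLAGS and not is_shared(path)
    ]


if __name__ == "__main__":
    files = {"input": "/projects/foo/foobar.in", "output": "/tmp/foobar.out"}
    for line in stream_suggestions(files):
        print(line)
```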

To keep the cluster responsive, long-running processes started directly on patas automatically have their CPU priority lowered. Additionally, processes running on patas itself are limited to 2 GB of RAM. Jobs submitted to Condor are not subject to these limits, so you should use Condor for anything CPU-intensive.

-- brodbd - 21 Feb 2008

Topic revision: r18 - 31 Jul 2008 - 23:21:56 - brodbd