How do I use it? A quick Condor tutorial
How do I set up a job?
Creating a submit description file
You need to create a submit description file (sometimes referred to as a submit script) telling Condor how to run your program. For example, let's say we have a program called foobar that accepts input on stdin, produces output on stdout, and accepts a few command-line arguments. To run this program normally, you might do something like this:
foobar -a -n <foobar.in >foobar.out
Here's a sample Condor submit file (let's call it
) that does the same thing:
executable = foobar
getenv = true
input = foobar.in
output = foobar.out
error = foobar.error
log = foobar.log
notification = complete
arguments = "-a -n"
transfer_executable = false
request_memory = 2*1024
queue
A few of these lines require explanation.
- The executable line tells Condor what program we want to run.
- The default path is the current directory. If the executable is somewhere else, you need to supply the full path -- Condor will not search for it the way the shell does.
- If you need to know the full path to a program that's in your default path, run the which command at a shell prompt, followed by the program's name.
- getenv = true transfers all the environment variables that are set in the submitter's shell. This is what you want most of the time; much of our software depends on environment variables to locate binaries and libraries.
- log indicates where the Condor log file for this job should go.
- notification = complete causes Condor to send you email when the job completes. Other valid options include always, error, and never.
- If your email address is not of the form firstname.lastname@example.org, or if your cluster login and your University netid don't match, you should add a notify_user line to the submit description file to tell Condor where to send mail.
- transfer_executable = false tells Condor it does not need to copy the executable file to the compute node. This is usually the case, since the cluster nodes share a common filesystem.
- request_memory = 2*1024 tells Condor this job wants 2 GB (2048 MB) of RAM. If you leave out the request_memory line, the default is 1024 MB. Note that if you over-estimate, you limit the number of machines your job can run on, but if you under-estimate and the job outgrows its memory request, Condor may kill it. The SIZE column in the output of the condor_q command shows the current memory usage in megabytes of a running job.
contains some sample jobs. You may want to examine some of the submit description files there to get a better feel for how this works in different situations.
Submitting the job
Now that you have a description file, submitting it is as simple as:
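The sketch below assumes the submit description file was saved as foobar.submit, which is a hypothetical name; use whatever you called yours. On the cluster, the single condor_submit line is all you need. The surrounding check only makes the snippet fail gracefully on a machine without Condor installed.

```shell
#!/bin/sh
# Assumption: the submit description file is named foobar.submit
# (hypothetical; substitute your own file name).
submit_file=foobar.submit

if command -v condor_submit >/dev/null 2>&1; then
    condor_submit "$submit_file"   # queues the job(s) described in the file
else
    echo "condor_submit not found; run this on the cluster" >&2
fi
```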
The job will be queued and run on the first available machine. You will receive an email message when it completes, either at your UW address or at the one you specified in the notify_user line in the submit file.
The easiest way to track the progress of your job is to check its logfile. The following commands are also helpful:
- condor_status lists available nodes and their status.
- condor_q lists the job queue.
- condor_rm deletes a job from the queue.
These commands normally only operate on jobs that have been submitted from the same machine they're run from. condor_q supports a
switch to see all jobs.
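Here is a sketch of a typical check-in on a job. It assumes the log file is foobar.log, as in the first example submit file, and uses 1234.0 as a hypothetical job ID (cluster 1234, process 0). When you submit, condor_submit reports the real cluster number. The condor_rm line is commented out so that nothing is removed by accident.

```shell
#!/bin/sh
# Sketch: check on a job from the machine it was submitted from.
# Assumptions: log file name as in the example submit file; 1234.0 is a
# hypothetical job ID.
log_file=foobar.log

# The job log is plain text; its tail shows the most recent job events.
if [ -f "$log_file" ]; then
    tail -n 20 "$log_file"
fi

if command -v condor_q >/dev/null 2>&1; then
    condor_status        # nodes in the pool and their state
    condor_q             # jobs waiting or running in the queue
    # condor_rm 1234.0   # remove a job; uncomment and use a real ID
else
    echo "Condor tools not found; run this on the cluster" >&2
fi
```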
All of these commands have manual pages that may be displayed with the
It's possible to submit multiple jobs with one submit file, using multiple Queue lines. Each submission can have different parameters. See
for a good, well-documented example of this.
Multiple submissions can also be automated; for example, if we wanted to run the above job three times, with input files named "foobar.in0" through "foobar.in2", we could do the following:
executable = foobar
getenv = true
input = foobar.in$(Process)
output = foobar.out$(Process)
error = foobar.error$(Process)
log = /tmp/brodbd/foobar.log
arguments = "-a -n"
queue 3
$(Process) is a variable substitution; it will be replaced by the process number of each process that's queued, starting at 0, which is why the input files are numbered from foobar.in0. Consult the condor_submit manpage (
) for more details.
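The sketch below prepares the three numbered inputs and writes a multi-job submit file that ends with queue 3. The input text is placeholder data, and the log is kept in the current directory rather than under /tmp/brodbd. The here-document delimiter is quoted so that the shell passes $(Process) through untouched. With an unquoted delimiter, the shell would treat it as a command substitution before Condor ever saw it.

```shell
#!/bin/sh
# Sketch: create three placeholder inputs and a submit file that queues
# one foobar process per input. Replace the placeholder text with real data.
for i in 0 1 2; do
    printf 'placeholder input %s\n' "$i" > "foobar.in$i"
done

# Quoted 'EOF': the shell must not expand $(Process); Condor does that.
cat > foobar-multi.submit <<'EOF'
executable = foobar
getenv = true
input = foobar.in$(Process)
output = foobar.out$(Process)
error = foobar.error$(Process)
log = foobar.log
arguments = "-a -n"
queue 3
EOF

# Show the input/output pairs Condor will assign to processes 0, 1 and 2.
for p in 0 1 2; do
    echo "process $p: foobar.in$p -> foobar.out$p"
done
```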
Helping us track research usage
If your job is research-related, please add the following to your submit description file, above the queue line:
+Research = True
This helps us track research vs. non-research jobs on our cluster and potentially qualify for certain tax exemptions. It does not affect job scheduling in any way.
If you have a very large queue of jobs to run, but don't care if they finish quickly, you can add the following to your submit file as a courtesy to other users:
nice_user = true
This tells Condor to let other jobs jump ahead of yours in the queue when a new slot is available; in other words, processes in your job will only start on slots that no other jobs want.
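As a sketch, a submit file for a large batch of research jobs could combine both courtesy lines like this. The program and file names are the tutorial's foobar examples, the job count of 3 is arbitrary, and foobar-research.submit is a hypothetical file name. Both extra lines sit above the queue line.

```shell
#!/bin/sh
# Sketch: a research job that yields free slots to other users.
# Assumptions: foobar names as in the tutorial; the count of 3 is arbitrary.
cat > foobar-research.submit <<'EOF'
executable = foobar
getenv = true
input = foobar.in$(Process)
output = foobar.out$(Process)
error = foobar.error$(Process)
log = foobar.log
arguments = "-a -n"
# Marks the job as research-related; does not change scheduling.
+Research = True
# Lets other users' jobs take newly free slots first.
nice_user = true
queue 3
EOF
```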
Things to keep in mind
Because the job will actually be run on a compute node, not on the system you're logged into, it's important to make sure that it will be able to access all the files it needs. Home directories, /opt, /projects, /NLP_TOOLS, and /corpora are shared; however, /tmp is not. Make sure everything your job needs is located on one of the shared filesystems.
For reasons having to do with UW IT's use of Kerberos authentication, condor can not
If you want to put input, output, or error files on a non-shared filesystem such as /tmp, you can add
to your submit file. This tells Condor to pipe the output back to the original submitting system instead of creating the file on the node. It may add a slight performance penalty if you're doing a lot of I/O.
To keep the cluster responsive, long-running processes run on patas itself will automatically have their CPU priority lowered. Additionally, processes on patas itself are limited to no more than 2 GB of RAM. Processes submitted to Condor are not affected by this, so you should try to use Condor for anything CPU-intensive.