Tips on writing fast, efficient jobs. Feel free to add to this page if you have more good ideas!
These tips are not Condor-specific; they apply to any kind of high-performance computing job. Most of them were originally suggested by Brian High in a post to the UW techsupport list.
- Keep I/O to a minimum - try to read and write each piece of data only once.
- Don't load more data into RAM than you need. Process it in small chunks, keeping only the minimum amount of information (statistics, state, etc.) in memory.
- Identify what processing can be done independently. Parallelize that. Divide the work up as evenly as possible.
- Precompile regular expressions. This can be a huge speedup in Perl and Python, especially if you're doing regexp matching in a loop. (Brian posted one simple code snippet that ran 25% faster with precompiled regexps.)
- If your task requires doing data lookups, an SQL database will probably be faster than a flat text file. (We have an SQL server available in our cluster; contact linghelp@u for details on getting a database set up.)
- If you can't avoid loading lots of data into RAM, make sure you tell Condor what your job's memory requirements are so it can match the job with a machine that has enough memory. A job running on a machine with too little RAM will run very inefficiently (typically because it starts swapping to disk) and may not finish at all.
- Each new job that's queued takes about 30 seconds to be matched with a machine. Jobs that complete in less than a minute are likely to spend nearly as much time in the queue as they spend actually running; consider refactoring them to do more work in each job.
- For the same amount of data, writing one large file is faster than writing many small ones, because the filesystem bookkeeping needed to create each new file adds overhead.
- See the PerformanceProblems page for specific things to avoid.
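The first two tips (touch each piece of data once, and don't hold it all in RAM) can be combined into a single streaming pass. Here's a minimal Python sketch, assuming whitespace-separated input with a numeric value in one column; the function name and input layout are just for illustration. It reads line by line and keeps only a running count, sum, minimum and maximum, so memory use stays constant no matter how big the file is.

```python
def column_stats(lines, column=0):
    """One pass over whitespace-separated lines, keeping only running totals.

    Returns (count, mean, smallest, largest) for the numeric values in
    the given column. Blank or short lines are skipped.
    """
    count = 0
    total = 0.0
    smallest = float("inf")
    largest = float("-inf")
    for line in lines:
        fields = line.split()
        if len(fields) <= column:
            continue  # skip blank or short lines
        value = float(fields[column])
        count += 1
        total += value
        smallest = min(smallest, value)
        largest = max(largest, value)
    mean = total / count if count else float("nan")
    return count, mean, smallest, largest
```

Because a Python file object yields one line at a time, `with open("data.txt") as f: column_stats(f)` never has more than one line of the file in memory.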
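For the "divide the work up as evenly as possible" tip, one common approach with Condor is to split the input list into N parts and run one job per part. The sketch below (a hypothetical helper, not part of any Condor tool) splits a list into contiguous slices whose sizes differ by at most one.

```python
def split_evenly(items, n_parts):
    """Split items into n_parts contiguous slices whose sizes differ by at most one."""
    if n_parts < 1:
        raise ValueError("n_parts must be >= 1")
    base, extra = divmod(len(items), n_parts)
    parts = []
    start = 0
    for i in range(n_parts):
        # The first `extra` parts each take one additional item.
        size = base + (1 if i < extra else 0)
        parts.append(items[start:start + size])
        start += size
    return parts
```

Each job can then pick its own slice by index, for example by passing Condor's `$(Process)` number to the script as a command-line argument. Note that equal item counts only give equal run times if the items take roughly the same time to process; if they don't, it's worth balancing on estimated cost instead.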
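For the regular expression tip, the Python pattern is to call `re.compile()` once, outside the loop, and reuse the compiled pattern object. The date pattern and function name below are only an example. Python's `re` module also caches recently used patterns internally, so how much you gain depends on the code; timing both versions (e.g. with `timeit`) is the safest way to check.

```python
import re

# Compiled once at import time, then reused for every line.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def extract_years(lines):
    """Return the year from the first YYYY-MM-DD date found on each line."""
    years = []
    for line in lines:
        match = DATE_RE.search(line)
        if match:
            years.append(int(match.group(1)))
    return years
```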
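Why a database helps with lookups: an indexed table answers "find this key" without reading every row, while a flat text file generally has to be scanned line by line. This sketch uses Python's built-in `sqlite3` module as a stand-in (the cluster's SQL server would use a different client library), and the `lexicon` table and its columns are hypothetical.

```python
import sqlite3


def build_lookup(pairs):
    """Load (word, freq) pairs into an in-memory SQLite table keyed on word."""
    conn = sqlite3.connect(":memory:")
    # A TEXT PRIMARY KEY makes SQLite build an index on `word`.
    conn.execute("CREATE TABLE lexicon (word TEXT PRIMARY KEY, freq INTEGER)")
    conn.executemany("INSERT INTO lexicon (word, freq) VALUES (?, ?)", pairs)
    conn.commit()
    return conn


def lookup(conn, word):
    """Return the frequency for word, or None if it isn't in the table."""
    # The index lets SQLite find the row directly instead of scanning the table.
    row = conn.execute("SELECT freq FROM lexicon WHERE word = ?", (word,)).fetchone()
    return row[0] if row else None
```

For real data you'd pass a file path instead of `":memory:"` and build the table once, then reuse it from every job.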
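To tell Condor how much memory a job needs, you first have to know. One rough way is to run the job on a representative sample of input and report its peak memory use at the end. The sketch below uses Python's standard `resource` module (Unix only); the helper name is hypothetical.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident memory of this process so far, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024
```

Calling `print(peak_rss_mb())` as the last step of a test run gives a figure to pass on to Condor, with some headroom added. Newer Condor versions accept a `request_memory` setting in the submit file, while older setups express it as a `requirements` expression on the machine's `Memory` attribute; check the manual for the version installed on the cluster.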
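Some rough arithmetic using the 30-second matching figure above shows why short jobs are wasteful: a job that runs for 30 seconds spends half of its wall-clock time waiting to be matched. The sketch below estimates that fraction and groups small tasks into batches so that each job does more work. It only models the matching delay, not time spent waiting for a free machine, and both function names are hypothetical.

```python
MATCH_OVERHEAD_S = 30  # approximate per-job matching delay quoted on this page


def overhead_fraction(task_seconds, tasks_per_job):
    """Fraction of a job's wall-clock time spent waiting to be matched."""
    run_seconds = task_seconds * tasks_per_job
    return MATCH_OVERHEAD_S / (MATCH_OVERHEAD_S + run_seconds)


def batches(tasks, tasks_per_job):
    """Group tasks so that each Condor job handles tasks_per_job of them."""
    return [tasks[i:i + tasks_per_job] for i in range(0, len(tasks), tasks_per_job)]
```

For example, 30-second tasks submitted one per job give `overhead_fraction(30, 1) == 0.5`, while bundling nine per job brings that down to about 0.1 (30 s of waiting per 270 s of work).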
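The one-large-file tip is easy to check on your own filesystem. This sketch writes the same records either into one file or into one file per record, inside a temporary directory, and times both; the function names and record format are made up for the example. The size of the gap depends heavily on the filesystem, so treat the timings as a local measurement, not a benchmark.

```python
import os
import tempfile
import time


def write_one_file(records, out_dir):
    """Preferred: every record goes into a single file that is opened once."""
    with open(os.path.join(out_dir, "results.txt"), "w") as out:
        for rec in records:
            out.write(rec + "\n")


def write_many_files(records, out_dir):
    """Slower: one new file per record, so the filesystem has to create n files."""
    for i, rec in enumerate(records):
        with open(os.path.join(out_dir, f"result_{i}.txt"), "w") as out:
            out.write(rec + "\n")


def compare(n=2000):
    """Return (seconds_one, seconds_many, files_one, files_many) for n records."""
    records = [f"record {i}" for i in range(n)]
    results = []
    for writer in (write_one_file, write_many_files):
        with tempfile.TemporaryDirectory() as d:
            start = time.perf_counter()
            writer(records, d)
            elapsed = time.perf_counter() - start
            results.append((elapsed, len(os.listdir(d))))
    return results[0][0], results[1][0], results[0][1], results[1][1]
```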
Topic revision: r1 - 2010-06-29 - 17:11:00 - brodbd