Poor performance due to unnecessary I/O
It's very common for parallel jobs on our system to bottleneck on disk I/O, because a single network fileserver is shared by all the nodes. Sometimes this can't be avoided, but it's worth keeping in mind so you don't generate unnecessary I/O. In particular, large amounts of debugging output on stdout or stderr create a lot of disk activity as Condor copies this data to your output files. (In one case I saw, a job was generating 2 GB of stderr output per run for only a few hundred megabytes of useful output.)
If you use "debugging by printf()" to sort out problems with your code, remember to comment out or otherwise disable those lines before you do large parallel runs.
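As an alternative to commenting out each line by hand, here is a minimal sketch that assumes your code is C and that you control the compiler flags. Debug output is wrapped in a macro that compiles to nothing unless a DEBUG symbol is defined. The names DEBUG, DEBUG_PRINT, and sum_of_squares are hypothetical and only for illustration.

```c
#include <stdio.h>

/* Hypothetical macro: prints to stderr only when built with -DDEBUG.
 * Without that flag it expands to a no-op, so no stderr I/O occurs. */
#ifdef DEBUG
#define DEBUG_PRINT(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_PRINT(...) ((void)0)
#endif

/* Hypothetical stand-in for real work: sums i*i for i = 1..n. */
long sum_of_squares(int n)
{
    long total = 0;
    for (int i = 1; i <= n; i++) {
        total += (long)i * i;
        /* One line of debug output per iteration, only in debug builds. */
        DEBUG_PRINT("i=%d total=%ld\n", i, total);
    }
    return total;
}
```

Building with something like `cc -DDEBUG yourprog.c` turns the messages on for small test runs. Leaving the flag off for large parallel runs removes the output entirely, so nothing extra is written to the stderr file.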
Topic revision: r1 - 2010-05-26 - 18:13:05 - brodbd