Getting the most out of the compute servers
Sketch of set up
Pongo is the head node in a cluster with 8 other computers. Each
computer has 4GB RAM, but the disk space is mostly on Pongo (the
fileserver). The current cluster management/load balancing software
is Mosix (http://openmosix.sourceforge.net/), which allows processes to migrate automatically to other
nodes under certain conditions (see also http://howto.x-tend.be/openMosixWiki/index.php/don't):
- At least one other process is running on the head node
- The process to migrate has been running for at least a minute
- The process to migrate does not (under the current version of Mosix) use shared memory. For practical purposes, this means most programs written in C or C++ are able to migrate, but Java programs are not. (In addition, Matlab and BLAST won't migrate.)
When a process migrates, this is completely transparent
to the user: the output is still written to the expected place
on Pongo, etc.
How do I make sure my process can migrate if need be?
- Use C or C++ instead of Java
- Make sure any libraries you're using don't use shared memory
- If you need to use Java, compile the code with gcj
There is reason to believe that future releases of
Mosix will be able to migrate code with shared memory,
but we're not there yet.
How can I tell if my process can migrate?
- Run the process and find its process ID (PID). (The command ps u will show you all of the processes that belong to you.)
- If the process cannot be migrated, the reason will be listed in the file /proc/PID/cantmove (where PID is replaced by the actual process ID). Note that this file is removed when the program exits. Here's a description of the possible reasons:
- clone_vm: the application is using threads
- monkey: the application is using files as shared memory
- daemon: daemon process
- rt_sched: real-time scheduling
- mmap_dev: process is mapping a device
- direct_io: direct I/O permission
- mem_lock: locks memory
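The check described above can be wrapped in a small shell function. This is only a sketch: the function name can_migrate and the PROC_ROOT variable are made up for illustration (PROC_ROOT just lets you try the function against a test directory), and it assumes that a missing or empty cantmove file means no reason is listed.

```shell
#!/bin/sh
# Sketch: report what /proc/PID/cantmove says about a process.
# PROC_ROOT is a hypothetical override for testing; it defaults to /proc.
# Assumption: a missing or empty cantmove file means no reason is listed.
can_migrate() {
    pid=$1
    root=${PROC_ROOT:-/proc}
    if [ ! -d "$root/$pid" ]; then
        echo "no such process: $pid"
        return 2
    fi
    if [ -s "$root/$pid/cantmove" ]; then
        # The file is non-empty, so print the reason it lists.
        echo "PID $pid cannot migrate: $(cat "$root/$pid/cantmove")"
        return 1
    fi
    echo "PID $pid: no reason listed in cantmove"
    return 0
}
```

For example, can_migrate 12345 prints the reason (such as clone_vm) if one is listed for process 12345. Remember that the file disappears when the program exits, so run the check while the process is still alive.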
How can I tell if my process has migrated?
You can use mtop to tell whether a program has been migrated. mtop is an openMosix-aware version of the standard UNIX utility top that adds two columns to the output relating to process migration. The N# column gives the number of the node a given process is running on, and the MGS column gives the number of times it has migrated. All user shell interaction takes place on node 0, so if either of these columns contains a non-zero number, the corresponding process has been migrated. See man mtop for more details.
How can I get a snapshot of the load on the pongo cluster?
Run mosmon.
How can I make my process run on multiple machines?
The easy way (which is only applicable to certain types
of tasks) is to write a script (perl script or shell
script) which splits the process into separate processes
and invokes each one. Mosix can then migrate those processes
onto different nodes. For example, if you need to parse
10,000 sentences (stored in one input file), your script
can create 10 input files, invoke the parser for each
file, and then concatenate the results. The splitting and
concatenating steps at the beginning and end would each run
on only one machine, but the parsing could be
spread across all the machines.
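Here is a rough sketch of such a script, written as a shell function. The name split_and_parse is made up for illustration, and the parser is assumed to read sentences on standard input and write its results to standard output; adjust the invocation if your parser takes file names instead.

```shell
#!/bin/sh
# Sketch: split INPUT into about NCHUNKS pieces, run a command on each piece
# as its own background process, then concatenate the results into OUTPUT.
# Usage: split_and_parse INPUT OUTPUT NCHUNKS COMMAND [ARGS...]
# Assumption: COMMAND reads stdin and writes stdout (hypothetical parser).
split_and_parse() {
    input=$1; output=$2; nchunks=$3; shift 3
    total=$(wc -l < "$input" | tr -d ' ')
    if [ "$total" -eq 0 ]; then
        : > "$output"              # nothing to parse
        return 0
    fi
    workdir=$(mktemp -d) || return 1
    # Lines per chunk, rounded up so we get at most NCHUNKS pieces.
    per_chunk=$(( (total + nchunks - 1) / nchunks ))
    split -l "$per_chunk" "$input" "$workdir/chunk."
    for chunk in "$workdir"/chunk.*; do
        # Each background process is a separate candidate for migration.
        "$@" < "$chunk" > "$workdir/out.${chunk##*.}" &
    done
    wait                           # block until every piece has finished
    # split names pieces aa, ab, ..., so the glob keeps the original order.
    cat "$workdir"/out.* > "$output"
    rm -rf "$workdir"
}
```

With a hypothetical parser called myparser, the 10,000-sentence example would be split_and_parse sentences.txt parsed.txt 10 myparser. Note that this sketch does not check whether any individual piece failed.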
What if Mosix migration seems to cause my code to crash?
If your process is crashing mysteriously, especially when you run large
data sets, Mosix migration may be an issue. There have
been reports of mysterious crashing and segmentation faults occurring
as the result of migration of certain Perl and Make scripts, as well as
with some Python programs.
If you believe migration may be causing crashes in your code, you can lock your code to a particular
node in the cluster using mosrun with the -L switch, which locks the command to the node, and the preferred node number as a second switch. For example, if I wanted
to lock a hypothetical nlpProcess to node 19, I would type:
mosrun -L -19 nlpProcess
In some cases you may need to lock the process to the head node using
runhome, e.g.:
runhome nlpProcess
runhome is a synonym for
mosrun -L -1.
What if there is no sensible way to divide the task up into N chunks? Can I still make my programs parallel?
Yes, but it takes more effort. The tool to use is MPI
(Message Passing Interface), and you have to
write it into your code. This kind of programming is
trickier (you have to work out how the parallelization
fits into your algorithm), but it is potentially
a valuable skill to have in the long run. The relevant libraries are installed on Pongo.
For more information, see: /usr/share/doc/mpi-doc/ on Pongo
and the tutorials at
http://www.lam-mpi.org/
How can I check on the status of my processes?
Some useful information
here.
Sharing the sandbox
Policy
In order for everyone to get the most out of our servers,
everyone needs to play nicely with respect to memory and
CPU time. In addition to asking everyone to be mindful
of being efficient with resources (are you loading the whole
Penn Treebank into RAM?), we have developed the following
policy with regard to long-running processes:
- Any process that has run for over 4 hours of CPU time will automatically trigger an email to the owner of the process.
- If the owner does not reply to that email (within some reasonable amount of time), the process is eligible to be terminated by the system administrators (though it would only be terminated if other processes need the machines).
- If you know ahead of time that your process will be long, alert linghelp@u.
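If you want to see for yourself which of your processes have passed the 4-hour mark, something like the following may help. It is a sketch, not the mechanism the administrators use: the function name over_cpu_limit is made up, and it assumes ps reports CPU time in the standard [dd-]hh:mm:ss format and that the figure is meaningful for migrated processes.

```shell
#!/bin/sh
# Sketch: read "PID TIME COMMAND" lines on stdin, with TIME formatted as
# [dd-]hh:mm:ss, and print the lines whose CPU time is over the limit.
# Usage: over_cpu_limit [HOURS]    (HOURS defaults to 4, as in the policy)
over_cpu_limit() {
    awk -v limit="${1:-4}" '
    {
        t = $2; days = 0
        # An optional "dd-" prefix carries the number of days.
        if (split(t, d, "-") == 2) { days = d[1]; t = d[2] }
        # Skip the header line and anything not shaped like hh:mm:ss.
        if (split(t, p, ":") != 3) next
        secs = ((days * 24 + p[1]) * 60 + p[2]) * 60 + p[3]
        if (secs > limit * 3600) print
    }'
}

# On a live system, feed it your own process list:
#   ps -u "$(id -un)" -o pid,time,comm | over_cpu_limit
```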
Strategies for efficient use of resources:
- Test your program on small amounts of data before going whole hog.
- If you have to load a lot of data into memory, consider whether there is a more efficient way of doing so.
- Avoid starting intensive processes at the last minute (end of quarter, homework deadlines, etc).
- Each node has 2 CPUs. You will generally get the best performance by launching one process for each CPU. For example, if the cluster has 8 nodes running, you would launch 16 processes. The nodecheck command will tell you how many nodes are currently running.
- If you are running a program that doesn't migrate (such as a Java program), launching too many simultaneous processes will only slow them all down. Any more than two CPU-intensive processes will simply be stealing CPU time from each other. Additionally, running too many memory-intensive processes may cause the system to swap to disk, greatly slowing down performance.
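One way to keep to the one-process-per-CPU guideline is to let xargs cap the number of simultaneous processes. The sketch below makes several assumptions: the function names max_procs and run_capped are made up, the node count is passed in by hand (read it off nodecheck yourself, since its output format isn't described here), and your xargs supports the -P option (GNU and BSD versions do, but it is not part of POSIX).

```shell
#!/bin/sh
# Each node has 2 CPUs, as stated above.
CPUS_PER_NODE=2

# Print the suggested number of simultaneous processes for NODES nodes.
max_procs() {
    echo $(( $1 * CPUS_PER_NODE ))
}

# Sketch: read one argument per line on stdin and run COMMAND once for each,
# keeping at most max_procs NODES processes running at a time.
# Usage: run_capped NODES COMMAND [ARGS...]
# Relies on xargs -P, a GNU/BSD extension.
run_capped() {
    nodes=$1; shift
    xargs -n 1 -P "$(max_procs "$nodes")" "$@"
}
```

For example, with 8 nodes running, ls chunks/* | run_capped 8 nlpProcess would run the hypothetical nlpProcess on each file in a chunks directory, 16 at a time (assuming nlpProcess takes a file name as its argument). For a program that doesn't migrate, pass 1 as the node count so that no more than two processes run at once.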
--
EmilyBender - 09 Jan 2006,
DavidBrodbeck - 10 Jun 2007