
Line: 1 to 1
Running Condor jobs with large memory requirements
By default, Condor assigns each process you launch 1 GB of RAM. If your job grows too large, one of two things will happen.
Changed:
< <
> >
Added:
> > Both of these problems can be avoided by giving Condor a realistic idea of how much memory your job needs.
Running jobs larger than 1 GB
If you have a job whose processes consume more than 1 GB of memory, you can tell Condor how much RAM they require by adding the request_memory keyword to your submit file. This value is specified in megabytes.
Here's an example submit script for an executable called hugejob, which requires at least 7 GB of memory to run:
Changed:
< < universe = vanilla
executable = hugejob
> > executable = hugejob
getenv = true
input = hugejob.in
output = hugejob.out
Line: 1 to 1
Running Condor jobs with large memory requirements
Changed:
< < Normally Condor assigns one job to each CPU on a node, dividing up the memory equally. On most of our current systems this results in 2 GB of RAM per slot. Ideally, you should structure your jobs to stay within this amount of memory; this uses the cluster efficiently.
If your job grows too large, one of two things will happen.
> > By default, Condor assigns each process you launch 1 GB of RAM. If your job grows too large, one of two things will happen.
Changed:
< < Running jobs larger than 2 GB
If you have jobs that consume more than 2 GB of memory, you can tell Condor to claim an entire machine instead of one slot, so all of the system's memory is available to your job. To do this, add
+RequiresWholeMachine = True
> > Running jobs larger than 1 GB
Changed:
< < to your submit file. (Note the plus sign, which is required. Also, note that this attribute is a custom one for our site and may not be available on other Condor clusters.) You will also want to tell Condor not to check your job's memory use, so it won't be evicted when it grows larger than 2 GB. This is easily done by adding your own memory constraint to your job's submit file; for example:
Requirements = (Memory > 0)
Finally, you may want to specify a minimum amount of total memory for the machine. This can be done by adding a TotalMemory requirement. (Both TotalMemory and Memory are measured in megabytes. Memory is the memory available per slot, while TotalMemory is the total amount of memory for the whole machine.)
> > If you have a job whose processes consume more than 1 GB of memory, you can tell Condor how much RAM they require by adding the request_memory keyword to your submit file. This value is specified in megabytes.
Here's an example submit script for an executable called hugejob, which requires at least 7 GB of memory to run:
universe = vanilla
Line: 25 to 17
output = hugejob.out
error = hugejob.err
log = hugejob.log
Deleted:
< < +RequiresWholeMachine = True
Requirements = ( Memory > 0 && TotalMemory >= (7*1024) )
Note: Be careful about being too specific with TotalMemory constraints. For various reasons (memory consumed by the OS, etc.) the TotalMemory constraint will probably be stricter than you expect. For example, our 4 gigabyte nodes actually report their total memory as 3950 MB, so a constraint of (TotalMemory >= (4*1024)) will exclude them.
Interaction with other jobs
Jobs with +RequiresWholeMachine set follow these rules:
\ No newline at end of file
Added:
> > request_memory = 7*1024
queue
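For reference, the memory-request version of the example submit file, pieced together from the fragments in this revision, might look roughly like the sketch below. It uses HTCondor's standard `request_memory` submit command, whose value is in megabytes; the line order, the alignment, and the comments are assumptions, not the author's verbatim file.

```
# Sketch: submit file for "hugejob", which needs at least 7 GB of RAM
universe       = vanilla
executable     = hugejob
getenv         = true
input          = hugejob.in
output         = hugejob.out
error          = hugejob.err
log            = hugejob.log
# Memory request in megabytes: 7 GB = 7 * 1024 MB = 7168 MB
request_memory = 7168
queue
```

Writing the literal 7168 instead of `7*1024` gives the same value while avoiding any dependence on how a particular Condor version handles expressions in this command.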
Line: 1 to 1
Running Condor jobs with large memory requirements
Changed:
< < Normally Condor assigns one job to each CPU on a node, dividing up the memory equally. On all of our current systems this results in 2 GB of RAM per slot. Ideally, you should structure your jobs to stay within this amount of memory; this uses the cluster efficiently.
> > Normally Condor assigns one job to each CPU on a node, dividing up the memory equally. On most of our current systems this results in 2 GB of RAM per slot. Ideally, you should structure your jobs to stay within this amount of memory; this uses the cluster efficiently.
If your job grows too large, one of two things will happen.
Line: 10 to 10
Running jobs larger than 2 GB
If you have jobs that consume more than 2 GB of memory, you can tell Condor to claim an entire machine instead of one slot, so all of the system's memory is available to your job. To do this, add
Changed:
< < +RequiresWholeMachine = True
to your submit file. (Note the plus sign, which is required. Also, note that this attribute is a custom one for our site and may not be available on other Condor clusters.) You will also want to tell Condor not to check your job's memory use, so it won't be evicted when it grows larger than 2 GB. This is easily done by adding your own memory constraint to your job's submit file; for example:
Requirements = (Memory > 0)
Finally, you may want to specify a minimum amount of total memory for the machine. This can be done by adding a TotalMemory requirement. (Both TotalMemory and Memory are measured in megabytes. Memory is the memory available per slot, while TotalMemory is the total amount of memory for the whole machine.)
> > +RequiresWholeMachine = True
to your submit file. (Note the plus sign, which is required. Also, note that this attribute is a custom one for our site and may not be available on other Condor clusters.) You will also want to tell Condor not to check your job's memory use, so it won't be evicted when it grows larger than 2 GB. This is easily done by adding your own memory constraint to your job's submit file; for example:
Requirements = (Memory > 0)
Finally, you may want to specify a minimum amount of total memory for the machine. This can be done by adding a TotalMemory requirement. (Both TotalMemory and Memory are measured in megabytes. Memory is the memory available per slot, while TotalMemory is the total amount of memory for the whole machine.)
Here's an example submit script for an executable called hugejob, which requires at least 7 GB of memory to run:
universe = vanilla
Line: 21 to 26
error = hugejob.err
log = hugejob.log
+RequiresWholeMachine = True
Changed:
< < Requirements = ( Memory > 0 && TotalMemory >= (7*1024) )
Note: Be careful about being too specific with TotalMemory constraints. For various reasons (memory consumed by the OS, etc.) the TotalMemory constraint will probably be stricter than you expect. For example, our 4 gigabyte nodes actually report their total memory as 3950 MB, so a constraint of (TotalMemory >= (4*1024)) will exclude them.
> > Requirements = ( Memory > 0 && TotalMemory >= (7*1024) )
Note: Be careful about being too specific with TotalMemory constraints. For various reasons (memory consumed by the OS, etc.) the TotalMemory constraint will probably be stricter than you expect. For example, our 4 gigabyte nodes actually report their total memory as 3950 MB, so a constraint of (TotalMemory >= (4*1024)) will exclude them.
Interaction with other jobs
Line: 1 to 1
Running Condor jobs with large memory requirements
Line: 21 to 21
error = hugejob.err
log = hugejob.log
+RequiresWholeMachine = True
Changed:
< < Requirements = ( Memory > 0 && TotalMemory >= (7*1024) )
Note: Be careful about being too specific with TotalMemory constraints. For various reasons (memory consumed by the OS, etc.) the TotalMemory constraint will probably be stricter than you expect. For example, our 4 gigabyte nodes actually report their total memory as 3950 KB, so a constraint of (TotalMemory >= (4*1024)) will exclude them.
> > Requirements = ( Memory > 0 && TotalMemory >= (7*1024) )
Note: Be careful about being too specific with TotalMemory constraints. For various reasons (memory consumed by the OS, etc.) the TotalMemory constraint will probably be stricter than you expect. For example, our 4 gigabyte nodes actually report their total memory as 3950 MB, so a constraint of (TotalMemory >= (4*1024)) will exclude them.
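One way to act on the TotalMemory note is to leave some headroom below the nominal machine size. The sketch below targets machines sold as 4 GB; the 256 MB margin is an arbitrary assumption, chosen only so that the 3950 MB those nodes report still satisfies the constraint (4*1024 - 256 = 3840). `+RequiresWholeMachine` is the site-specific attribute described on this page.

```
# Claim a whole machine (site-specific attribute; the plus sign is required)
+RequiresWholeMachine = True
# 4 GB nodes report about 3950 MB of TotalMemory, so ">= 4*1024" (4096) would skip them.
# Subtracting an assumed 256 MB margin (threshold 3840 MB) keeps them eligible.
Requirements = ( Memory > 0 && TotalMemory >= (4*1024 - 256) )
```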
Interaction with other jobs
Line: 1 to 1
Running Condor jobs with large memory requirements
Line: 10 to 10
Running jobs larger than 2 GB
If you have jobs that consume more than 2 GB of memory, you can tell Condor to claim an entire machine instead of one slot, so all of the system's memory is available to your job. To do this, add
Changed:
< < +RequiresWholeMachine = True
to your submit file. (Note the plus sign, which is required. Also, note that this attribute is a custom one for our site and may not be available on other Condor clusters.) You will also want to tell Condor not to check your job's memory use, so it won't be evicted when it grows larger than 2 GB. This is easily done by adding your own memory constraint to your job's submit file; for example:
Requirements = (Memory > 0)
Finally, you may want to specify a minimum amount of total memory for the machine. This can be done by adding a TotalMemory requirement. (Both TotalMemory and Memory are measured in kilobytes. Memory is the memory available per slot, while TotalMemory is the total amount of memory for the whole machine.)
> > +RequiresWholeMachine = True
to your submit file. (Note the plus sign, which is required. Also, note that this attribute is a custom one for our site and may not be available on other Condor clusters.) You will also want to tell Condor not to check your job's memory use, so it won't be evicted when it grows larger than 2 GB. This is easily done by adding your own memory constraint to your job's submit file; for example:
Requirements = (Memory > 0)
Finally, you may want to specify a minimum amount of total memory for the machine. This can be done by adding a TotalMemory requirement. (Both TotalMemory and Memory are measured in megabytes. Memory is the memory available per slot, while TotalMemory is the total amount of memory for the whole machine.)
Here's an example submit script for an executable called hugejob, which requires at least 7 GB of memory to run:
universe = vanilla
Line: 32 to 32
I'm still tweaking these rules, so if you see any pathological behavior, or have an idea for a way to allocate slots more fairly, email linghelp@u and let me know.
Changed:
< < -- brodbd - 30 Mar 2009
> > -- brodbd - 09 Apr 2009
Line: 1 to 1
Running Condor jobs with large memory requirements
Line: 10 to 10
Running jobs larger than 2 GB
If you have jobs that consume more than 2 GB of memory, you can tell Condor to claim an entire machine instead of one slot, so all of the system's memory is available to your job. To do this, add
Changed:
< < +RequiresWholeMachine = True
to your submit file. (Note the plus sign, which is required. Also, note that this attribute is a custom one for our site and may not be available on other Condor clusters.) You will also want to tell Condor not to check your job's memory use, so it won't be evicted when it grows larger than 2 GB. This is easily done by adding your own memory constraint to your job's submit file; for example:
Requirements = (Memory > 0)
Finally, you may want to specify a minimum amount of total memory for the machine. This can be done by adding a TotalMemory requirement. (Both TotalMemory and Memory are measured in kilobytes. Memory is the memory available per slot, while TotalMemory is the total amount of memory for the whole machine.)
> > +RequiresWholeMachine = True
to your submit file. (Note the plus sign, which is required. Also, note that this attribute is a custom one for our site and may not be available on other Condor clusters.) You will also want to tell Condor not to check your job's memory use, so it won't be evicted when it grows larger than 2 GB. This is easily done by adding your own memory constraint to your job's submit file; for example:
Requirements = (Memory > 0)
Finally, you may want to specify a minimum amount of total memory for the machine. This can be done by adding a TotalMemory requirement. (Both TotalMemory and Memory are measured in kilobytes. Memory is the memory available per slot, while TotalMemory is the total amount of memory for the whole machine.)
Changed:
< < Here's an example submit script for an executable called hugejob, which requires at least 8 GB of memory to run:
> > Here's an example submit script for an executable called hugejob, which requires at least 7 GB of memory to run:
universe = vanilla
executable = hugejob
getenv = true
Line: 24 to 21
error = hugejob.err
log = hugejob.log
+RequiresWholeMachine = True
Changed:
< < Requirements = ( Memory > 0 && TotalMemory >= (8*1024) )
> > Requirements = ( Memory > 0 && TotalMemory >= (7*1024) )
Note: Be careful about being too specific with TotalMemory constraints. For various reasons (memory consumed by the OS, etc.) the TotalMemory constraint will probably be stricter than you expect. For example, our 4 gigabyte nodes actually report their total memory as 3950 KB, so a constraint of (TotalMemory >= (4*1024)) will exclude them.
Interaction with other jobs
Jobs with +RequiresWholeMachine set follow these rules:
Line: 35 to 32
I'm still tweaking these rules, so if you see any pathological behavior, or have an idea for a way to allocate slots more fairly, email linghelp@u and let me know.
Changed:
< < -- brodbd - 26 Mar 2009
> > -- brodbd - 30 Mar 2009
Line: 1 to 1
Running Condor jobs with large memory requirements
Changed:
< < Normally Condor assigns one job to each CPU on a node, dividing up the memory equally. On all of our current systems this results in 2 GB of RAM per slot. Ideally, you should structure your jobs to stay within this amount of memory.
> > Normally Condor assigns one job to each CPU on a node, dividing up the memory equally. On all of our current systems this results in 2 GB of RAM per slot. Ideally, you should structure your jobs to stay within this amount of memory; this uses the cluster efficiently.
If your job grows too large, one of two things will happen.
Changed:
< <
> >
Running jobs larger than 2 GB
Changed:
< < Eventually I will implement a custom submit file attribute to allow jobs to claim the entire machine, but this requires a newer version of Condor than we're currently running. My current target for this upgrade is the spring '09 term break. However, there are some stop-gap techniques that can help. If you add the requirement "VirtualMachineID == 1" to your job, it will only run on the first CPU slot of any machine. This will not prevent other jobs from occupying other slots, but it will ensure that only one copy of your job (or any similarly flagged job) runs on each machine. Note: The name of this parameter changed to SlotID in Condor 7.x, so when we upgrade in the spring, any submit files that use this parameter will need to be changed.
> > If you have jobs that consume more than 2 GB of memory, you can tell Condor to claim an entire machine instead of one slot, so all of the system's memory is available to your job. To do this, add
+RequiresWholeMachine = True
to your submit file. (Note the plus sign, which is required. Also, note that this attribute is a custom one for our site and may not be available on other Condor clusters.) You will also want to tell Condor not to check your job's memory use, so it won't be evicted when it grows larger than 2 GB. This is easily done by adding your own memory constraint to your job's submit file; for example:
Requirements = (Memory > 0)
Finally, you may want to specify a minimum amount of total memory for the machine. This can be done by adding a TotalMemory requirement. (Both TotalMemory and Memory are measured in kilobytes. Memory is the memory available per slot, while TotalMemory is the total amount of memory for the whole machine.)
Here's an example submit script for an executable called hugejob, which requires at least 8 GB of memory to run:
universe = vanilla
executable = hugejob
getenv = true
input = hugejob.in
output = hugejob.out
error = hugejob.err
log = hugejob.log
+RequiresWholeMachine = True
Requirements = ( Memory > 0 && TotalMemory >= (8*1024) )
Interaction with other jobs
Changed:
< < If you add an explicit Memory requirement to your job, Condor will allow it to run on any slot with at least that amount of RAM, and will not evict it if it grows larger than 2 GB. (It's still vulnerable to the out-of-memory killer if it grows too large, however.) This requirement is measured in kilobytes and, for our purposes, can be set to any arbitrary number that's less than the smallest slot in the cluster, currently 1975 KB.
> > Jobs with +RequiresWholeMachine set follow these rules:
Changed:
< < Combining these two requirements, we end up with the following, which can be added to the submit file of your large job:
Requirements = (VirtualMachineID == 1 && Memory > 1024)
> > I'm still tweaking these rules, so if you see any pathological behavior, or have an idea for a way to allocate slots more fairly, email linghelp@u and let me know.
Deleted:
< < -- brodbd - 23 Feb 2009
\ No newline at end of file
Added:
> > -- brodbd - 26 Mar 2009
Line: 1 to 1
Running Condor jobs with large memory requirements
Line: 12 to 12
If you add the requirement "VirtualMachineID == 1" to your job, it will only run on the first CPU slot of any machine. This will not prevent other jobs from occupying other slots, but it will ensure that only one copy of your job (or any similarly flagged job) runs on each machine. Note: The name of this parameter changed to SlotID in Condor 7.x, so when we upgrade in the spring, any submit files that use this parameter will need to be changed.
Changed:
< < If you add an explicit Memory requirement to your job, Condor will allow it to run on any slot with at least that amount of RAM, and will not evict it if it grows larger than 2 GB. (It's still vulnerable to the out-of-memory killer if it grows too large, however.) This requirement is measured in kilobytes and, for our purposes, can be set to any arbitrary number that's less than the smallest slot in the cluster, currently 1976 KB.
> > If you add an explicit Memory requirement to your job, Condor will allow it to run on any slot with at least that amount of RAM, and will not evict it if it grows larger than 2 GB. (It's still vulnerable to the out-of-memory killer if it grows too large, however.) This requirement is measured in kilobytes and, for our purposes, can be set to any arbitrary number that's less than the smallest slot in the cluster, currently 1975 KB.
Combining these two requirements, we end up with the following, which can be added to the submit file of your large job:
Requirements = (VirtualMachineID == 1 && Memory > 1024)
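The stop-gap text notes that VirtualMachineID was renamed SlotID in Condor 7.x. After such an upgrade, the combined requirement would presumably be written as in this sketch, which only swaps the attribute name and leaves the Memory threshold unchanged.

```
# Pre-7.x spelling of the stop-gap constraint:
#   Requirements = (VirtualMachineID == 1 && Memory > 1024)
# Same constraint using the Condor 7.x slot attribute name:
Requirements = (SlotID == 1 && Memory > 1024)
```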
Line: 1 to 1
Added:
> >
Running Condor jobs with large memory requirements
Normally Condor assigns one job to each CPU on a node, dividing up the memory equally. On all of our current systems this results in 2 GB of RAM per slot. Ideally, you should structure your jobs to stay within this amount of memory. If your job grows too large, one of two things will happen.
Requirements = (VirtualMachineID == 1 && Memory > 1024)
-- brodbd - 23 Feb 2009