Condor Configuration Instructions for NorduGrid/ARC Sites

Basic installation

Install Condor on the Grid Manager (GM) node and configure it as a submit machine. Next, you must add the following to the node's Condor configuration (or define CONDOR_IDS as an environment variable):

CONDOR_IDS = 0.0

CONDOR_IDS has to be 0.0 so that Condor runs as root and can access the Grid jobs' session directories (this is needed to extract various pieces of information from the job log).
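
If you prefer the environment-variable alternative mentioned above, a minimal sketch would be to export the variable in whatever environment the Condor daemons and tools are started from (for example an init script); the exact placement is site-specific:

# alternative to setting CONDOR_IDS in the Condor configuration file;
# it must be visible to the Condor daemons when they start
export CONDOR_IDS=0.0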

The next step is to modify arc.conf. In the [common] section, the following should be set:

  1. lrms="condor"
  2. condor_bin_path should be set to the directory containing the Condor binaries (e.g. /opt/condor/bin). If this parameter is missing, ARC will try to guess it from the system path, but it is highly recommended to set it explicitly.
  3. condor_config should be set to point to the Condor config file in case it is not located at one of the default paths (/etc/condor/condor_config or ~condor/condor_config). The full path to the file (e.g. /opt/condor/etc/condor_config) should be given. A combined example is shown after this list.
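
Putting these together, a minimal sketch of the corresponding [common] entries (the paths are only illustrative, taken from the examples above):

[common]
lrms="condor"
condor_bin_path="/opt/condor/bin"
condor_config="/opt/condor/etc/condor_config"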

The next thing to do is to make sure that no normal users are allowed to submit Condor jobs from this node. Otherwise the information system would be fooled into believing that those user jobs came from the Grid. If you don't allow normal user logins on the GM machine, then you don't have to do anything. If, for some reason, you do want to allow users to log into the GM machine, simply don't allow them to execute the condor_submit program. This can be done by putting all local Unix users allocated to the Grid into a single group, e.g. 'griduser', and then setting the file ownership and permissions on condor_submit like this:

chgrp griduser $condor_bin_path/condor_submit
chmod 750 $condor_bin_path/condor_submit

Configuring queues

In heterogeneous clusters it is desirable to configure multiple NorduGrid queues, each with a more or less uniform composition. As of ARC 0.6.1 this is possible, even though Condor does not support queues in the classical sense. It is possible, however, to divide the Condor pool into several sub-pools. A 'queue' is then nothing more than a subset of nodes.

Which nodes go into which queue is defined using the 'condor_requirements' option in the corresponding '[queue]' section. Its value must be a well-formed constraint string that is accepted by a condor_status -constraint '...' command (a verification example follows the queue definitions below). Internally, this constraint string is used to determine the nodes belonging to a queue. The string can get quite long, so for readability it is allowed to split it into pieces by using multiple 'condor_requirements' options. The full constraint string is then reconstructed by concatenating all the pieces.

Queues should be defined in such a way that their nodes all match the information available in Nordugrid about the queue. A good start is for the 'condor_requirements' option to contain restrictions on the following: Opsys, Arch, Memory and Disk. If you wish to configure more than one queue, it's good to have queues defined in such a way that they do not overlap. In the following example disjoint memory ranges are used to ensure this:

[queue/large]
condor_requirements="(Opsys == "linux" && (Arch == "intel" || Arch == "x86_64")"
condor_requirements=" && (Disk > 30000000 && Memory > 2000)"

[queue/small]
condor_requirements="(Opsys == "linux" && (Arch == "intel" || Arch == "x86_64")"
condor_requirements=" && (Disk > 30000000 && Memory <= 2000 && Memory > 1000)"

Note that the 'nodememory' option in arc.conf means the maximum memory available for jobs, while the Memory attribute in Condor is the physical memory of the machine. To avoid swapping (and these are probably not dedicated machines!), make sure that 'nodememory' is smaller than the smallest physical memory of the machines in that queue. If, for example, the smallest node in a queue has 1 GB of memory, then it would be sensible to use nodememory="850" for the maximum job size.
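
For the example 'small' queue above, whose nodes have between 1 and 2 GB of physical memory, the corresponding setting could thus look like this (the value is the illustrative one from the text, not a universal recommendation):

[queue/small]
nodememory="850"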

In case you want more precise control over which nodes are available for Grid jobs, using pre-defined ClassAd attributes (as in the example above) might not be sufficient. Fortunately, it is possible to mark nodes with a custom attribute, say NORDUGRID_RESOURCE. This is accomplished by adding the attribute to each node's local Condor configuration file and then appending it to STARTD_EXPRS:

NORDUGRID_RESOURCE = True
STARTD_EXPRS = NORDUGRID_RESOURCE, $(STARTD_EXPRS)

Now queues can be restricted to contain only 'good' nodes. Just add to each [queue] section in arc.conf:

condor_requirements=" && NORDUGRID_RESOURCE"

Clusters without a shared filesystem

In case the GM's session directory is not exported to the worker nodes, you need to set the shared_filesystem option in the [grid-manager] section:

shared_filesystem="no"

This option invokes Condor's built-in file transfer mechanism for copying input/output files of jobs to/from worker nodes.

Exclusive job execution

Some jobs require a whole node exclusively for their execution. In ARC this can be requested with the exclusiveexecution="yes" option in the job description, or by using a special runtime environment script; ARC takes care of passing the requirement on to Condor. However, for Condor to actually honour exclusive execution, some additional local configuration is needed on the Condor worker nodes.
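
What that local configuration looks like depends on the site and the Condor version, so the following is only a rough sketch of one possible approach, not a recommended recipe: advertise a single, non-partitionable slot per worker node that owns all of the node's resources, so that at most one job runs on a node at a time. The settings go into the worker node's local Condor configuration:

# hypothetical sketch: one slot spanning the whole machine,
# so any job that lands here gets the node to itself
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=100%, memory=100%, disk=100%
SLOT_TYPE_1_PARTITIONABLE = False

Note that this makes every job on such nodes exclusive; more selective setups, where only jobs that request a whole machine claim one, are possible and are described in the Condor documentation for your version.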

Customizing Condor's ranking algorithm

If you are not happy with the way Condor picks nodes when running jobs, you can define your own ranking algorithm by setting the condor_rank option in arc.conf's [common] section. condor_rank should be set to a ClassAd float expression such as you could use in the Rank attribute of a Condor job description. For example:

condor_rank="(1-LoadAvg/2)*(1-LoadAvg/2)*Memory/1000*KFlops/1000000"

See the following sections in the Condor manual for more information:

9 condor_submit
Information on the Rank attribute, which condor_rank translates directly into.
2.5.2.1 ClassAd Machine Attributes
The full list of all ClassAd attributes.
4.1 Condor's ClassAd Mechanism
The syntax of ClassAd expressions.