NORDUGRID-MANUAL-2

ARC 11.05 and above server installation instructions:
quick start for setting up a grid resource

This is a shortened installation manual; for details, please consult the ARC CE System Administrator Guide.

General notes:

 :: Preparation ::  Grid software ::  Security ::  Configuration ::  Start-up :: 

Pre-installation steps:

General requirements for equipment

Hardware, operating system etc

The NorduGrid ARC middleware does not impose heavy requirements on hardware. Any 32-bit architecture will do, as well as many 64-bit ones; some success has been reported for PPC, too. CPU frequencies from 333 MHz and up have been tested, and at least 512 MB of RAM is recommended. The disk space required for the ARC installation, including the development interface, is about 160 MB, while external software (most notably, a minimal setup of Globus Toolkit 5) requires another 10 MB. Servers (front-ends, gatekeepers, database servers, storage arrays etc.) require both inbound and outbound network connectivity. If you are behind a firewall, a range of ports will have to be opened (see the firewall section below). For clusters/HPC resources, the worker nodes can be either on a private or on a public network.

A shared file system, such as NFS, is desirable (for simplicity) but not required if the Local Resource Management System (LRMS) provides means for file staging between the computing nodes and the front-end, or if execution happens on the same machine (known as job forking). Local authentication of Grid users is supported through embedded authentication algorithms and call-outs to external executables or functions in dynamically loadable libraries. The actual implementation (e.g., for AFS) may require site-specific modules.

The ARC middleware is expected to run on any system supported by Globus. At the moment, only the following GNU/Linux distributions are supported on the server side: Fedora, Red Hat Enterprise Linux, Debian, Ubuntu and (partially) OpenSuSE.

DNS Requirements for GSI (Grid Security Infrastructure)

It is important to bear in mind that the Grid requires the presence of valid certificates on all servers; see the security notes below for more information.

In order for the authentication of a server's host certificate to be successful, the reverse DNS lookup of the IP address of the server must result in the hostname given in the host certificate.

This means that the reverse DNS lookup for a host running a GSI enabled service must be configured properly - a "host not found" result is not acceptable. When a server has several hostnames/aliases, the host certificate should be requested with the hostname that is used in the reverse lookup table in the DNS.

This reverse lookup must work for all clients trying to connect to the server, including utilities and tools running on the machine itself. Even if the host is a dedicated server and no user interface commands are being executed from it, other utilities require GSI authentication.

Since the hostname in the host certificate is fully qualified, the reverse lookup must yield the fully qualified hostname. If the /etc/hosts file is used for local lookups instead of DNS, make sure that the fully qualified hostname is listed before any shortnames or aliases for the server host.

If e.g. the /etc/hosts file of the server looks like this:

1.2.3.4    somename    somename.domain.com

any tool running on that machine can NOT contact servers on the machine itself since the result of a reverse lookup will be the unqualified hostname "somename", which will not match the fully qualified hostname in the host certificate. Such a /etc/hosts file should be modified to read

1.2.3.4    somename.domain.com    somename
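
To verify the name resolution, the following minimal checks can be used (the host utility comes with the bind-utils/dnsutils package and may not be installed by default):

    hostname -f                           # should print the fully qualified hostname
    host <ip.address.of.server>           # the reverse lookup should return the FQDN used in the host certificate
    getent hosts <ip.address.of.server>   # checks /etc/hosts and DNS as configured in nsswitch.conf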

Time synchronization

Since authorization on the Grid relies on temporary delegated credentials (proxies), it is very important to synchronize the clocks on your machines with a reliable time server. If the clock on a cluster is off by 3 hours, the cluster will either reject a newly created user proxy for the first 3 hours of its lifetime and then accept it for 3 hours longer than intended, or start rejecting the proxy three hours too early, depending on the direction of the offset. The NTP protocol can be used to keep your clusters "on time".
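
A minimal sketch for an RPM-based system (package and service names may differ on your distribution, and many sites already run NTP):

    yum install ntp
    chkconfig ntpd on
    service ntpd start
    ntpq -p      # check that the daemon is synchronizing against its configured servers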

Firewall and Security

Make your firewall Grid-friendly: there are certain incoming and outgoing ports and port ranges which need to be opened in case your Grid resource is behind a firewall. Globus-based Grid services, including those currently implemented in ARC, are not supported behind NAT firewalls.

ARC needs the following incoming and outgoing ports to be opened (see the explanations below):

    2135       LDAP information system (incoming)
    2811       GridFTP control channel (incoming)
    9000-9300  GridFTP data channels (incoming and outgoing; configurable, see below)
    443        A-REX web service interface (incoming; configurable, see below)

Most ports, including 2135 and 2811, are registered with IANA and should normally not be changed. The ports for the GridFTP data channels can be chosen arbitrarily, based on the following considerations: gridftpd by default handles 100 connections simultaneously, and each connection should not use more than 1 additional TCP port. Taking into account that Linux tends to keep ports allocated for some time even after the handle is closed, it is a good idea to triple that amount. Hence about 300 data transfer ports should be enough for the default configuration. Typically, the range of ports from 9000 to 9300 is opened. Remember to specify this range in the ARC configuration file ([common] section, globus_tcp_port_range attribute) later on.
The default port 443 for web services might be restricted, so it is possible to change that as well. Common choices are 50000 or 60000. This is done by adding the line arex_mount_point="https://your.host:60000/arex" to the [common] section of the ARC configuration file.
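
A minimal iptables sketch matching the ports listed above (the exact rules depend on your firewall setup; adapt the interface, source restrictions and the data-channel range to your site):

    iptables -A INPUT -p tcp --dport 2135 -j ACCEPT        # LDAP information system
    iptables -A INPUT -p tcp --dport 2811 -j ACCEPT        # GridFTP control channel
    iptables -A INPUT -p tcp --dport 9000:9300 -j ACCEPT   # GridFTP data channels (globus_tcp_port_range)
    iptables -A INPUT -p tcp --dport 443 -j ACCEPT         # A-REX web service interface, if enabled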

Note on SELinux:   The startup scripts should be able to set up SELinux for you. If you cannot access one of the services, try setting SELinux to permissive mode.
Note on AppArmor (on Ubuntu or Debian):   Some AppArmor setups prevent the LDAP information system from binding its ports. If you experience problems while starting the information system, the easiest solution is to remove /etc/apparmor.d/usr.sbin.slapd and restart AppArmor. If such a file does not exist, disable AppArmor or put all the profiles in complain mode (this can be of help: https://help.ubuntu.com/community/AppArmor).
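
For the AppArmor workaround above, a minimal command sketch (the aa-complain utility comes with the apparmor-utils package; paths and service names may differ between releases):

    sudo rm /etc/apparmor.d/usr.sbin.slapd && sudo service apparmor restart
    # or, alternatively, switch all profiles to complain mode:
    sudo aa-complain /etc/apparmor.d/*
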
Explanations on profiling for both security systems will be included in the documentation in the future.

Computing Element

  1. First, you have to create (some) UNIX accounts on your Grid machine. These local UNIX accounts will be used to map Grid users locally, and every Grid job or Grid activity will take place via these accounts. In the simplest scenario, it is enough to create a single account, e.g. a user called grid, but you should have separate accounts for different Grid user groups (Virtual Organisations); a command sketch is given after this list. In addition to the authorization rules provided by the middleware, you may group the created Grid accounts into UNIX groups and use the local UNIX authorization methods to restrict the Grid accounts.
  2. Create disk areas on the front-end which will be used by the Grid services. A typical setup is given in the table below, with example locations indicated. NFS means that the directory has to be available on the nodes, typically via a shared file system. It is recommended to put the session directory and the cache directory onto separate volumes (partitions, disks).
    Function: session directory (required)
      Location: NFS
      Description: the directory which accommodates the session directories of the Grid jobs
      Example: /scratch/grid
      Suggested permissions: 755, owner grid, group grid

    Function: cache directory (optional)
      Location: NFS/local
      Description: the place where the input files of the Grid jobs are cached
      Example: /scratch/cache
      Suggested permissions: 755, owner grid, group grid

    Function: runtime environment scripts (optional)
      Location: NFS, or replicated on each LRMS computing node
      Description: the place for the initialization scripts of the pre-installed software environments
      Example: /SOFTWARE/runtime
      Suggested permissions: 755, owner grid, group grid

    Function: control directory (required)
      Location: local to the front-end
      Description: the directory for the internal control files of the grid-manager process
      Example: /var/spool/arc/jobstatus
      Suggested permissions: 755; owner and group have to be root or the user specified in arc.conf:
        ...
        [grid-manager]
        user=""
        ...

    Further notes on the Grid directories:

    • some of the NFS requirements can be relaxed with a special cluster setup and configuration. For the possible special setups and specific LRMS options please consult the ARC CE System Administrator Guide.
    • The cache directory does not necessarily have to be available on computing nodes. If it is not available, A-REX needs to be configured to copy cached files to the session directory.
    • Instead of sharing the directory of the runtime environment scripts, the scripts can be installed on every node.
    • Use a dedicated directory for the runtime environment scripts, avoid re-using directories that have other files.
  3. Check the network connectivity of the computing nodes. For the ARC middleware, internal cluster nodes are NOT required to be fully available on the public internet (however, user applications may require it). Nodes can have inbound, outbound, both or no network connectivity. This nodeaccess property should be set in the configuration (the configuration templates already contain those, see below).
  4. Configure the Local Resource Management/Batch System in order to suit the Grid. In a typical scenario, a queue (or queues) dedicated to Grid jobs has to be created, all or some of the cluster nodes have to be assigned to the Grid queues, and queue and user limits have to be set for the Grid queues and the Grid accounts. Brief instructions are available for PBS and Condor setups. DO NOT use PBS routing queues as grid queues: they are not supported. However, a special configuration with a single routing queue is supported; please consult the ARC CE System Administrator Guide.
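
    As referenced in step 1, a minimal command sketch for creating the Grid account and the directories from the table above (the paths are only examples; adapt them to your site):

      groupadd grid
      useradd -g grid -m grid
      mkdir -p /scratch/grid /scratch/cache /SOFTWARE/runtime
      chown grid:grid /scratch/grid /scratch/cache /SOFTWARE/runtime
      chmod 755 /scratch/grid /scratch/cache /SOFTWARE/runtime
      mkdir -p /var/spool/arc/jobstatus     # control directory, kept local to the front-end
      chown root:root /var/spool/arc/jobstatus
      chmod 755 /var/spool/arc/jobstatus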

Storage Element:

ARC comes with a very basic Storage Element: a GridFTP server, suitable for simple small-scale storage. Preparatory steps are rather trivial:

  1. Install a standard Linux box with a dedicated disk storage area. In case the SE wants to serve several Grid user groups (Virtual Organizations), it is preferable to dedicate separate disks (volumes, partitions, etc.) for different groups.
  2. Create Grid accounts: it is not strictly necessary to create UNIX accounts dedicated to the Grid, but one may find it useful for local accounting. These local UNIX accounts will be used to map Grid users locally, and the data stored on the Storage Element will be owned by these accounts. In the simplest scenario, it is enough to create a single account, e.g. a user called grid, but one can also have separate accounts for the different Grid user groups. One may find it useful to put all the Grid accounts into the same UNIX group.
  3. Make your firewall Grid-friendly: follow the same requirements as above for clusters.

Installing the Grid software (middleware):

The same basic server software is needed both for Compute and Storage Elements.

All the required ARC software, as well as the necessary external packages, is available from the NorduGrid repositories. It is also available from the standard Fedora and EPEL repositories, and most of it from the Ubuntu and Debian repositories. NorduGrid distributes both source code and binary packages. Binaries are available as RPMs, debs or tarballs.

Installation from Linux repositories

For Linux users with system administrator privileges, the recommended installation method is via the repositories, since the packages have complex dependencies which are best handled by a package manager. Depending on your needs, you will need to enable at least one repository for each of the relevant categories.

Examples:

The recommended way is to install the nordugrid-arc-compute-element meta-package, i.e., as root user, do: yum install nordugrid-arc-compute-element

Or on Debian-based systems: sudo apt-get install nordugrid-arc-compute-element

When the meta-package is not available or not desired, install the following (on RH-based systems): yum install nordugrid-arc-gridftpd nordugrid-arc-arex nordugrid-arc-aris nordugrid-arc-gridmap-utils nordugrid-arc-plugins-needed nordugrid-arc-plugins-globus nordugrid-arc-plugins-xrootd ca_policy_igtf-classic ca_policy_igtf-mics ca_policy_igtf-slcs fetch-crl

Or on Debian-based systems: sudo apt-get install nordugrid-arc-gridftpd nordugrid-arc-arex nordugrid-arc-aris nordugrid-arc-gridmap-utils nordugrid-arc-plugins-needed nordugrid-arc-plugins-globus ca-policy-igtf-classic ca-policy-igtf-mics ca-policy-igtf-slcs fetch-crl

In case you are going to request a host certificate from the NorduGrid CA, also install nordugrid-arc-ca-utils and globus-gsi-cert-utils-progs.

It is also advised to install client tools, documentation, and development libraries (on RH-based systems): yum install nordugrid-arc-client nordugrid-arc-doc nordugrid-arc-devel

Or on Debian-based systems: sudo apt-get install nordugrid-arc-client nordugrid-arc-doc nordugrid-arc-devel

Optional: Re-building ARC middleware

This step is only needed if you are using an unsupported operating system or different versions of external dependencies, and experience problems with ARC.

Detailed build instructions are given in the README file available with the distributed ARC source code.

The procedure below describes RPM build; it is very similar for deb:

  1. Make sure that all the necessary external dependencies mentioned in the README are satisfied. The list in the README may not be complete, as some dependencies are distribution-dependent; therefore, always check the ./configure output messages to find out which libraries are missing.
  2. Get the source RPM of the necessary ARC release, nordugrid-arc-<x.y.z-1>.src.rpm, from the NorduGrid Downloads area and rebuild it: rpmbuild --rebuild nordugrid-arc-<x.y.z-1>.src.rpm
  3. Alternatively, you can get a tarball nordugrid-arc-<x.y.z>.tar.gz, and follow the normal procedure: tar xvzf nordugrid-arc-<x.y.z>.tar.gz
    cd nordugrid-arc-<x.y.z>
    ./autogen.sh
    ./configure
    make
    make install

There is a variety of options that can be specified with ./configure (e.g., to disable or enable specific components); use ./configure --help to obtain the complete and up-to-date list of these options.

Setting up the Grid Security Infrastructure: Certificates, Authentication and Authorization

Read carefully the following section, as your resource will not be able to function if it has improper or outdated credentials.

The following considerations apply for compute elements, storage elements or any Grid service in general. You may find useful our certificate mini How-to.

Grid CA certificates

Every Grid service or tool requires installation of the public certificates of Certificate Authorities (CA). In case you do not have them installed yet, obtain them from one of the following providers:

These certificates are necessary to use international Grid infrastructures. Make sure your national CA certificates are always present. If your project makes use of its own internal certificates, install them as well (contact your project support team for details). If some CAs are banned by your local policies, make sure to remove their certificates from your computer.

Before installing any CA package, you are advised to check the credibility of the CA and verify its policy!

The Certificate Authorities are responsible for maintaining lists of revoked personal and service certificates, known as CRL (Certificate Revocation List).

It is the site's (that is, your) responsibility to check the CRLs regularly and deny access to Grid users presenting a revoked certificate. Outdated CRLs will render your site unusable.

An automatic tool for regular CRL updates, fetch-crl, is available in several Linux distributions and from the NorduGrid repositories. When run regularly via cron, the utility keeps the CA revocation lists up to date.
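
Most fetch-crl packages install a cron entry automatically; if yours does not, a hypothetical crontab sketch could look like the following (the binary path and schedule are assumptions, check your package):

    # /etc/cron.d/fetch-crl
    47 */6 * * *   root   /usr/sbin/fetch-crl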

Obtaining site (host, service) certificates

Your site needs to have certificates for the Grid services issued by your national Certificate Authority (CA). The minimum is a host certificate, but we recommend having a certificate for each service (e.g. LDAP) as well.

Each country has its own certification policies and procedures; please consult your local Certificate Authority.

In case your resource is in a Nordic country (Denmark, Finland, Norway, Iceland or Sweden), you may request the certificate from the NorduGrid CA. In some countries, the Terena e-Science Service Certificate is preferred.

In order to create a request to the NorduGrid CA, install the certrequest-config package from the NorduGrid Downloads area. It contains the default configuration for generating certificate requests for Nordic-based services and users. If you are located elsewhere, contact your local CA for details. In Nordic countries, generate a host certificate request with

grid-cert-request -host <my.host.fqdn>

and a LDAP certificate request (needed for either compute or storage element) with

grid-cert-request -service ldap -host <my.host.fqdn>

and send the request(s) to the NorduGrid CA for signing.
Upon receipt of the signed certificates, place them into the proper location (by default, /etc/grid-security/).

Check that the certificate and key files are owned by root, that the private keys are readable only by root, and that none of the files has executable permissions. Also make sure the private keys are not password-protected. This is especially important if you used a tool other than grid-cert-request or ran it in interactive mode.
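
A minimal sketch, assuming the conventional file names under /etc/grid-security/ (adjust the names to the ones actually produced by your CA tools):

    chown root:root /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem
    chmod 644 /etc/grid-security/hostcert.pem
    chmod 400 /etc/grid-security/hostkey.pem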

Configuring access

  1. Define your authentication policy: decide which certificates your site will accept. In practice, this is done by installing/removing specific CA credentials, as described above.
  2. Define your authorization policy: decide which Grid users or groups of Grid users (Virtual Organizations) are allowed to use your resource, and define the Grid mappings (Grid users to local UNIX users). The Grid mappings are listed in the so-called grid-mapfile. Within ARC, there is an automatic tool which can keep the local grid-mapfiles synchronized with central databases of Grid users and VOs; it is available in the nordugrid-arc-gridmap-utils package from the repositories. Follow the configuration instructions to configure your system properly authorization-wise: it involves editing [vo] blocks in the configuration file. For further information on authorization, read the NorduGrid VO documentation.
    IMPORTANT: you either maintain the Grid mappings by hand, editing /etc/grid-security/grid-mapfile directly, or use nordugrid-arc-gridmap-utils (the nordugridmap script run via cron) to create and maintain the mappings file for your site. In the latter case, the utility keeps the grid-mapfile synchronized with the central authorization service of your choice, for instance the NorduGrid user list. If you install nordugrid-arc-gridmap-utils, you ONLY have to edit the [vo] blocks in the configuration file and, optionally, the file representing the local list of mappings (usually /etc/grid-security/local-grid-mapfile).
    ADVANCED: you may use more flexible methods of authorizing and mapping Grid users to UNIX accounts, including dynamic allocation and third-party algorithms. For more information please refer to the ARC CE System Administrator Guide. You still need to maintain /etc/grid-security/grid-mapfile with at least a superset of authorized users, as the information system still relies on it.
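
    For illustration, a grid-mapfile entry maps a certificate subject to a local account, one mapping per line (the DN below is only an example):

      "/O=Grid/O=NorduGrid/OU=example.org/CN=Jane Doe" grid

    A hedged sketch of a [vo] block for nordugridmap follows; the attribute names are in the classic arc.conf style and the source URL is purely hypothetical, so cross-check both against the template and your VO's actual source:

      [vo]
      id="vo_1"
      vo="myvo"
      source="http://vo.example.org/members.txt"
      file="/etc/grid-security/grid-mapfile"
      mapped_unixid="grid"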

Configuring the Grid resource

The next step is the configuration of your resource. ARC uses a single configuration file per host machine, independently of the number and nature of the services it hosts.
The default location of this file is /etc/arc.conf. A different location can be specified by the environment variable ARC_CONFIG.

ARC provides several out-of-the-box configuration templates that can simply be copied to /etc/arc.conf and will provide the system with all the basic information needed to run.
These templates are installed with the nordugrid-arc-doc package (not automatically included with the meta-package, so it needs to be installed manually if not already present) and are usually located in
/usr/share/doc/nordugrid-arc-doc/examples/ if installing from package,
$ARC_LOCATION/share/doc/examples/ if installing from source.
The templates for different ARC services can also be found on the Web.

A complete list of configuration options is available in the file $ARC_LOCATION/examples/arc.conf.reference that comes with the distributed software (usually in /usr/share/arc/examples/arc.conf.reference). However, this file must NOT be used as a template for starting the services: it simply lists all possible options, and using it directly may result in unexpected server behaviour.

The configuration file consists of dedicated blocks for different services. If your host node runs only some of the services, unrelated blocks should be removed.

Not having a service block means not running the corresponding service on the resource.

For more details, see the configuration and authorisation section of the ARC CE System Administrator manual.

  1. Create your /etc/arc.conf by using one of the out-of-the-box configuration templates mentioned above (a minimal fragment is also sketched after this list). With arc.conf you configure all services and processes:
    • GridFTP server
    • A-REX
    • Information system
    • Information providers
    • Authorization
    • Grid storage areas

    Make sure you configure your services to use the ports that are opened in the firewall. In particular, define globus_tcp_port_range="9000,9300" in the [common] section of arc.conf, or whatever range is opened in the firewall for GridFTP data connections. The ports 2135 (LDAP) and 2811 (GridFTP) can be changed with the port="<port number>" option in the [infosys] and [gridftpd] sections of arc.conf, respectively.

    For authorization using Virtual Organizations, make sure you have one or more [vo] sections in arc.conf. These blocks should be configured to create user mappings in /etc/grid-security/grid-mapfile (the latter file name is configurable in arc.conf). Follow the configuration template and consult NorduGrid VO lists for detailed information.

  2. If your site is going to provide resources via the NorduGrid production grid, you will need to check the latest NorduGrid GIIS Information for the list of country-level and core NorduGrid Grid Information Index Services to which your host will have to register.
  3. Optionally, you can setup Runtime Environments on your computing cluster. Setting up a Runtime Environment means installing a specific application software package onto the cluster in a centralized and shared manner (the software package is made available for the worker nodes as well!), and placing a Runtime Environment initialization script (named after the Runtime Environment) into the dedicated directory. You may want to consult a Runtime Environment Registry for a list of official Runtime environments.
  4. If you configured your storage element to be GACL-enabled, consult the GACL Howto for explanations and examples of .gacl files.
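
    As referenced in step 1, a minimal arc.conf fragment is sketched below. The block and attribute names follow the 11.05-series configuration templates, but this is only an illustration under those assumptions: always start from an actual template and cross-check every attribute against arc.conf.reference. The [vo] block shown earlier would be added as well.

      [common]
      hostname="my.host.fqdn"
      x509_user_key="/etc/grid-security/hostkey.pem"
      x509_user_cert="/etc/grid-security/hostcert.pem"
      x509_cert_dir="/etc/grid-security/certificates"
      gridmap="/etc/grid-security/grid-mapfile"
      lrms="fork"
      globus_tcp_port_range="9000,9300"

      [grid-manager]
      user="root"
      controldir="/var/spool/arc/jobstatus"
      sessiondir="/scratch/grid"
      cachedir="/scratch/cache"
      logfile="/var/log/arc/grid-manager.log"

      [gridftpd]
      user="root"
      logfile="/var/log/arc/gridftpd.log"

      [gridftpd/jobs]
      path="/jobs"
      plugin="jobplugin.so"
      allownew="yes"

      [infosys]
      user="root"

      [cluster]
      cluster_alias="Minimal test cluster"
      nodeaccess="outbound"

      [queue/fork]
      name="fork"
      comment="Forking on the front-end"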

Startup scripts, services, logfiles, debug mode, test-suite:

  1. After a successful installation and configuration of an ARC resource, the following services must be started:
    • The GridFTP server (gridftpd daemon): /etc/init.d/gridftpd start
    • The A-REX service (arched daemon): /etc/init.d/a-rex start
    • The LDAP Information System: /etc/init.d/nordugrid-arc-ldap-infosys start
    • The registration processes: /etc/init.d/nordugrid-arc-inforeg start

    All services can be run under a non-root account (configurable in the arc.conf). While for a Storage Element that only affects ownership of stored data, for a Compute Element the impact is more significant and some functionality is lost, so setting user=root in the [grid-manager] section is RECOMMENDED. Make sure that the host and service certificates are owned by the corresponding users (those in whose name the services are started).

    LDAP Information System services are RECOMMENDED to run with the distribution's default "ldap" user (username may vary). This is the default if no user={username} option is specified in the [infosys] section.

  2. The log files can be used to check the services (log file locations are configurable in arc.conf):
    • the Information System uses /var/log/arc/infoprovider.log for information collection; /var/log/arc/inforegistration.log for registration processes; /var/log/arc/bdii/bdii-update.log for slapd database updates and relevant slapd errors.
    • gridftpd writes logs into /var/log/arc/gridftpd.log by default; the debug level can be set in arc.conf
    • A-REX (the grid-manager process) uses /var/log/arc/grid-manager.log for general information and /var/log/arc/gm-jobs.log for logging job information; the debug level is set in arc.conf.
    Log rotation can be performed by the services themselves and configured in arc.conf too; however, logrotate configurations are installed with the packages. If logrotate or any other external log file handler is used, ARC's own log rotation must be disabled. See the ARC CE System Administrator Guide for more information. The startup scripts log failures to syslog. Once the server is up and running you should consult the corresponding server's log file. If a service fails even to start up, the syslog file of your system should be checked; this is normally /var/log/messages or /var/log/syslog.
  3. Debug information: in arc.conf, different debug levels can be set for all services. Please note that enabling debugging may cause serious performance losses (especially in the case of the LDAP server used in the information system), therefore use the default level of debugging on a production system.
  4. To quickly test if the information system is working properly, one of these ldap queries can be used:
    to test NorduGrid schema publishing:
      ldapsearch -x -H ldap://piff.hep.lu.se:2135 -b 'mds-vo-name=local,o=grid'
    to test Glue 1.2/1.3 schema publishing:
      ldapsearch -x -H ldap://piff.hep.lu.se:2135 -b 'mds-vo-name=resource,o=grid'
    to test GLUE2 schema publishing:
      ldapsearch -x -H ldap://piff.hep.lu.se:2135 -b 'o=glue'
    (replace piff.hep.lu.se with the FQDN of your own host)

    If the Information System is running correctly, LDAP entries for the cluster should show up on the screen. If this does not work, it is useless to proceed to step 5, and the information system needs to be troubleshot first. Common problems are related to firewall and security settings, so please read that section carefully before going to the next step.

  5. The ARC client comes with the arctest utility. Use it to test the basic functionality of the computing resource. The utility includes several tests which can be useful for exercising your cluster, e.g. simple upload and download tests; check man arctest for a brief explanation of the possible tests. Prior to submitting test jobs, make sure you possess a valid user certificate, have generated a valid Grid proxy and have the credentials of all the necessary CAs installed.
    For a quick installation validation, run test number 1 against your resource:
      arctest -c <my.host.fqdn> -J 1
    This will execute a Grid job, including staging of files to the computing resource (downloading input files from several locations and caching) and running a test calculation on the resource. We recommend running at least this test against a newly installed resource and fetching the job output with:
      arcget <jobid>
    See the arctest man-page for more details on the test suite.
    The arcls client (which comes with the nordugrid-arc-client package) can be used for testing the Storage Element and Computing Element interface setup:
      arcls -l gsiftp://<my.host.fqdn>
    This opens a GridFTP connection to the site; you should be able to see the top level of the virtual directory tree configured on the server side.
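
    A valid Grid proxy is typically generated with the arcproxy utility from the client package (it reads the user certificate and key from their standard locations; see man arcproxy for the relevant options):

      arcproxy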