This is a shortened installation manual; for details, please consult the ARC CE System Administrator Guide.
NorduGrid's ARC middleware does not impose heavy requirements on hardware. Any 32-bit architecture will do, as will many 64-bit ones; some success has been reported on PPC as well. CPU frequencies from 333 MHz and up have been tested, and at least 512 MB of RAM is recommended. The disk space required for an ARC installation including the development interface is about 160 MB, while external software (most notably, a minimal setup of Globus Toolkit 5) requires another 10 MB. Servers (front-ends, gatekeepers, database servers, storage arrays etc.) require both outbound and inbound network connectivity. If you are behind a firewall, a range of ports will have to be opened. For clusters/HPC resources, the worker nodes can be on either a private or a public network.
A shared file system, such as NFS, is desirable (for simplicity) but not required, provided the Local Resource Management System (LRMS) offers means for file staging between the computing nodes and the front-end, or execution happens on the same machine (known as job forking). Local authentication of Grid users is supported through embedded authentication algorithms and call-outs to external executables or functions in dynamically loadable libraries. The actual implementation (e.g., for AFS) may require site-specific modules.
The ARC middleware is expected to run on any system supported by Globus. At the moment, only the following GNU/Linux distributions are supported on the server side: Fedora, Red Hat Enterprise Linux, Debian, Ubuntu and (partially) OpenSuSE.
It is important to bear in mind that the Grid requires the presence of valid certificates on all servers; see the security notes for more information.
In order for the authentication of a server's host certificate to be successful, the reverse DNS lookup of the IP address of the server must result in the hostname given in the host certificate.
This means that the reverse DNS lookup for a host running a GSI enabled service must be configured properly - a "host not found" result is not acceptable. When a server has several hostnames/aliases, the host certificate should be requested with the hostname that is used in the reverse lookup table in the DNS.
This reverse lookup must work for all clients trying to connect to the server, including utilities and tools running on the machine itself. Even if the host is a dedicated server and no user interface commands are being executed from it, other utilities require GSI authentication.
Since the hostname in the host certificate is fully qualified, the reverse lookup must yield the fully qualified hostname. If the /etc/hosts file is used for local lookups instead of DNS, make sure that the fully qualified hostname is listed before any shortnames or aliases for the server host.
If e.g. the /etc/hosts file of the server looks like this:
1.2.3.4 somename somename.domain.com
any tool running on that machine can NOT contact servers on the machine itself since the result of a reverse lookup will be the unqualified hostname "somename", which will not match the fully qualified hostname in the host certificate. Such a /etc/hosts file should be modified to read
1.2.3.4 somename.domain.com somename
Since authorization on the Grid relies on temporary delegated credentials (proxies), it is very important to synchronize the clocks of your machines with a reliable time server. If the clock on a cluster is off by 3 hours, the cluster will either reject a newly created user proxy for the first 3 hours of its lifetime and then accept it for 3 hours longer than intended, or start rejecting it 3 hours too early, depending on the direction of the offset. The NTP protocol can be used to keep your clusters "on time".
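On most modern Linux distributions, time synchronization can be set up with the chrony NTP client; the following commands are only a sketch, assuming a systemd-based system (package and service names may differ on your distribution):

    # Install and enable the chrony NTP client (run as root)
    yum install chrony
    systemctl enable --now chronyd
    # Verify that the clock is being synchronized
    chronyc tracking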
Make your firewall Grid-friendly: certain incoming and outgoing ports and port ranges need to be opened if your Grid resource is behind a firewall. Globus-based Grid services, including those currently implemented in ARC, are not supported behind NAT firewalls.
ARC needs several incoming and outgoing ports to be opened, as described below.
Most ports, including 2135 (LDAP) and 2811 (GridFTP), are registered with IANA and should normally not be changed. The ports for the GridFTP data channels can be chosen arbitrarily, based on the following considerations: gridftpd by default handles 100 connections simultaneously, and each connection should not use more than 1 additional TCP port. Taking into account that Linux tends to keep ports allocated for some time even after the handle is closed, it is a good idea to triple that amount; hence about 300 data transfer ports should be enough for the default configuration. Typically, the range of ports from 9000 to 9300 is opened. Remember to specify this range in the ARC configuration file ([common] section, globus_tcp_port_range attribute) later on.
The default port 443 for web services might be restricted, so it is possible to change it as well; common choices are 50000 or 60000. This is done by adding the line arex_mount_point="https://your.host:60000/arex" to the [common] section of the ARC configuration file.
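As an illustration, on a system using firewalld the ports discussed above could be opened as follows (a sketch only; adjust the port numbers to your configuration and translate to your site's firewall tool if different):

    firewall-cmd --permanent --add-port=2135/tcp        # LDAP information system
    firewall-cmd --permanent --add-port=2811/tcp        # GridFTP control channel
    firewall-cmd --permanent --add-port=9000-9300/tcp   # GridFTP data channels
    firewall-cmd --permanent --add-port=443/tcp         # web services (or your chosen port)
    firewall-cmd --reload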
Function | Location | Description | Example | Suggested permissions
session directory (required) | NFS | the directory which accommodates the session directories of the Grid jobs | /scratch/grid | 755, owner grid, group grid
cache directory (optional) | NFS/local | the place where the input files of the Grid jobs are cached | /scratch/cache | 755, owner grid, group grid
runtime environment scripts (optional) | NFS or replicated on each LRMS computing node | the place for the initialization scripts of the pre-installed software environments | /SOFTWARE/runtime | 755, owner grid, group grid
control directory (required) | local to the front-end | the directory for the internal control files of the grid-manager process | /var/spool/arc/jobstatus | 755, owner and group must be root or the user specified in the [grid-manager] section of arc.conf (user="...")
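For example, the directories from the table above could be prepared on the front-end as follows (a sketch; the paths and the "grid" user/group are the example values from the table):

    # Session and cache directories, owned by the grid user (run as root)
    mkdir -p /scratch/grid /scratch/cache
    chown grid:grid /scratch/grid /scratch/cache
    chmod 755 /scratch/grid /scratch/cache
    # Control directory on the front-end, owned by root
    # (or by the user given in the [grid-manager] section of arc.conf)
    mkdir -p /var/spool/arc/jobstatus
    chown root:root /var/spool/arc/jobstatus
    chmod 755 /var/spool/arc/jobstatus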
Further notes on the Grid directories:
ARC comes with a very basic Storage Element: a GridFTP server, suitable for simple small-scale storage. Preparatory steps are rather trivial:
The same basic server software is needed both for Compute and Storage Elements.
All the required ARC software, as well as the necessary external packages, is available from the NorduGrid repositories. It is also available from the standard Fedora and EPEL repositories, and most of it from Ubuntu and Debian. NorduGrid distributes both source code and binary packages; binaries are available as RPMs, debs or tarballs.
For Linux users with system administrator privileges, the recommended installation method is via repositories, because the packages have complex dependencies which are best handled by a package manager. Depending on your needs, you will need to enable at least one repository for each of the categories below:
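For example, on an RH-based system the EPEL and NorduGrid repositories could be enabled roughly as follows (a sketch; the exact nordugrid-release package URL for your distribution must be looked up in the NorduGrid Downloads area, the path below is a placeholder):

    # Enable EPEL, which provides some of the external dependencies
    yum install epel-release
    # Enable the NorduGrid repository via its release package;
    # replace the URL with the one matching your distribution and release
    rpm -Uvh http://download.nordugrid.org/packages/nordugrid-release/<your-distribution-release>.rpm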
The recommended way is to install the nordugrid-arc-compute-element meta-package, i.e., as root user, do: yum install nordugrid-arc-compute-element
Or on Debian-based systems: sudo apt-get install nordugrid-arc-compute-element
When the meta-package is not available or not desirable, install the following (on RH-based systems): yum install nordugrid-arc-gridftpd nordugrid-arc-arex nordugrid-arc-aris nordugrid-arc-gridmap-utils nordugrid-arc-plugins-needed nordugrid-arc-plugins-globus nordugrid-arc-plugins-xrootd ca_policy_igtf-classic ca_policy_igtf-mics ca_policy_igtf-slcs fetch-crl
Or on Debian-based systems: sudo apt-get install nordugrid-arc-gridftpd nordugrid-arc-arex nordugrid-arc-aris nordugrid-arc-gridmap-utils nordugrid-arc-plugins-needed nordugrid-arc-plugins-globus ca-policy-igtf-classic ca-policy-igtf-mics ca-policy-igtf-slcs fetch-crl
In case you're going to request a host certificate from the NorduGrid CA, also install nordugrid-arc-ca-utils and globus-gsi-cert-utils-progs.
It is also advised to install client tools, documentation, and development libraries (on RH-based systems): yum install nordugrid-arc-client nordugrid-arc-doc nordugrid-arc-devel
Or on Debian-based systems: sudo apt-get install nordugrid-arc-client nordugrid-arc-doc nordugrid-arc-devel
This step is only needed if you are using an unsupported operating system or different versions of external dependencies, and experience problems with ARC.
Detailed build instructions are given in the README file available with the distributed ARC source code.
The procedure below describes an RPM build; it is very similar for deb:
There is a variety of options that can be specified with ./configure (e.g., to disable or enable specific components); use ./configure --help to obtain the complete and up-to-date list.
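A typical build might then look as follows (a sketch; <version> is a placeholder for the actual release, and the configure options are illustrative):

    # Build binary RPMs directly from a release tarball (requires rpm-build)
    rpmbuild -ta nordugrid-arc-<version>.tar.gz

    # Alternatively, a plain build and install from source:
    tar xzf nordugrid-arc-<version>.tar.gz
    cd nordugrid-arc-<version>
    ./configure --prefix=/usr
    make
    make install   # as root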
Read the following section carefully, as your resource will not be able to function if it has improper or outdated credentials.
The following considerations apply to compute elements, storage elements, and any Grid service in general. You may find our certificate mini How-to useful.
Every Grid service or tool requires the installation of the public certificates of Certificate Authorities (CAs). In case you do not have them installed yet, obtain them from one of the following providers:
These certificates are necessary for using international Grid infrastructures. Make sure your national CA certificates are always present. If your project makes use of its own internal certificates, install them as well (contact your project support team for details). If some CAs are banned by your local policies, make sure to remove their certificates from your computer.
Before installing any CA package, you are advised to check the credibility of the CA and verify its policy! The Certificate Authorities are responsible for maintaining lists of revoked personal and service certificates, known as CRLs (Certificate Revocation Lists).
It is the site's (that is, your) responsibility to check the CRLs regularly and deny access to Grid users presenting a revoked certificate. Outdated CRLs will render your site unusable.
An automatic tool for regular CRL updates, fetch-crl, is available in several Linux distributions and from the NorduGrid repositories. When run regularly from cron, the utility keeps track of the CA revocation lists.
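Many fetch-crl packages install a cron job automatically; if yours does not, a crontab entry along the following lines will do (a sketch; verify the actual path of the fetch-crl executable on your system):

    # /etc/cron.d/fetch-crl -- refresh the CRLs every six hours (sketch)
    47 */6 * * *  root  /usr/sbin/fetch-crl -q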
Your site needs to have certificates for the Grid services, issued by your national Certificate Authority (CA). The minimum is a host certificate, but we recommend having a certificate for each service (e.g., LDAP) as well.
Each country has its own certification policies and procedures; please consult your local Certificate Authority.
In case your resource is in a Nordic country (Denmark, Finland, Norway, Iceland or Sweden), you may request the certificate from the NorduGrid CA. In some countries, the Terena e-Science Service Certificate is preferred.
In order to create a request to the NorduGrid CA, install the certrequest-config package from the NorduGrid Downloads area. It contains the default configuration for generating certificate requests for Nordic-based services and users. If you are located elsewhere, contact your local CA for details. For example, in Nordic countries, generate a host certificate request with
grid-cert-request -host <my.host.fqdn>
and an LDAP certificate request (needed for either a compute or a storage element) with
grid-cert-request -service ldap -host <my.host.fqdn>
and send the request(s) to the NorduGrid CA for signing.
Upon receipt of the signed certificates, place them into the proper location (by default, /etc/grid-security/).
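By common Globus convention (an assumption, as this guide does not spell out the file names), the host credentials are stored as hostcert.pem and hostkey.pem, with the private key readable only by its owner:

    # Place the signed certificate and private key (run as root; sketch)
    cp hostcert.pem hostkey.pem /etc/grid-security/
    chmod 644 /etc/grid-security/hostcert.pem
    chmod 400 /etc/grid-security/hostkey.pem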
The next step is the configuration of your resource. ARC uses a single configuration file per host machine, independently of the number and nature of the services it hosts. The default location of this file is /etc/arc.conf; a different location can be specified via the environment variable ARC_CONFIG.
ARC provides several out-of-the-box configuration templates that can simply be copied over /etc/arc.conf and will provide the system with all the basic information needed to run. These templates are installed with the nordugrid-arc-doc package (not automatically included with the meta-package, so it needs to be installed manually if not already present) and are usually located in /usr/share/doc/nordugrid-arc-doc/examples/ when installing from packages, or in $ARC_LOCATION/share/doc/examples/ when installing from source. The templates for the different ARC services can also be found on the Web.
A complete list of configuration options is available in the file $ARC_LOCATION/examples/arc.conf.reference that comes with the distributed software (usually /usr/share/arc/examples/arc.conf.reference). However, this file must NOT be used as a template for starting the services, as it includes all the possible options, which may result in unexpected server behaviour.
The configuration file consists of dedicated blocks for different services. If your host node runs only some of the services, unrelated blocks should be removed.
Not having a service block means not running the corresponding service on the resource.
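To illustrate the block structure, a minimal arc.conf for a compute element might look roughly like this (a sketch assembled from the options mentioned in this guide; verify all block and option names against arc.conf.reference and the templates before use):

    [common]
    hostname="your.host.fqdn"
    globus_tcp_port_range="9000,9300"

    [grid-manager]
    user="root"
    controldir="/var/spool/arc/jobstatus"
    sessiondir="/scratch/grid"

    [gridftpd]
    port="2811"

    [infosys]
    port="2135"

A real compute element will additionally need queue definitions and other service blocks; see the templates for complete examples.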
For more details, see the configuration and authorisation section of the ARC CE System Administrator manual.
Make sure you configure your services to use the ports that are opened in the firewall. In particular, define globus_tcp_port_range="9000,9300" in the [common] section of arc.conf, or whatever range is opened in the firewall for gridftp data connections. The ports 2135 (LDAP) and 2811 (GridFTP) can be changed with the port="<port number>" option in the [infosys] and the [gridftpd] section of the arc.conf, respectively.
For authorization using Virtual Organizations, make sure you have one or more [vo] sections in arc.conf. These blocks should be configured to create user mappings in /etc/grid-security/grid-mapfile (the file name itself is configurable in arc.conf). Follow the configuration template and consult the NorduGrid VO lists for detailed information.
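As an illustration, a [vo] block might look roughly like this (a sketch; the option names follow common nordugridmap usage, and the VO source URL and mapped user name are placeholders):

    [vo]
    id="vo_1"
    vo="myvo"
    file="/etc/grid-security/grid-mapfile"
    source="vomss://voms.example.org:8443/voms/myvo"
    mapped_unixid="griduser"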
All services can be run under a non-root account (configurable in the arc.conf). While for a Storage Element that only affects ownership of stored data, for a Compute Element the impact is more significant and some functionality is lost, so setting user=root in the [grid-manager] section is RECOMMENDED. Make sure that the host and service certificates are owned by the corresponding users (those in whose name the services are started).
LDAP Information System services are RECOMMENDED to run with the distribution's default "ldap" user (username may vary). This is the default if no user={username} option is specified in the [infosys] section.
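Once the services are started, the information system can be tested with an LDAP query; a commonly used form is shown below (assuming the standard MDS-style base DN; adjust the host name and port to your setup):

    # Query the local ARC information system (sketch)
    ldapsearch -x -h your.host.fqdn -p 2135 -b 'mds-vo-name=local,o=grid'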
If the Information System is running correctly, LDAP entries for the cluster should show on the screen. If this does not work, it is useless to proceed to step 5; the information system needs troubleshooting first. Common problems are related to the firewall and security setup, so please read the corresponding sections carefully before going to the next step.