ARC Computing Element Installation and Configuration Guide

Prerequisites

Choosing the host

It is assumed that ARC CE is installed on top of an existing Linux computing cluster. Many Linux distributions are supported. ARC also works well in a completely virtual computing cluster environment in a cloud.

ARC is non-intrusive towards existing systems. We suggest deploying ARC CE on a dedicated (virtual) machine connected to the cluster network and filesystem.

ARC software is very lightweight and does not require powerful machines to run. However, if the ARC CE will perform data transfers, the requirements are higher. As a minimum, a production CE with 4 cores and 8GB of RAM should be capable of handling up to 10,000 concurrent jobs without problems. One CE can easily handle the load of a single cluster; however, multiple CEs may be deployed in parallel for redundancy.

Plan for storage areas

Several storage areas are necessary for job submission, execution and data storage. You should mount/export the following directories:

session directory

data staging cache directory (if planned)

local scratch directory on the worker nodes (decide to what extent to use scratch space that is NOT cross-mounted)

The session directory (and the cache directory, if used) is typically a cross-mounted NFS share. Please note that in the typical setup, where A-REX runs as root, the NFS share needs to be exported with no_root_squash.
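
As an illustration of the export option mentioned above, the following minimal sketch prints /etc/exports entries for the session and cache directories. The paths /grid/session and /grid/cache, the 10.0.0.0/24 worker-node network and the helper name nfs_export_line are assumptions chosen for this example, not ARC defaults; the rw and sync options are likewise only a common choice.

```shell
#!/bin/sh
# Sketch: print an /etc/exports entry for a directory shared with worker nodes.
# Arguments are placeholders supplied by the caller, not values defined by ARC.
nfs_export_line() {
    dir="$1"      # directory to export (hypothetical example: /grid/session)
    clients="$2"  # NFS client specification (hypothetical example: worker-node subnet)
    # no_root_squash lets root on the ARC CE host act as root on the share
    printf '%s %s(rw,sync,no_root_squash)\n' "$dir" "$clients"
}

nfs_export_line /grid/session 10.0.0.0/24
nfs_export_line /grid/cache 10.0.0.0/24
```

On the NFS server, lines like these would typically be added to /etc/exports and activated with exportfs -ra.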

Local resource management system (LRMS)

Install and configure your LRMS (batch system). ARC supports a variety of LRMS back-ends:

fork - fork jobs on the ARC CE host node, not a cluster. Targeted for testing and development but not for real production workloads.

condor - uses an HTCondor-powered HTC resource

slurm - for SLURM clusters

pbs - any flavor of PBS batch system, including Torque and PBSPro

pbspro - dedicated Altair PBS Professional backend (from 6.1 release)

ll - LoadLeveler batch system

lsf - Load Sharing Facility batch system

sge - Oracle Grid Engine (formerly Sun Grid Engine)

boinc - works as a gateway to BOINC volunteer computing resources

Start by checking if you are able to submit jobs to the chosen LRMS from the ARC CE host.
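
A first sanity check along these lines might be to confirm that the batch system's submission command is present on the ARC CE host. The sketch below assumes the usual client command for each back-end (sbatch, condor_submit, qsub, bsub, llsubmit); the helper names lrms_submit_cmd and check_lrms_client are hypothetical and not part of ARC.

```shell
#!/bin/sh
# Sketch: map an ARC lrms name to the submission command usually shipped
# with that batch system. fork and boinc have no such client command.
lrms_submit_cmd() {
    case "$1" in
        slurm)      echo sbatch ;;
        condor)     echo condor_submit ;;
        pbs|pbspro) echo qsub ;;
        sge)        echo qsub ;;
        lsf)        echo bsub ;;
        ll)         echo llsubmit ;;
        *)          return 1 ;;
    esac
}

# Report whether the submission command for the given LRMS is on PATH.
check_lrms_client() {
    cmd=$(lrms_submit_cmd "$1") || { echo "no client check defined for '$1'"; return 2; }
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "$cmd found: $(command -v "$cmd")"
    else
        echo "$cmd not found on PATH"
        return 1
    fi
}

check_lrms_client slurm || true
```

If the command is found, submitting a trivial job is the real test; for SLURM, for instance, sbatch --wrap hostname should be accepted and run on a worker node.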

You may consider setting up dedicated queues to use with ARC CE (e.g. per-VO queues).

Please also note that in some cases (depending on the LRMS) you need to share the batch system log directories with ARC CE.

Configure OS accounts

Plan for local account(s) (or account pools) that will be used to execute jobs on the worker nodes.

These accounts should also be available on the ARC CE node.

Please note that ARC services are run as root on the ARC CE node and switch to an appropriate local account when processing job data staging and job execution. This process is called mapping.
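
Since mapping only works if the target accounts exist on the ARC CE node, a quick check can help before configuring the mapping rules. This is a minimal sketch; the account names atlas001 and atlas002 and the helper name check_accounts are hypothetical.

```shell
#!/bin/sh
# Sketch: verify that each named local account resolves on this host.
# Returns 0 if all accounts exist, 1 if at least one is missing.
check_accounts() {
    missing=0
    for acct in "$@"; do
        if id -u "$acct" >/dev/null 2>&1; then
            echo "ok: $acct (uid $(id -u "$acct"))"
        else
            echo "missing: $acct"
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical pool accounts; replace with the accounts planned for your site.
check_accounts atlas001 atlas002 || true
```

Running the same check on a worker node shows whether the accounts are consistent across the cluster.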

Installation

This section assumes you have already enabled the NorduGrid repositories for your package utility (yum/dnf/apt).

Note

If you are using RHEL-based operating systems, ARC can be directly installed from the EPEL repository.

Please note that in EPEL-7 the nordugrid-arc-* packages deliver ARC 5. Use nordugrid-arc6-* to install ARC 6 from EPEL-7.

Install ARC CE core packages from repositories:

[root ~]# yum -y install nordugrid-arc-arex
or
[root ~]# apt-get install nordugrid-arc-arex

Any extra packages will be installed by the ARC Control Tool based on the ARC configuration file, as described below. The full list of packages that can be installed manually (especially additional plugins) can be found here.

Grid security relies heavily on PKI, and all actions require certificates/keys for the ARC CE as a service and for users:

for testing purposes, the ARC Test-CA and host certificate signed by the Test-CA are generated during A-REX installation.

for production use please obtain a certificate signed by one of the IGTF accredited CAs and remove Test-CA files with arcctl test-ca cleanup.

In production ARC CE needs IGTF CA certificates deployed to authenticate users and other services, such as storage elements. To deploy IGTF CA certificates to ARC CE host, run [1]:

[root ~]# arcctl deploy igtf-ca classic

Warning

If you are on an EL9 type server (CentOS-Stream 9, AlmaLinux 9, Rocky 9, Fedora 9) you need to allow legacy crypto policies to be compatible with IGTF. On the command line of the ARC-CE server, issue:

update-crypto-policies --set LEGACY

Configuration

Configuration of ARC CE can be done by means of modifying the pre-shipped zero configuration available at /etc/arc.conf.

The purpose of this zero configuration is to offer a minimalistic working computing element out of the box, right after package installation, with zero additional configuration needed.

For production deployment you will need to customize the configuration in accordance with your actual setup and operations mode.

Note

ARC services must be restarted when changes have been made to arc.conf.

The definitive information about available configuration options can be found in the ARC Configuration Reference Document, which is also available locally as /usr/share/doc/nordugrid-arc-*/arc.conf.reference.

The most common configuration steps are explained below.

Configure authorization and mapping rules

Authorization rules define who can access the computing element (execute jobs, query info, etc). Mapping rules define which grid-users are mapped to which system accounts.

Both authorization and mapping rules in ARC6 rely on the concept of authgroups. Each authgroup represents a set of users, whose identities are matched to configured rules.

Once defined, authgroups can be applied to filter access to the CE per interface ([arex/ws/jobs], [gridftpd/jobs]) and/or per-queue.

The allowaccess and/or denyaccess options in the corresponding block define which authgroups are allowed to access the interface or submit to the queue.

The [mapping] block is used to configure the rules that define how the members of a particular authgroup are mapped to OS accounts.

In the shipped zero configuration the [authgroup: zero] is defined and applied to the A-REX WS interface, the effect of which is to deny any access unless the user is listed in the testCA.allowed-subjects file. The mapping is configured with a map_to_user rule that assigns the same nobody account to everyone in the zero authgroup.

The typical configuration looks like this:

[authgroup: atlas]
voms = atlas * * *

[mapping]
map_to_pool = atlas /etc/grid-security/pool/atlas

[gridftpd/jobs]
allowaccess = atlas

[queue: qatlas]
allowaccess = atlas
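
The map_to_pool rule in the example points at a pool directory. As far as the ARC Configuration Reference describes it, that directory holds a file named pool listing one account name per line; please confirm this layout against the reference for your ARC version. The sketch below generates such a file in a temporary directory so that it is safe to run; the helper name make_pool_file and the atlas001-style account names are hypothetical.

```shell
#!/bin/sh
# Sketch: create <dir>/pool with <count> account names of the form <prefix>NNN.
# The pool-file layout is an assumption to be checked against arc.conf.reference.
make_pool_file() {
    dir="$1"; prefix="$2"; count="$3"
    mkdir -p "$dir"
    : > "$dir/pool"                      # start with an empty pool file
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s%03d\n' "$prefix" "$i" >> "$dir/pool"
        i=$((i + 1))
    done
}

# Demonstration in a throw-away directory instead of /etc/grid-security.
demo=$(mktemp -d)
make_pool_file "$demo/atlas" atlas 3
cat "$demo/atlas/pool"
rm -rf "$demo"
```

The listed accounts must exist as OS accounts on both the ARC CE node and the worker nodes.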

Please read the Authorization, Mapping and Queue selection rules document to get familiar with all aspects of this important configuration step.

Provide LRMS-specific information

One more critical configuration step is to supply ARC CE with relevant information regarding your LRMS specifics.

Specify your LRMS type

In the arc.conf there is a dedicated [lrms] block that defines the type of your LRMS, as well as several options related to the LRMS behaviour. For example, to instruct ARC to use SLURM, use the following configuration:

[lrms]
lrms = slurm
slurm_use_sacct = yes

Specify queues

In addition to specifying the LRMS itself, it is necessary to list all the queues that will be exposed via the ARC CE, by using [queue: name] blocks.

[queue: atlas]
comment = Queue for ATLAS jobs

More information about configuring a particular LRMS to work with ARC can be found in the Batch systems support document.
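
To cross-check the exposed queues against what the batch system actually provides, it can be handy to list the [queue: name] blocks present in arc.conf. The following is a minimal sketch that only parses block headers with sed; the helper name list_queues is hypothetical and not an ARC tool.

```shell
#!/bin/sh
# Sketch: print the names of all [queue: name] blocks found in a config file.
list_queues() {
    sed -n 's/^\[queue:[[:space:]]*\([^]]*\)\][[:space:]]*$/\1/p' "$1"
}

conf=/etc/arc.conf
if [ -r "$conf" ]; then
    list_queues "$conf"
else
    echo "cannot read $conf"
fi
```

Each printed name should correspond to a queue (or partition) that exists in the LRMS.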

Configure A-REX Subsystems

The ARC Resource-coupled EXecution service (A-REX) is the core service handling the execution and entire life cycle of compute jobs.

Enable job management interfaces

A-REX has several job management interfaces available. One can control which of them are enabled and exposed by configuring the corresponding blocks:

WS Interfaces (EMI-ES and ARC REST): [arex/ws/jobs]
Gridftp: [gridftpd/jobs]
Internal: Install nordugrid-arc-plugins-internal package to use this interface.

Enable data services

ARC has a built-in data transfer framework called DTR. It was designed to be used in environments in which data transfer was not possible or not desirable on the worker nodes such as HPC centres or sites without local storage.

DTR relies on users submitting jobs with pre-defined input and output files. When A-REX receives a job, it takes care of downloading the specified input files to the job’s session directory, then submits the job to the batch system. After the batch job finishes, A-REX takes care of uploading any output files to grid storage.

Define the [arex/data-staging] block to enable data-staging capabilities. Data transfers can be scaled out using multi-host data-staging.

DTR also includes a caching capability. If caching is enabled, A-REX will download all input files to the cache and create symlinks from the session directory for each file. If a job requests a file that is already cached, A-REX will not download it again, but simply link to the existing cache file. Define the [arex/cache] block to enable caching.
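
To make the two blocks concrete, the sketch below prints a minimal arc.conf fragment that defines both of them. It assumes that an empty [arex/data-staging] block is enough to enable data staging with default settings and that cachedir is the option naming the cache location, as recalled from the configuration reference; the path /grid/cache and the helper name print_staging_blocks are hypothetical.

```shell
#!/bin/sh
# Sketch: print a minimal arc.conf fragment enabling data staging and caching.
# $1 is the cache directory (assumed to be cross-mounted to the worker nodes).
print_staging_blocks() {
    printf '[arex/data-staging]\n\n[arex/cache]\ncachedir = %s\n' "$1"
}

print_staging_blocks /grid/cache
```

The printed fragment is meant to be reviewed and merged into /etc/arc.conf by hand, followed by a restart of the ARC services.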

More detailed technical documentation on ARC data features and advanced features such as ACIX and CandyPond can be found in the data overview pages.

RunTime Environments

RunTime Environments can modify the job execution cycle and are used for advertising software or features offered by the computing facility.

ARC ships several RTEs that are ready to be used and classified as system-defined.

One can add one's own directories with so-called user-defined RTEs using the runtimedir configuration option in the [arex] block.

In ARC6, both system- and user-defined directories are local to the ARC CE node and SHOULD NOT be shared to worker nodes (unlike in ARC 5).

To use an installed RTE, one must additionally enable it with the ARC Control Tool. For example, to enable the system-defined ENV/PROXY RTE, run:

[root ~]# arcctl rte enable ENV/PROXY

More details on operating RunTime Environments can be found in RunTime Environments in ARC6.

Information system

The ARC CE information system aims to collect and publish information to be used by special clients for matchmaking and/or monitoring the state and statistics of the resource.

It is mandatory to configure the information system for production cases, like those of the WLCG computing infrastructure.

Defining general information

There are many information schemas and renderings of data available to comply with existing standards. Several blocks are used to define the published information, depending on the schema:

[infosys]: The most common block that enables internal information collection from ARC CE host and LRMS.
[infosys/cluster]: The common information about the whole cluster, including e.g. its capacity.
[queue: name]: For heterogeneous clusters, most of the information in the [infosys/cluster] block can be re-defined on per-queue basis.
[infosys/glue2]: Configures the GLUE2-specific values and enables internal glue2 rendering.
[infosys/ldap]: Enables LDAP/BDII dedicated services to publish information via LDAP protocol.
[infosys/glue2/ldap]: Enables GLUE2-schema LDAP rendering of the information.
[infosys/nordugrid]: Enables LDAP rendering of the information according to the NorduGrid schema.
[infosys/glue1]: Configures the GLUE1.x-schema specific values and enables LDAP rendering of GLUE1.x.
[infosys/glue1/site-bdii]: Enables and configures GLUE1.x site-bdii functionality.

Accounting

ARC CE has built-in functionality to measure jobs’ resource usage metrics, which can be used for analysis and for publishing to the SGAS and APEL centralized accounting services.

New in version 6.4: ARC 6.4 introduced the next generation accounting subsystem: A-REX stores complete job accounting data permanently in the local SQLite accounting database. The local accounting database is used as a powerful analysis instrument as part of the ARC Control Tool functionality and to generate standard-compliant usage records for publishing data to SGAS and APEL.

Deprecated since version 6.4: In the 6.0-6.3 releases the Job Usage Reporter of ARC (JURA) tool creates standard-compliant usage records from job usage information provided by the A-REX Job Log files, sends the records to remote accounting services and optionally archives the records for future analysis and republishing.

If you need to configure accounting, follow the accounting guide.

Configure Firewall

Different ARC CE services open a set of ports that should be allowed in the firewall configuration.

To generate iptables configuration based on arc.conf, run:

[root ~]# arcctl deploy iptables-config

Enable and Run Services

To enable and run all services as configured in arc.conf, run:

[root ~]# arcctl service enable --as-configured --now

Instead of using the ARC Control Tool to manage ARC services, you can always use your OS-native tools.

Test Basic Functionality

To test basic job submission to the configured ARC CE, follow the instructions provided in Try ARC6: towards distributed computing in a few minutes.