ARC Computing Element Installation and Configuration Guide
Prerequisites
Choosing the host
It is assumed that the ARC CE is installed on top of an existing Linux computing cluster. Many Linux distributions are supported. ARC also works well on a completely virtualized computing cluster environment in a cloud.
ARC is non-intrusive towards existing systems. We suggest deploying the ARC CE on a dedicated (virtual) machine connected to the cluster network and filesystem.
The ARC software is lightweight and does not require powerful machines to run; however, if the ARC CE will perform data transfers the requirements are higher. As a minimum, a production CE with 4 cores and 8 GB of RAM should be capable of handling up to 10,000 concurrent jobs without problems. One CE can easily handle the load of a single cluster, but multiple CEs may be deployed in parallel for redundancy.
Plan for storage areas
Several storage areas are necessary for job submission, execution and data storage. You should mount/export the following directories:
session directory
data staging cache directory (if planned)
decide to what extent to use a scratch directory on the worker nodes that is not cross-mounted
The session directory (and the cache directory, if used) is typically a cross-mounted NFS share. Please note that in the typical setup, where A-REX runs as root, the NFS share needs to be exported with no_root_squash.
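As an illustration only, a hedged sketch of an NFS export for such a share is shown below; the session directory path and the worker-node network are placeholders to adapt to your site:

# /etc/exports entry on the NFS server (example path and network)
/var/spool/arc/sessiondir  192.168.1.0/24(rw,sync,no_root_squash)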
Local resource management system (LRMS)
Install and configure your LRMS (batch system). ARC supports a variety of LRMS back-ends:
fork - forks jobs on the ARC CE host node, not on a cluster. Targeted at testing and development, not real production workloads.
condor - uses an HTCondor-powered HTC resource
slurm - for SLURM clusters
pbs - any flavor of PBS batch system, including Torque and PBSPro
pbspro - dedicated Altair PBS Professional backend (from 6.1 release)
ll - IBM LoadLeveler batch system
lsf - Load Sharing Facility batch system
sge - Oracle Grid Engine (formerly Sun Grid Engine)
boinc - works as a gateway to BOINC volunteer computing resources
Start by checking if you are able to submit jobs to the chosen LRMS from the ARC CE host.
You may consider setting up dedicated queues to use with ARC CE (e.g. per-VO queues).
Please also note that in some cases (depending on the LRMS) you need to share the batch system log directories with the ARC CE.
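Before moving on, verify plain LRMS submission from the ARC CE host. As a sketch, assuming a SLURM cluster with a partition named atlas (the partition name is just an example):

# submit a trivial test job from the ARC CE host
sbatch --partition=atlas --wrap="hostname"
# check that the job appears in the queue and eventually completes
squeue -u $USER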
Configure OS accounts
Plan for local account(s) (or account pools) that will be used to execute jobs on the worker nodes.
These accounts should also be available on the ARC CE node.
Please note that ARC services are run as root on the ARC CE node and switch to an appropriate local account when processing job data staging and job execution. This process is called mapping.
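As a hedged illustration of what mapping can look like in arc.conf, the sketch below maps all members of one authorization group to a single local account; the authgroup name, the VO and the target account are placeholders, and real setups often use account pools instead:

[authgroup: atlas]
voms = atlas * * *

[mapping]
map_to_user = atlas nobody:nobody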
Installation
This section assumes you have already enabled the NorduGrid repositories for your package utility (dnf/apt).
Note
If you are using RHEL-based operating systems, ARC can be directly installed from the EPEL repository.
Please note that in EPEL the packages are named nordugrid-arc7.
Warning
If you are on an EL9 type server (CentOS-Stream 9, AlmaLinux 9, Rocky 9, RHEL9) you need to allow legacy crypto policies to be compatible with IGTF. On the command line of the ARC-CE server, issue:
update-crypto-policies --set LEGACY
Install ARC CE core packages from repositories as root:
dnf -y install nordugrid-arc-arex
or for Debian based systems as root:
apt-get install nordugrid-arc-arex
Any extra packages will be installed based on the ARC configuration file with the ARC Control Tool, as described below. The full list of packages to install manually (especially additional plugins) can be found here.
Grid security heavily relies on PKI, and all actions require certificates/keys for the ARC CE as a service and for its users:
for testing purposes, the ARC Test-CA and a host certificate signed by the Test-CA are generated during A-REX installation.
for production use, please obtain a certificate signed by one of the IGTF accredited CAs and remove the Test-CA files with arcctl test-ca cleanup.
In production, the ARC CE needs IGTF CA certificates deployed to authenticate users and other services, such as storage elements. To deploy the IGTF CA certificates on the ARC CE host, run as root [1]:
arcctl deploy igtf-ca classic
Configuration
Configuration of the ARC CE can be done by modifying the pre-shipped zero configuration available at /etc/arc.conf.
The purpose of this zero configuration is to offer a minimalistic working computing element out of the box, right after package installation, with no additional configuration needed.
For production deployment you will need to customize the configuration in accordance with your actual setup and operations mode.
Note
ARC services must be restarted when changes have been made to arc.conf.
The complete information about available configuration options can be found in the ARC Configuration Reference Document, which is also available locally as /usr/share/doc/nordugrid-arc-*/arc.conf.reference.
The most common configuration steps are explained below.
Provide LRMS-specific information
Another critical configuration step is to supply the ARC CE with relevant information regarding your LRMS specifics.
Specify your LRMS type
In arc.conf there is a dedicated [lrms] block that defines the type of your LRMS, as well as several options related to the LRMS behaviour.
For example, to instruct ARC to use SLURM, use the following configuration:
[lrms]
lrms = slurm
slurm_use_sacct = yes
Specify queues
In addition to specifying the LRMS itself, it is necessary to list all the queues that will be exposed via the ARC CE, using [queue: name] blocks.
[queue: atlas]
comment = Queue for ATLAS jobs
More information about configuring a particular LRMS to work with ARC can be found in the Batch systems support document.
Configure A-REX Subsystems
The ARC Resource-coupled EXecution service (A-REX) is a core service handling execution and entire life cycle of compute jobs.
Enable job management interfaces
In ARC 7, A-REX has only one main job management interface, namely the ARC REST interface. To enable it, the corresponding block must be configured (see the sketch below):
- WS Interface ARC REST
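As a hedged sketch (block names as in the configuration reference; adjust options to your setup), the REST interface is enabled through the web-service blocks in arc.conf:

[arex/ws]

[arex/ws/jobs]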
In addition, there is an experimental internal job submission interface. To use it, you must install the nordugrid-arc-plugins-internal package.
Enable data services
ARC has a built-in data transfer framework called DTR. It was designed to be used in environments in which data transfer was not possible or not desirable on the worker nodes such as HPC centres or sites without local storage.
DTR relies on users submitting jobs with pre-defined input and output files. When A-REX receives a job, it takes care of downloading the specified input files to the job’s session directory, then submits the job to the batch system. After the batch job finishes, A-REX takes care of uploading any output files to grid storage.
Define the [arex/data-staging] block to enable data-staging capabilities. Data transfers can be scaled out using multi-host data-staging.
DTR also includes a caching capability. If caching is enabled, A-REX will download all input files to the cache and create symlinks from the session directory for each file. If a job requests a file that is already cached, A-REX will not download it again, but simply link to the existing cache file. Define the [arex/cache] block to enable caching.
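For illustration, a minimal hedged sketch enabling both data staging and caching in arc.conf could look like the following; the cache path is an example and must be adapted to your storage layout:

[arex/data-staging]

[arex/cache]
cachedir = /grid/cache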
More detailed technical documentation on ARC data features and advanced features such as CandyPond can be found in the data overview pages.
RunTime Environments
RunTime Environments can modify the job execution cycle and are used for advertising software or features offered by the computing facility.
ARC ships several RTEs that are ready to be used and classified as system-defined.
One can add one's own directories with so-called user-defined RTEs using the runtimedir configuration option in the [arex] block.
In ARC, both system- and user-defined RTE directories are local to the ARC CE node and should not be shared with the worker nodes.
To use an installed RTE, one should additionally enable this RTE with ARC Control Tool.
For example, to enable the system-defined ENV/PROXY RTE, run:
[root ~]# arcctl rte enable ENV/PROXY
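To review which RTEs are available and which are already enabled, arcctl can also list them (a usage example; the output depends on the installation):

[root ~]# arcctl rte list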
More details on operating RunTime Environments can be found in RunTime Environments in ARC.
Information system
The ARC CE information system collects and publishes information to be used by special clients for matchmaking and/or for monitoring the state and statistics of the resource.
It is mandatory to configure the information system for production use cases, such as the WLCG computing infrastructure.
Defining general information
There are many information schemas and renderings of the data available to comply with existing standards. Several blocks are used to define the published information, depending on the schema:
[infosys]
The most common block that enables internal information collection from ARC CE host and LRMS.
[infosys/cluster]
The common information about the whole cluster, including e.g. its capacity.
[queue: name]
For heterogeneous clusters, most of the information in the [infosys/cluster] block can be re-defined on a per-queue basis.
[infosys/glue2]
Configures the GLUE2-specific values and enables the internal GLUE2 rendering.
[infosys/ldap]
Enables LDAP/BDII dedicated services to publish information via LDAP protocol.
[infosys/glue2/ldap]
Enables GLUE2-schema LDAP rendering of the information.
[infosys/nordugrid]
Enables LDAP rendering of the information according to the NorduGrid schema.
Accounting
ARC CE has built-in functionality to measure jobs' resource usage metrics that can be used for analyses and for publishing to the SGAS and APEL centralized accounting services.
If you need to configure accounting follow the accounting guide.
Configure Firewall
Different ARC CE services open a set of ports that should be allowed in the firewall configuration.
To generate an iptables configuration based on arc.conf, run as root:
arcctl deploy iptables-config
Enable and Run Services
To enable and run all services as configured in arc.conf, run as root:
arcctl service enable --as-configured --now
Instead of using ARC Control Tool to manage ARC services, you can always use your OS native tools.
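For example, with systemd the core A-REX service shipped by the nordugrid-arc-arex package can be managed directly (other ARC services, such as the LDAP information system, have their own units):

systemctl enable --now arc-arex
systemctl status arc-arex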
Test Basic Functionality
To test basic job submission to the configured ARC CE, follow the instructions provided in the Try ARC guide.
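As a quick sketch of such a test (assuming the ARC client tools are installed on some machine, a valid user certificate or a Test-CA user certificate is available, and ce.example.org stands for your CE hostname):

# create a proxy from the user credentials
arcproxy
# submit one of the built-in test jobs to the CE
arctest -J 1 -C ce.example.org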