Tutorial Part 2 - Production setup with local ARC data delivery

Tutorial configuration values

In the zero-conf setup from the previous step, no user other than yourself can submit jobs to ARC, and jobs will only run under the fork back-end. There are minimally 5 steps needed to configure ARC as a production-ready server. These are:

  • Step 1: Tell ARC what LRMS system you are using

  • Step 2: Configure the A-REX directories

  • Step 3: Tell ARC what LRMS queues to use

  • Step 4: Enable the ARC data-staging service and the ARC cache

  • Step 5: Set up authentication and authorization

That will be it. Finally, we will test the setup by submitting a couple of test-jobs.

We will be using example ARC configuration values that match the test infrastructure set up for the EGI 2023 ARC tutorial. You will naturally have to replace these with the actual values for your specific system. Here is an overview of them for convenience:

  • ARC session directory: /grid/session1

  • ARC control directory: /grid/control

  • ARC cache directory: /grid/cache1

  • batch system (LRMS): slurm or condor

  • names of the batch system queues: wlcg and arc-tutorial

  • compute node scratch directory: /scratch

Figure: overview of the servers and file system setup for the production ARC example cluster.

Step 1. Tell ARC what LRMS system you are using

In this tutorial we use either the SLURM or the HTCondor batch system.

Simply replace the zero-conf fork lrms with slurm or condor like this:

[lrms]
lrms = slurm

Or if you are on HTCondor:

[lrms]
lrms = condor

ARC supports many other batch systems - for details, see the arc.conf reference: [lrms].
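Before wiring ARC to the batch system it can be worth checking that the batch system client tools actually respond on the ARC-CE host, since A-REX calls them from there. A minimal sanity check, assuming the SLURM (or HTCondor) client commands are installed on the CE:

# SLURM: list the partitions and their state as seen from the ARC-CE
sinfo

# HTCondor: list the execute nodes known to the pool
condor_status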

Step 2. A-REX directories - [arex] block

In a production system we want to decide where the ARC-related files and folders should live. The relevant A-REX directories are at least the control directory, the session directory and the cache directory for the data-delivery service. We will address the first two here, and the cache directory in Step 4.

Controldir

The control directory holds A-REX internal job metadata files and is local to the ARC-CE (i.e. not shared with the compute nodes). The default value is /var/spool/arc/jobstatus. We will instead point to /grid/control in this example, which corresponds to the folder prepared on the test servers if you are following along as part of an ARC workshop. If you are setting up your own cluster, use the dedicated path you have prepared for the ARC control directory (or just keep the default). ARC will automatically create the directory if it does not already exist. There can only be a single control directory.

Example of the control directory file tree that gets created by ARC-CE:

/grid/control/
├── accepting
├── accounting
├── delegations
├── dhparam.pem
├── dtr.state
├── finished
├── gm.fifo
├── gm.heartbeat
├── info.xml
├── jobs
│   └── cc4
│       └── 197
│           └── f52
│               └── 3c6
│                   ├── description
│                   ├── diag
│                   ├── errors
│                   ├── grami
│                   ├── input
│                   ├── input_status
│                   ├── local
│                   ├── output
│                   ├── proxy
│                   ├── statistics
│                   └── xml
├── log
├── processing
├── restarting
└── version2

Visit Internal files of A-REX for more details.

Sessiondir

The jobs' working directories are created as subdirectories of the ARC session directories. Each job has its own dedicated session subdirectory, identified by the ARC job id, which holds all the input files, output files, executables and log files for the job. There can be multiple ARC session directories. The default value is /var/spool/arc/sessiondir, but we will change it to /grid/session1 in this example to demonstrate a non-default value. ARC will automatically create the directory if it does not already exist.

Example of the session directory file tree, where you can see two jobs with their corresponding working directories and files:

/grid/session1/
├── cc4197f523c6
│   ├── runhello.sh
│   ├── output.txt
│   ├── stderr
│   └── stdout
├── cc4197f523c6.comment
├── cc4197f523c6.diag
├── 9b57049fdc51
│   ├── runhello_inputfile.sh
│   ├── output.txt
│   ├── localfile
│   ├── remotefile
│   ├── stderr
│   └── stdout
└── 9b57049fdc51.comment

This is how the [arex] block now looks after adding our system's directories:

[arex]
controldir = /grid/control
sessiondir = /grid/session1

For more details, see arc.conf reference: controldir and sessiondir.
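After restarting A-REX (the restart command is shown further down), a quick way to confirm that the new locations were picked up is to check that the directories exist and that the control directory is being populated; the paths below are this tutorial's example values:

# Check that A-REX created the control and session directories
ls -ld /grid/control /grid/session1

# The control directory should start filling with A-REX bookkeeping files
ls /grid/control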

Configure a local compute scratch directory

To avoid excessive traffic on the shared file system it is advised to configure a local scratch directory on the compute nodes. If a scratch directory is configured, A-REX will move the job's files from the shared session directory onto the local scratch directory on the compute node once the job starts. The only file that is not moved is the job's log file, which is simply soft-linked to the compute node. Note that if the ARC cache is used, the input-file links are simply moved to the scratch directory. The output files are created in the compute node's scratch directory while the job is running.

Once the job is finished, the files, including the output files, are moved back into the job's session directory.

We configure ARC to use a local scratch directory on the compute nodes by adding the scratchdir directory path to the [arex] block. In this example the path used is /scratch. The [arex] block now looks like this:

[arex]
controldir = /grid/control
sessiondir = /grid/session1
scratchdir = /scratch

For more details about the configuration option, see arc.conf reference: scratchdir. In addition, for details on how the job scratch area works, see: Job scratch area.
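Since /scratch lives on the compute nodes rather than on the ARC-CE, it is worth verifying that it exists there and has enough free space. A quick check through the batch system, assuming SLURM and the wlcg partition used in this tutorial:

# Run df on one compute node in the wlcg partition
srun -p wlcg -N 1 df -h /scratch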

Step 3. Tell ARC what LRMS queues to use - [queues] block

Each queue on the cluster that should receive jobs from ARC must be represented and described by a separate queue block. The queue_name should be used as a label in the block name. In case of fork, or other LRMSes with no queue names, just use any unique string. In the case of SLURM the queue_name must match an existing SLURM partition.

There can be several queues, for instance a queue for the ordinary grid jobs and a separate queue for low-priority preemptible grid jobs, or separate queues for different experiments.

The queues can also be configured to accept jobs from certain authgroups, which we will see in Step 5-3. In the example shown here, all authgroups can use all queues as no authgroups are specified.

[queue:wlcg]
comment = Queue for WLCG jobs

[queue:arc-tutorial]
comment = Queue for ARC test jobs

Consult the arc.conf reference [queue] for more details.
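For SLURM, the [queue:...] names configured above must match existing partitions, so it can be useful to cross-check what the scheduler reports (assuming the SLURM client tools are available on the CE):

# Summarise the SLURM partitions - the queue block names must match these
sinfo -s

# Or show the full partition definitions
scontrol show partition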

Step 4. Enable ARC data-staging service and tell ARC what cache directory to use

With the ARC data-staging enabled, ARC will make sure to fetch any remote input files to the ARC server, and the job will be forwarded to the LRMS system only once all files are downloaded.

The ARC cache is a must when you have data-staging enabled. The data-delivery service checks the ARC cache to see whether a file has already been downloaded; if it has, no new download is performed and the cached file is simply linked into the job that uses it.

An example of the file tree of the cache directory is shown below. We see the data folder holding the actual data files, in addition to the joblinks folder, organised by ARC job id, where the hard links to the cached data files are stored.

/grid/cache1/
├── data
│   ├── 15
│   │   ├── ed291ec90b64fe7c153c74f65af09f0db2b70e
│   │   └── ed291ec90b64fe7c153c74f65af09f0db2b70e.meta
│   └── 31
│       ├── b9f10eb0968c50246534619de8c78e42a46156
│       └── b9f10eb0968c50246534619de8c78e42a46156.meta
└── joblinks
    └── cc9197f523c6
        └── remotefile

To configure the cache and the data-staging, add the following (close to minimal) configuration right below the [arex] block:

[arex/cache]
cachedir = /grid/cache1

[arex/data-staging]
logfile=/var/log/arc/datastaging.log

Note

We enable the central datastaging.log here; otherwise the data-staging logs will only reside on any remote data delivery servers, and it is convenient to have them accessible from the ARC-CE itself.

It is important also to set up cache cleaning by enabling the [arex/cache/cleaning] block. However, for the sake of simplicity, we will not do that in this tutorial.

Check the [arex/cache] and [arex/data-staging] arc.conf references for details.
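Once jobs with remote input files start arriving, two quick checks (using the example cache path and the log file configured above) show whether data-staging and the cache are working:

# Follow the central data-staging log configured above
tail -f /var/log/arc/datastaging.log

# See how much data has accumulated in the cache
du -sh /grid/cache1/data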

Step 5. Set up authentication and authorization

The authentication and authorization setup consists of 3 steps:

  • Step 5-1: Define authentication groups, so-called authgroups

  • Step 5-2: Define mapping rules

  • Step 5-3: Enable authorization of authgroups to services and queues

Note

Rules in an authgroup are processed in order of appearance. The first matching rule decides the membership of the user to the authgroup being evaluated and the processing STOPS WITHIN that authgroup. A user can be matched to more than one authgroup. For the mapping rules, the first successful match is used by default, but this can be configured.

Consult the arc.conf reference for details on the [authgroup] and [mapping] configuration blocks.

Step 5-1: Define authentication groups, so-called authgroups - [authgroup] block

Now that the configuration matches the cluster setup with batch system type, folder structure and data-staging, it is time to set up authentication and authorization, so that the right users are allowed to submit jobs to the cluster. At the moment, only you are allowed to submit jobs according to ARC, as a result of the “zero-conf” setup.

The authentication and authorization functionality in ARC is very powerful and you can define the authgroups as fine- or coarse-grained as needed. Here we will show a few examples of some possibilities.

Up until now we only have the zero-conf authgroup:

[authgroup:zero]
authtokens = * https://arc.example.org/arc/testjwt/6897a567 * * *

Let us remove the zero authgroup completely. (For clarity we just comment it out here, but you can remove it from the configuration entirely.)

#[authgroup:zero]
#authtokens = * https://arc.example.org/arc/testjwt/6897a567 * * *

Now let us add some authgroups matching relevant OIDC token issuers:

[authgroup:wlcg_iam]
authtokens = * https://wlcg.cloud.cnaf.infn.it/ * * *

[authgroup:egi_aai]
authtokens = * https://aai.egi.eu/oidc/ * * *

In addition, here are some other relevant authgroups based on x509 and VOMS:

[authgroup:atlas-admin]
subject=/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=ddmadmin/CN=531497/CN=Robot: ATLAS Data Management

[authgroup:cms]
voms=cms * * *

Note

Each authgroup can contain multiple authorization rules and any combination of authorization methods. For instance, the wlcg_iam authgroup used as an example here could contain both authtokens rules and voms rules.

It can be convenient to group the authgroups; an example is shown here:

[authgroup:wlcg]
authgroup = wlcg_iam
authgroup = egi_aai
authgroup = atlas-admin
authgroup = cms

Another example could be:

[authgroup:x509]
authgroup = atlas-admin
authgroup = cms

[authgroup:tokens]
authgroup = wlcg_iam
authgroup = egi_aai

In other words - group the authgroups in the way you find convenient for the mapping and the authorisation to services.

Now that we have defined some authgroups that authenticate users coming with either tokens from WLCG or EGI, the x509 robot certificate of ATLAS Data Management, or an x509 certificate carrying a VOMS extension for the cms VO, we are ready to define the mapping for these different user groups.

Step 5-2: Define mapping rules

With the needed authgroups defined, we next need to map them to a local user on the system. ARC will not allow any job to run as root.

In this tutorial we will map all authgroups to the same user for simplicity. This user must exist on the ARC compute element as well as on the compute nodes (which it does on the test infrastructure if you are attending in the context of an ARC workshop). In this example the Linux user and group name is arcuser.

[mapping]
map_to_user = wlcg arcuser:arcuser

Other popular mapping methods are map_to_pool and map_with_plugin.

The job will be submitted to the batch system by this user, and hence run on the compute node as this user.
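Since ARC refuses to run jobs as root, the mapped account has to exist on both the ARC-CE and the compute nodes. A quick check for the arcuser account used in this example (the srun line assumes SLURM and the wlcg partition):

# On the ARC-CE
id arcuser

# On a compute node, via the batch system
srun -p wlcg -N 1 id arcuser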

Step 5-3: Enable authorization of authgroups to services and queues

The final step in the authentication and authorization setup is to tell ARC which authgroups are allowed to do what. By using allowaccess and denyaccess you can control which interfaces and queues a given authgroup is allowed to use.

We will set up some very simple authorization in this tutorial.

[arex/ws/jobs]
allowaccess = wlcg

We can also restrict access to particular queues to certain authgroups, e.g.:

[queue:wlcg]
comment = Queue for WLCG jobs
allowaccess = wlcg

[queue:arc-tutorial]
comment = Queue for ARC test jobs
allowaccess = zero

Note

In the case of SLURM - the SLURM partitions must correspond to the queue names. If a job is submitted without specifying a queue (more on that below), the default queue of the batch system will be used.

Our full arc.conf now looks like this:

[common]

[authgroup:testers]
authtokens = * https://arctest0-neic24-slurm-el9-arc7-ce1.cern-test.uiocloud.no/arc/testjwt/8b7baf79 * * *

[authgroup:wlcg_iam]
authtokens = * https://wlcg.cloud.cnaf.infn.it/ * * *

[authgroup:egi_aai]
authtokens = * https://aai.egi.eu/oidc/ * * *

[authgroup:atlas-admin]
subject = /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=ddmadmin/CN=531497/CN=Robot: ATLAS Data Management
voms = atlas * lcgadmin *

[authgroup:cms]
voms = cms * * *

[authgroup:wlcg]
authgroup = wlcg_iam
authgroup = egi_aai
authgroup = atlas-admin
authgroup = cms

[authgroup:x509]
authgroup = atlas-admin
authgroup = cms

[authgroup:tokens]
authgroup = wlcg_iam
authgroup = egi_aai

[authgroup:all]
authgroup = testers
authgroup = wlcg

[mapping]
map_to_user = all arcuser:arcuser

[lrms]
lrms = slurm

[arex]
controldir = /grid/control
sessiondir = /grid/session1
scratchdir = /scratch

[arex/cache]
cachedir = /grid/cache1

[arex/data-staging]
logfile = /var/log/arc/datastaging.log

[arex/ws]

[arex/ws/jobs]
allowaccess = wlcg

[infosys]

[infosys/glue2]

[infosys/cluster]
nodeaccess = outbound

[queue:wlcg]
comment = Queue for WLCG jobs
allowaccess = wlcg

[queue:arc-tutorial]
comment = Queue for ARC test jobs
allowaccess = testers

Restart ARC after configuration changes

For any configuration changes to take effect, ARC must be restarted. Before proceeding, do:

$ arcctl service restart -a
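To confirm that the services came back up cleanly, you can check the A-REX systemd unit and its log; /var/log/arc/arex.log is the usual default A-REX log location, adjust if you have configured it differently:

# Check that A-REX is running again
systemctl status arc-arex

# Watch the A-REX log for configuration or startup errors
tail -f /var/log/arc/arex.log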

Submit a job using tokens

Get a token

Generate your token and store it as an environment variable on the ARC client machine. We will use the ARC test-jwt token in this tutorial, but if you have a real token you can use that, as long as you also configure ARC to accept it - see Step 5.

export BEARER_TOKEN=$(sudo arcctl test-jwt token)
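If you want to see which issuer the token comes from (the URL that an authtokens rule from Step 5-1 must match), you can decode the JWT payload. This is just a convenience one-liner; base64 may drop the last few characters because the payload is unpadded, which does not matter for a quick look:

# Print the decoded payload of the token and look for the "iss" claim
echo "$BEARER_TOKEN" | cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null; echo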

In this tutorial we will submit two test-jobs, one without any data-staging and one with data-staging. If you are using the ARC client machine or one of the ARC test-sites set up in the context of an ARC workshop, the test-job descriptions are already present in the /home/almalinux/arcjob_tutorial folder.

Note

If you are on your own client machine, please download and untar the files (job description, input files, executable), or use your own custom test-jobs. Note that to use your own remote ARC client you must ensure that the ARC-CE accepts the token you will use, and that the token's issuer CA is installed on both server and client. You cannot use the test-jwt in this case.

Submit a job with no input data

In this example we are assuming you are submitting the job on the client installed on the ARC-CE server itself. Replace $(hostname) with your ARC-CE endpoint if this is not the case.

The executable looks like this:

#!/bin/sh
echo "Am sleeping for 60 seconds"
sleep 60
exit

And the job-description looks like this:

&(executable=runhello.sh)
(jobname=hello_test)
(stdout=std.out)
(stderr=std.err)
(cpuTime="1 hours")
(memory="1000")
(outputfiles=("output.txt" ""))

Submit the job (make sure your token is still valid, if not, generate a new one - see token generation)

arcsub -d DEBUG  -C $(hostname) hello.xrls
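arcsub prints the job ID on success. You can then follow the job with the standard ARC client tools, replacing <jobid> with the ID printed by arcsub (and keeping BEARER_TOKEN exported in the same shell):

# List the status of your jobs on this CE
arcstat -a

# Show the job's stdout while or after it runs
arccat <jobid>

# Retrieve the job's output files once it has finished
arcget <jobid>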

Submit a job with local and remote input data

Now let’s submit a job that requires ARC to stage data for us. We will use this executable:

#!/bin/sh
echo "Computing md5sum for localfile";md5sum localfile>>output.txt;
echo "Computing md5sum for remotefile";md5sum remotefile>>output.txt;
echo "Am sleeping for 60 seconds"
sleep 60
exit

And this job-description file:

&(executable=runhello_inputfile.sh)
(jobname=hello_test)
(stdout=std.out)
(stderr=std.err)
(cpuTime="1 hours")
(memory="1000")
(inputfiles=
("localfile" "file.txt")
("remotefile" "http://download.nordugrid.org/repos/6/rocky/9/source/updates/repodata/repomd.xml"))
(outputfiles=("output.txt" ""))

Submit the job (make sure your token is still valid, if not, generate a new one - see token generation)

We will use the -K option to enable delegation with tokens, although in this case the file is openly accessible and does not require any authorization to be fetched.

arcsub -d DEBUG  -K -C $(hostname) hello_inputfile.xrls
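When this job has finished, you can verify that the data-staging pipeline did its work: the remote file should now be present in the cache, the transfer should show up in the central data-staging log, and output.txt in the retrieved job directory should contain the two md5sums. The paths are this tutorial's example values, and <jobid> is the ID printed by arcsub:

# The downloaded remote file ends up under the cache data directory
ls -lR /grid/cache1/data | head

# The transfer should be visible in the central data-staging log
grep repomd.xml /var/log/arc/datastaging.log | tail

# Fetch the job output (arcget prints the directory it downloaded to)
# and inspect the md5sums
arcget <jobid>
cat <downloaded-dir>/output.txt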