RunTime Environments in ARC6

Understanding RunTime Environments

The ARC Computing Element (ARC CE) is a front-end to various heterogeneous resource providers. To run jobs on a particular resource provider, there is always a set of software- or workflow-specific paths, tools, libraries, environment variables or even dynamic content that must be recreated in the job context.

To provide a flexible way of tuning the job runtime environment, ARC provides the concept of the RunTime Environment (RTE).

ARC RunTime Environments (RTEs) provide two features:

  • Advertising - indicate the available environment to be requested by end-users

  • Modifying job environment - flexibly contextualize the job execution environment

Advertising RTEs

Advertising RTEs provide user interfaces to application software and other resources in a way that is independent of the details of the local installation of the application and computing platform (OS, hardware, etc.).

It addresses setups typically required by large research groups or user bases, dealing with a common set of software. The actual implementation of a particular RTE may differ from site to site as necessary.

However, it should be designed so that resource providers with different accounting, licence or other site-specific implementation details can advertise the same application interface (RTE) for all users.

Despite possibly different parameters or implementations, the software addressed by a given RTE name should be known to the community under that name. An RTE version can also be appended to the RTE name (after a dash). The RTE version is used for resource matchmaking along with the RTE name.

For example, to request ENV/PROXY and APPS/HEP/ATLAS with version 20.1.0.1 or greater in the xRSL job description:

(runTimeEnvironment="ENV/PROXY")
(runTimeEnvironment>="APPS/HEP/ATLAS-20.1.0.1")
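To make the ">=" request above more concrete, the following is a minimal sketch of a name-plus-version comparison. It assumes that the version is everything after the last dash of the RTE name and that versions order the way GNU "sort -V" orders them; the function names are made up for illustration, and ARC's own matchmaking may compare versions differently.

```shell
#!/bin/bash
# Sketch only: ARC's real matchmaking may compare versions differently.

# True when version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# True when an advertised RTE satisfies a ">=" request.
# Assumes the version is everything after the last dash in the name.
rte_satisfies() {
  local advertised="$1" requested="$2"
  [ "${advertised%-*}" = "${requested%-*}" ] &&
    version_ge "${advertised##*-}" "${requested##*-}"
}

rte_satisfies "APPS/HEP/ATLAS-20.8.0" "APPS/HEP/ATLAS-20.1.0.1" && echo "match"
rte_satisfies "APPS/HEP/ATLAS-19.2.0" "APPS/HEP/ATLAS-20.1.0.1" || echo "no match"
```

Names that carry further dash-separated suffixes (platform, compiler) would need a more careful split than this sketch performs.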

It is always up to the local system administrators to decide whether or not to install and enable a particular RTE.

Modifying job environment

The RTE itself is a Bash script that can run arbitrary code during the job life cycle.

The first argument of the RTE script indicates the so-called RTE stage. If the job description specifies additional arguments for the corresponding RTE, those are appended starting at the second position.

digraph {
  fontcolor = royalblue;
  node [shape=Rectangle, fontcolor=black];
  subgraph clusterARCCE {
    rankdir=TB;
    label="ARC CE";
    fontsize=16;
    jdrtes [label="RTEs from Job Description"];
    default [label="Default RTEs", shape=folder];
    rte0 [label="RTEs Stage 0", color=darkgreen, fillcolor=darkseagreen2, style=filled];
    submit [label="Submit Batch Job"];
    subgraph clusterJobScript {
      label="Job Script Generate";
      fontsize=14;
      style="dashed";
      color="red";
      embedrte [label="Embedded RTEs content"]
    }
    jdrtes -> default -> rte0 -> embedrte -> submit;
  }
  lrms [label="LRMS Scheduler", shape="oval", color="red"];
  submit -> lrms [color="red", constraint=false];
  subgraph clusterWorkNode {
    rankdir=TB;
    label="Worker Node";
    fontsize=16;
    lrmsjob [label="Init LRMS Job"];
    subgraph clusterWNJobScript {
      label="Job Script Run";
      fontsize=14;
      rankdir=TB;
      style="dashed";
      color="red";
      scratch [label="Setup job workdir"];
      rte1 [label="RTEs Stage 1", color=darkgreen, fillcolor=darkseagreen2, style=filled];
      process [label="Job Processing"];
      rte2 [label="RTEs Stage 2", color=darkgreen, fillcolor=darkseagreen2, style=filled];
    }
    lrmsjob -> scratch -> rte1 -> process -> rte2;
  }
  lrms -> lrmsjob [constraint=false, color="red"];
}

RTE execution has three stages:

Stage 0

The RTE script is sourced before the job’s LRMS submission script is created. In this case the script is run by A-REX on the frontend (ARC CE), before the job is sent to the LRMS. Some environment variables are defined at this stage and can be changed to influence the job’s execution later.

Stage 1

The embedded RTE function runs before the main job processing on the Worker Node under the LRMS. This stage can prepare the environment for a third-party software package. The current directory is the one that will be used for the execution of the job. The variable $HOME also points to this directory.

Stage 2

The embedded RTE function runs after the main job processing on the Worker Node under the LRMS. Its main purpose is to clean up changes made in Stage 1 (such as removing temporary files).

You can use this template as a starting point for writing custom RTE scripts that fulfill your needs.
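As a rough illustration of the stage mechanism described above (not the official template), the sketch below dispatches on the first argument and treats the remaining arguments as coming from the job description. The names myapp_rte, MYAPP_HOME, /opt/myapp and myapp-tmp are hypothetical. In a real RTE file the function body would be the file content itself; the function wrapper is only there so that the logic can be tried out locally.

```shell
#!/bin/bash
# Hypothetical RTE sketch (not the official ARC template).
myapp_rte() {
  local stage="$1"
  [ "$#" -gt 0 ] && shift   # what remains are arguments from the job description

  case "$stage" in
    0)
      # Runs on the ARC CE before the LRMS submission script is generated.
      # Nothing to do in this sketch.
      :
      ;;
    1)
      # Runs on the worker node before the job payload.
      # MYAPP_HOME and its fallback path are made-up names for illustration.
      export MYAPP_HOME="${1:-/opt/myapp}"
      mkdir -p "./myapp-tmp"
      ;;
    2)
      # Runs on the worker node after the job payload: undo Stage 1 changes.
      rm -rf "./myapp-tmp"
      ;;
  esac
}

# Local dry run of the worker-node stages with one job-description argument.
myapp_rte 1 /opt/myapp-2.0
echo "MYAPP_HOME=$MYAPP_HOME"
myapp_rte 2
```

Keeping Stage 2 a strict mirror of Stage 1 makes it easier to check that nothing is left behind in the job directory.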

Operating RunTime Environments

Starting with ARC6, the way RunTime Environments are operated has changed significantly and relies on the ARC Control Tool (arcctl):

digraph {
  rankdir=LR;
  fontcolor = royalblue;
  node [shape=folder, fontcolor=black];
  systemrte [label="System-defined RTEs"];
  userrte [label="User-defined RTEs"];
  userrte2 [label="User-defined RTEs", style="dashed", color="grey", fontcolor="grey"];
  comrte [label="Community-defined RTEs", href="../admins/details/rtes_community.html", target="_top"];
  arcctl [label="arcctl", color="red", shape="oval", fontsize=16, fontcolor=red, href="../admins/commands/arcctl.html", target="_blank"];
  systemrte -> arcctl;
  userrte -> arcctl;
  userrte2 -> arcctl [color="grey", style="dashed"];
  comrte -> arcctl;
  subgraph clusterControlDir {
    label="Control Directory";
    fontsize=16;
    node [shape=folder];
    enabled [label="Enabled RTEs", href="../admins/details/rtes.html#enabling-rtes", target="_top"];
    default [label="Default RTEs", href="../admins/details/rtes.html#default-rtes", target="_top"];
    params [label="RTE parameters", href="../admins/details/rtes.html#rte-parameters", target="_top", group=galign1];
  }
  arcctl -> enabled;
  arcctl -> default;
  arcctl -> params;
  subgraph clusterInfosys {
    color="red";
    fontsize=16;
    label="Information System";
    publish [label="Advertise RTEs", shape="oval"]
  }
  subgraph clusterLRMS {
    color="red";
    fontsize=16;
    label="LRMS";
    jobscript [label="Embed to jobscript", shape="oval", group=galign1];
  }
  enabled -> publish;
  enabled -> jobscript;
  default -> jobscript;
  params -> jobscript;
}

Installing RTE scripts

A set of System-defined RTEs is pre-installed with the ARC CE packages to cover common workflows.

An ARC CE administrator can add additional RTE directories (so-called User-defined RTEs). These additional locations should be specified in arc.conf using the runtimedir configuration option in the [arex] block. Custom RTE scripts can be developed using this template as a starting point.

The Community-defined RTEs are additional RTEs created by research communities. These RTEs can be provisioned to the ARC CE from trusted registries (including the required software bundles) with the ARC Control Tool.

Note

In ARC6, directories with RTE scripts are local to the ARC CE and SHOULD NOT be shared with worker nodes.

The RTE names used for advertising are implied by the directory structure, e.g. in ENV/PROXY, ENV is a directory inside the System RTEs location and PROXY is the name of the file.
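The mapping from directory layout to RTE name can be sketched as follows. A temporary directory stands in for an RTE location so that nothing in a real installation is touched; list_rte_names and APPS/HEP/DEMO-1.0 are hypothetical names used only for illustration, and arcctl itself may discover RTEs differently.

```shell
#!/bin/bash
# Stand-in for an RTE directory; a temporary directory is used so the sketch
# does not touch a real ARC installation.
rtedir="$(mktemp -d)"
mkdir -p "$rtedir/ENV" "$rtedir/APPS/HEP"
: > "$rtedir/ENV/PROXY"
: > "$rtedir/APPS/HEP/DEMO-1.0"    # hypothetical RTE name

# Print one RTE name per script file: the path relative to the base directory.
list_rte_names() {
  (cd "$1" && find . -type f | sed 's|^\./||' | LC_ALL=C sort)
}

list_rte_names "$rtedir"
# APPS/HEP/DEMO-1.0
# ENV/PROXY
```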

Enabling RTEs

Installed RTEs must be enabled before they are advertised and can be used during job submission.

By name

To enable a particular RTE by name, run the following command:

arcctl rte enable ENV/PROXY

By path

Especially if you have several RTEs with the same name installed, you can choose the exact one by specifying the filesystem path:

arcctl rte enable /usr/share/arc/rte/ENV/PROXY

Using wildcards

To enable several RTEs you can pass as many names as you want to the arcctl command. Additionally you can use glob (man 7 glob) wildcards in RTE names.

The following command will enable all APPS/HEP/ATLAS RTEs for SLC6 builds:

arcctl rte enable APPS/HEP/ATLAS-*-X86_64-SLC6-*
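To preview which names a wildcard would select before enabling anything, a pattern can be checked against a list of names in plain shell. This is only a sketch: rte_matches is a made-up helper, and it relies on shell "case" matching, where "*" also matches "/"; whether arcctl treats "/" the same way is an assumption.

```shell
#!/bin/bash
# True when RTE name $1 matches glob pattern $2.
# In a "case" pattern "*" also matches "/"; arcctl may behave differently.
rte_matches() {
  case "$1" in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

pattern='APPS/HEP/ATLAS-*-X86_64-SLC6-*'
for name in \
  APPS/HEP/ATLAS-20.8.0-X86_64-SLC6-GCC48-OPT \
  APPS/HEP/ATLAS-20.8.2-X86_64-SLC6-GCC49-OPT \
  ENV/RUNTIME/ALIEN-2.17 \
  ENV/PROXY
do
  if rte_matches "$name" "$pattern"; then
    echo "would enable: $name"
  fi
done
```

When passing such a pattern to arcctl, quoting it keeps the shell from expanding it against files in the current directory.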

Note

It is advised to use wildcards to enable all user-defined RTEs during an ARC5 to ARC6 upgrade.

Dummy RTEs

If you need an RTE only for advertising, with no script content, you can create a Dummy RTE for the specified name. The following command enables an APPS/MYAPP RTE with empty contents:

arcctl rte enable APPS/MYAPP --dummy

An example of a dummy RTE is APPS/ATLAS-SITE, used by the ATLAS experiment for sites to advertise that they are indeed an ATLAS-SITE.

Default RTEs

Default RTEs address workflows where advertising and an explicit request in the job description are not needed, but every submitted job must still be modified on the resource provider side (adjusting memory, setting LRMS scratch, etc.).

Installed RTEs can be selected for default inclusion into the job life cycle with the following ARC Control Tool command:

arcctl rte default ENV/LRMS-SCRATCH

This transparently adds ENV/LRMS-SCRATCH to each job; it is executed the same way as an enabled RTE. A default RTE does not need to be enabled to be executed, but it can be enabled if one wants to publish it in addition to executing it.

Note

You can use the same by-name, by-path and wildcard techniques as for enabling

RTE Parameters

To handle heterogeneity of resource providers, some RTEs can be parametrized.

For example, the system-defined ENV/PROXY RTE, which transfers the delegated proxy certificate to the worker node, can optionally transfer CA certificate directories. This optional part is controlled by the COPY_CACERT_DIR parameter.

To check whether an RTE has parameters and what their default values are, run:

[root ~]# arcctl rte params-get ENV/PROXY
COPY_CACERT_DIR=Yes

You can also see the description and allowed values by adding the --long option.

To set an RTE parameter value, the following command is used:

arcctl rte params-set ENV/PROXY COPY_CACERT_DIR No

List available RTEs and their status

To view a summary of all installed, enabled and default RTEs, run:

[root ~]# arcctl rte list
<output omitted>
APPS/HEP/ATLAS-20.8.0-X86_64-SLC6-GCC48-OPT (user, enabled)
APPS/HEP/ATLAS-20.8.1-X86_64-SLC6-GCC48-OPT (user, enabled)
APPS/HEP/ATLAS-20.8.2-X86_64-SLC6-GCC49-OPT (user, enabled)
<output omitted>
ENV/LRMS-SCRATCH                 (system, default)
ENV/PROXY                        (system, masked, disabled)
ENV/PROXY                        (user, enabled)
ENV/RTE                          (system, disabled)
ENV/RUNTIME/ALIEN-2.17           (user, enabled)
VO-biomed-CVMFS                  (dummy, enabled)

The first tag describes the RTE origin (system, user or dummy). The following tags show the status.

The special masked keyword indicates that the RTE name is used more than once and that by-name operations will apply to another RTE script. For example, ENV/PROXY will be enabled from the user-defined location because the system-defined one is masked. However, it is possible to enable a masked RTE by specifying its path.

Listing a particular kind of RTEs (e.g. enabled) is possible with the appropriate argument (see ARC Control Tool for all available options):

[root ~]# arcctl rte list --enabled
<output omitted>
APPS/HEP/ATLAS-20.8.2-X86_64-SLC6-GCC49-OPT
ENV/PROXY
ENV/RUNTIME/ALIEN-2.17
VO-biomed-CVMFS

The long listing shows the locations and descriptions of the RTEs:

[root ~]# arcctl rte list --long
System pre-defined RTEs in /usr/share/arc/rte:
    ENV/PROXY                        # copy proxy certificate to the job session directory
    ENV/RTE                          # copy RunTimeEnvironment scripts to the job session directory
    ENV/LRMS-SCRATCH                 # enables the usage of local to WN scratch directory defined by LRMS
User-defined RTEs in /etc/arc/rte:
    ENV/RUNTIME/ALIEN-2.17           # RTE Description is Not Available
    ENV/PROXY                        # RTE Description is Not Available
Enabled RTEs:
    ENV/RUNTIME/ALIEN-2.17           -> /etc/arc/rte/ENV/RUNTIME/ALIEN-2.17
    ENV/PROXY                        -> /etc/arc/rte/ENV/PROXY
Default RTEs:
    ENV/LRMS-SCRATCH                 -> /usr/share/arc/rte/ENV/LRMS-SCRATCH

View RTE content

The content of an RTE that will be embedded into the job script can be dumped with the cat action:

[root ~]# arcctl rte cat ENV/LRMS-SCRATCH
SCRATCH_VAR="LOCALTMP"
# description: enables the usage of local to WN scratch directory defined by LRMS
# param:SCRATCH_VAR:string:WORKDIR:Variable name that holds the path to job-specific WN scratch directory

SCRATCH_VAR="${SCRATCH_VAR:-WORKDIR}"

if [ "x$1" = "x0" ]; then
  RUNTIME_LOCAL_SCRATCH_DIR="\${${SCRATCH_VAR}}"
fi
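In the listing above, the parameter value appears as a plain assignment ahead of the script body, and the script then falls back to its documented default with the "${VAR:-default}" idiom. The sketch below replays that logic locally for two cases. render_scratch_dir is a made-up helper, and the remark about when the resulting reference is resolved is an inference from the escaping, not a statement from the ARC documentation.

```shell
#!/bin/bash
# Mimics the Stage 0 logic shown above for a given parameter value.
# $1 is the value set for SCRATCH_VAR; pass an empty string for "not set".
render_scratch_dir() {
  local SCRATCH_VAR="$1"
  SCRATCH_VAR="${SCRATCH_VAR:-WORKDIR}"     # fall back to the documented default
  # The backslash keeps "$" literal, so the result is a variable reference
  # that is only resolved later (presumably on the worker node).
  local RUNTIME_LOCAL_SCRATCH_DIR="\${${SCRATCH_VAR}}"
  printf '%s\n' "$RUNTIME_LOCAL_SCRATCH_DIR"
}

render_scratch_dir "LOCALTMP"   # prints ${LOCALTMP}
render_scratch_dir ""           # prints ${WORKDIR}
```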

Disable and Undefault RTEs

Enabled RTEs can be disabled by running:

arcctl rte disable ENV/PROXY

The analogous operation for default RTEs is called undefault:

arcctl rte undefault ENV/LRMS-SCRATCH

Note

You can use the same by-name, by-path and wildcard techniques as for enabling

System-defined RunTime Environments shipped with ARC

ENV/PROXY

Exports delegated credentials (proxy certificate) to the job’s session directory. Optionally copies the CA certificates directory from the ARC CE to the session directory.

Sets the X509_USER_PROXY, X509_USER_CERT and X509_CERT_DIR environment variables so that the files are immediately available to client tools.

Parameters:

  • COPY_CACERT_DIR = Yes/No - If set to Yes, the CA certificate directory will be copied to the session directory along with the proxy certificate. Default is Yes.

  • USE_DELEGATION_DB = Yes/No - If set to Yes, the RTE will try to extract the proxy certificate from the A-REX delegation DB (works in a limited number of cases). Default is No.
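A job payload that depends on these variables might verify them before calling client tools. The following is a minimal sketch with a made-up helper name (check_proxy_env); the X509_CERT_DIR check assumes that COPY_CACERT_DIR is left at Yes.

```shell
#!/bin/bash
# Returns 0 when the proxy-related variables point to existing paths.
# The X509_CERT_DIR check assumes COPY_CACERT_DIR=Yes.
check_proxy_env() {
  local status=0
  if [ ! -f "${X509_USER_PROXY:-}" ]; then
    echo "X509_USER_PROXY does not point to a file" >&2
    status=1
  fi
  if [ ! -d "${X509_CERT_DIR:-}" ]; then
    echo "X509_CERT_DIR does not point to a directory" >&2
    status=1
  fi
  return "$status"
}

check_proxy_env || echo "proxy environment incomplete"
```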

ENV/RTE

Copy RunTime Environment scripts to the job session directory for some workloads that require the files themselves instead of embedding the RTEs to the jobscript.

Designed to be used as default RTE.

Has no parameters.

ENV/LRMS-SCRATCH

Many resource providers use scratchdir to move files to the local worker node disk before running the job.

When the local scratch directory is created dynamically by the LRMS (e.g. in the job prologue) and then cleaned up automatically after job completion, the ENV/LRMS-SCRATCH RTE is needed. The scratch location should be indicated by an environment variable that holds the path to this LRMS-defined scratch directory.

This RTE is designed to be used as a default RTE to enable this optional functionality.

Parameters:

  • SCRATCH_VAR = name - Variable name that holds the path to job-specific WN scratch directory. Default is WORKDIR.

  • TMPDIR_LOCATION = path - Define the TMPDIR path on WN. Variable names can be used as a part of the path, e.g. '$WORKDIR/tmp'.

Note

The ENV/LRMS-SCRATCH RTE is not needed if the scratch directory used and created on the worker node is of the form /<arc-conf-scratchdir-var>/<arc-job-id>. It is only needed if the directory should be of the form <path defined by LRMS in SCRATCH_VAR>/<arc-job-id>, i.e. an LRMS-defined path is included.

ENV/CONDOR/DOCKER

The ARC HTCondor backend supports submission to the Docker universe. This RTE enables this feature on demand.

The RTE can be used by end-users when enabled. The RTE argument defines the Docker image name to be used, e.g.:

(runtimeenvironment="ENV/CONDOR/DOCKER" "debian")

The RTE can also be used as a default RTE to enforce Docker universe submission for any job. In that case the Docker image should be set with the RTE parameter.

Parameters:

  • DOCKER_IMAGE = name - Docker image to use for submitted jobs by default.

ENV/SINGULARITY

A general-purpose RTE that allows running the submitted jobscript inside a defined Singularity container image.

This RTE is designed to be used as both a default and enabled RTE.

When enabled, a user can specify the Singularity image as an RTE argument:

(runtimeenvironment="ENV/SINGULARITY" "mysoftwareimage.sif")

Parameters:

  • SINGULARITY_IMAGE = images - Define Singularity images to be used with a job. It accepts comma-separated vo:path pairs, where vo defines the virtual organization whose image is located at path. The special default value for vo defines the image used by default (if no specific image is defined for the VO). The special NULL value for path skips Singularity usage.

  • SINGULARITY_OPTIONS = options - Define additional options for the singularity executable. In particular, additional storage areas (e.g. CVMFS, CA certificates) to be mounted can be specified here.
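The vo:path selection rules described for SINGULARITY_IMAGE can be sketched in plain shell as follows. This only illustrates the documented semantics and is not the code of the RTE itself; select_image and the image paths are hypothetical, and paths are assumed to contain no commas or glob characters.

```shell
#!/bin/bash
# Print the image path for VO $2 from a comma-separated vo:path list in $1.
# Prints an empty line when Singularity should be skipped (path NULL)
# or when neither the VO nor "default" is listed.
select_image() {
  local spec="$1" vo="$2" pair chosen="" fallback=""
  local IFS=','
  for pair in $spec; do
    case "$pair" in
      "$vo":*)   chosen="${pair#*:}" ;;
      default:*) fallback="${pair#*:}" ;;
    esac
  done
  if [ -z "$chosen" ]; then
    chosen="$fallback"
  fi
  if [ "$chosen" = "NULL" ]; then
    chosen=""
  fi
  printf '%s\n' "$chosen"
}

# Hypothetical parameter value with a default image, a VO-specific image
# and a VO for which Singularity is skipped.
spec='default:/cvmfs/example.org/images/base.sif,atlas:/cvmfs/example.org/images/atlas.sif,biomed:NULL'
select_image "$spec" atlas    # VO-specific image
select_image "$spec" alice    # falls back to the default image
select_image "$spec" biomed   # empty: Singularity is skipped
```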

ENV/CANDYPOND (experimental)

Makes the arccandypond tool available for use inside the job script on the Worker Nodes (including the environment variables necessary for its operation).

Note

The CandyPond service itself should be enabled (defining the [arex/ws/candypond] block) on ARC CE as well.

Parameters:

  • CANDYPOND_URL = url - Redefine the URL of CandyPond service (default is auto – ARC CE URL used for job submission will be used automatically)