Measuring accounting metrics of the job
ARC CE has built-in capabilities to collect information about per-job resource consumption. This includes both ARC CE resources (e.g. data transfers, software environments) and worker node resources (e.g. CPU and memory usage). The full list of attributes stored in the A-REX Accounting Records (AAR) can be found in this document.
A-REX can use different methods (described below) to measure memory and CPU usage on the worker nodes, depending on their availability in the particular deployment case.
Measuring memory and CPU usage on the WN with cgroups
New in version 6.2.
When the worker nodes run a recent GNU/Linux distribution, the most precise and transparent way to measure the whole job workload is to rely on the cgroups kernel subsystem. Any systemd-based Linux distribution relies heavily on cgroups, so they are already in use there.
Note
Some older operating systems may require mounting the cgroups tree explicitly.
For example, on RHEL6 this can easily be done with libcgroup:
[root ~]# yum install libcgroup
[root ~]# service cgconfig start
The benefit of using cgroups is that everything is accounted.
Even if several payloads are executed (e.g. in pilot mode) or extra helper processes are spawned, the resource accounting remains accurate for the whole workload.
Enabling cgroups usage
To use cgroups for accounting, ARC needs an extra tool installed on the worker nodes: arc-job-cgroup.
Depending on whether the tool is available, the job script automatically will or will NOT use cgroups for measuring accounting metrics.
The arc-job-cgroup tool is available for the majority of OSes as a packaged binary build, as part of the ARC distribution (nordugrid-arc-wn package).
So the easiest way to install it is to use your package manager on the worker nodes, e.g.:
[root ~]# yum install nordugrid-arc-wn
If it is not possible to install the packaged version for some reason, the tool is easy to compile from source: it is pure C code that uses only standard C library calls.
[root ~]# wget https://source.coderefinery.org/nordugrid/arc/raw/master/src/wn/arc-job-cgroup.c
[root ~]# cc -o arc-job-cgroup arc-job-cgroup.c
[root ~]# mv arc-job-cgroup /usr/local/bin/
[root ~]# chmod 4755 /usr/local/bin/arc-job-cgroup
How ARC operates cgroups
The idea behind LRMS-independent cgroup-based resource usage measurements in ARC is to:
- create child cgroups for the memory and cpuacct controllers
- put the jobscript process into the created cgroups (this will automatically catch all child processes)
- collect the accounting data at the end of the jobscript
- remove the child cgroups created at the beginning (moving all processes to the parent cgroup)
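The steps above can be sketched in shell for a cgroups v1 hierarchy. This is an illustration only, not the code of arc-job-cgroup or of the ARC job script: the helper names and the directory layout are assumptions. The file names `cgroup.procs`, `memory.max_usage_in_bytes` and `cpuacct.usage` are the ones exposed by the v1 memory and cpuacct controllers.

```shell
# Illustrative sketch only; this is not the arc-job-cgroup implementation.
# Assumes a cgroups v1 controller directory (e.g. the job's current cgroup
# under /sys/fs/cgroup/memory). All helper names are hypothetical.

# Step 1: create a child cgroup under the parent directory.
cg_create() {   # usage: cg_create <parent_dir> <child_name>
    mkdir "$1/$2"
}

# Step 2: move a process into the child cgroup. Processes it forks
# afterwards start in the same cgroup.
cg_attach() {   # usage: cg_attach <cgroup_dir> <pid>
    echo "$2" > "$1/cgroup.procs"
}

# Step 3: read one accounting value, e.g. memory.max_usage_in_bytes
# (memory controller) or cpuacct.usage (cpuacct controller, nanoseconds).
cg_read() {     # usage: cg_read <cgroup_dir> <file>
    cat "$1/$2"
}

# Step 4: move remaining processes back to the parent, then remove the child.
cg_remove() {   # usage: cg_remove <parent_dir> <child_name>
    for pid in $(cat "$1/$2/cgroup.procs"); do
        echo "$pid" > "$1/cgroup.procs"
    done
    rmdir "$1/$2"
}
```

On a real worker node these calls require root privileges and would have to be repeated for each controller hierarchy (memory and cpuacct).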
If cgroups are used in the kernel, the process already belongs to some cgroup. It can be either the root cgroup (used for all processes) or some dedicated cgroup created by an LRMS with cgroups support, a container management system, etc.
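To see which cgroup a process currently belongs to, the kernel exposes `/proc/<pid>/cgroup`. The following is a minimal sketch, assuming the cgroups v1 line format `hierarchy-ID:controller-list:path`; the function name `cgroup_path_of` is hypothetical and not part of ARC.

```shell
# Print the cgroup path of one controller from a /proc/<pid>/cgroup file.
# cgroups v1 lines look like: 4:memory:/slurm/uid_1000/job_42
cgroup_path_of() {   # usage: cgroup_path_of <controller> [file]
    awk -F: -v c="$1" '{
        n = split($2, ctrls, ",")
        for (i = 1; i <= n; i++)
            if (ctrls[i] == c) {
                sub(/^[^:]*:[^:]*:/, "")   # drop the ID and controller fields, keep the path
                print
                exit
            }
    }' "${2:-/proc/self/cgroup}"
}
```

A path of `/` means the process is in the root cgroup of that controller; anything else is a dedicated cgroup created by some other component.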
All resources used by the child cgroup are accounted in the parent cgroup. Moreover, all parent-defined limits are inherited and enforced as well.
So creating another child cgroup in the hierarchy is safe from all points of view.
Warning
Creating a child cgroup and putting a task into it requires root privileges. This is the reason behind the SUID bit for arc-job-cgroup.
However, the code itself is as simple as mkdir. You can review these 333 lines to allay any concerns.
If the arc-job-cgroup tool is not available, the cgroups tree is not mounted, or there are any other issues with cgroups creation, the job script code falls back to the GNU time measurement method.
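The fallback order described in this document (cgroups first, then GNU time, then LRMS-provided metrics only) can be summarised as a small decision function. This is a sketch of the logic, not the actual job script code: the function name, its arguments and the returned labels are assumptions, and it covers only the availability checks, not runtime failures during cgroup creation.

```shell
# Decide which measurement method would apply on a worker node.
# Hypothetical arguments, for illustration only:
#   $1 - path to the arc-job-cgroup tool
#   $2 - cgroups mount point
#   $3 - path to the GNU time binary
select_accounting_method() {
    if [ -x "$1" ] && [ -d "$2" ]; then
        echo "cgroups"      # tool installed and cgroups tree present
    elif [ -x "$3" ]; then
        echo "gnu-time"     # fall back to GNU time
    else
        echo "lrms-only"    # only LRMS-provided metrics are accounted
    fi
}
```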
Measuring memory and CPU usage on the WN with GNU time
The GNU time utility is capable of measuring and displaying information about the resources used by the executable it runs.
It is used as a part of the ARC-generated job script if found on the worker node.
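As an illustration of what GNU time can report, the sketch below uses its `-f` format option with the documented specifiers `%e` (elapsed wall-clock seconds), `%U` (user CPU seconds), `%S` (system CPU seconds) and `%M` (maximum resident set size in kilobytes). The format string, the helper name `parse_time_line` and the output labels are assumptions chosen for this example; the ARC job script may invoke GNU time differently.

```shell
# Format string for GNU time: wall seconds, user CPU s, system CPU s, max RSS (KB).
TIME_FORMAT='%e %U %S %M'

# Turn one line written with TIME_FORMAT into labelled values.
# CPU time is the sum of user and system time.
parse_time_line() {   # usage: parse_time_line "<wall> <user> <sys> <maxrss>"
    echo "$1" | LC_ALL=C awk '{
        printf "walltime=%s cputime=%.2f maxrss_kb=%s\n", $1, $2 + $3, $4
    }'
}

# Usage on a node where GNU time is installed (the path may differ):
#   /usr/bin/time -f "$TIME_FORMAT" -o usage.txt ./payload
#   parse_time_line "$(tail -n 1 usage.txt)"
```

Reading only the last line of the output file skips the extra status line GNU time writes when the measured command exits with a non-zero code.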
Note
Changed in version 6.2.
In case of successful cgroups usage, GNU time will NOT be used by the job script.
Warning
GNU time is a separate binary, typically installed by a dedicated package. Do not mix it up with the built-in version of time in your shell (e.g. bash, zsh).
Typically you can install it with e.g. yum install time or a similar package management command.
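One way to check that a given path is the GNU time binary rather than something else is to inspect its `--version` output. The helper below is a hypothetical sketch: it assumes the version banner contains the words "GNU time" (in any letter case), which matches the releases we are aware of but is not guaranteed for every build.

```shell
# Return success if the given executable identifies itself as GNU time.
# The shell keyword "time" is never tested here because an explicit
# path to a binary is required.
is_gnu_time() {   # usage: is_gnu_time <path>
    [ -x "$1" ] && "$1" --version 2>&1 | grep -qi 'gnu time'
}

# Usage:
#   is_gnu_time /usr/bin/time && echo "GNU time found"
```

In bash, `type -P time` prints the path of a `time` executable found on PATH and ignores the shell keyword, which is a convenient way to obtain the argument for this check.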
If GNU time is installed in a non-standard location, the gnu_time configuration option can be used to define its path.
If the GNU time
utility is not available the job will run as it is and only LRMS-provided metrics will be accounted.
Using LRMS-provided metrics
After the job has finished execution in the LRMS, the batch system backend scan-script extracts accounting information about the job from the LRMS, either by executing command line clients, parsing logs, or using an API.
The exact measurements and the method of collecting them depend entirely on the LRMS backend implementation and differ from one backend to another.
Common metrics include LRMSStartTime and LRMSEndTime. There are also typically some memory and CPU usage metrics available.