ARC Next Generation Accounting Technical Details¶
New in version 6.4.
Information in this chapter is relevant only for ARC releases 6.4 and later. If you are looking for the technical details of the legacy accounting subsystem in ARC releases 6.0-6.3, please read JURA Accounting Technical Details. Make sure you are reading the documentation that matches your ARC CE release version.
General accounting configuration and operation flows are described in Accounting Subsystem. This section contains more technical details about the implementation of each component of the accounting subsystem.
Job accounting information processing workflow¶
Collecting the accounting information¶
The A-REX Accounting subsystem is part of the core A-REX functionality starting from the 6.4 release. Its main function is to write data to the local SQLite accounting database on every job state change.
The data sources of the Accounting data are the per-job files in the control directory:
- .local file contains general information associated with the job. All IDs, ownership, authtoken attributes are taken from this file. The data in .local are written and updated by the A-REX JobControl modules.
- .statistics file is a dedicated file written by the DTR data transfer framework that contains data transfer measurements.
- .diag file is written by the LRMS JobScript during the job execution on the Worker Node. It contains but is not limited to resource usage and worker node environment data.
The local SQLite accounting database contains all the A-REX Accounting Record (AAR) data for every ARC CE job.
The initial record about the job is created based on the first ACCEPTED job event. The ID, ownership and submission time are recorded during this step and the accounting job status is marked as
Any subsequent job event triggers event data recording in the database, which makes it possible to track data staging time, LRMS enqueueing time, etc.
When the FINISHED job event occurs (execution is completed) the A-REX Accounting subsystem updates all AAR metrics in the database, storing resource usage, endtime, etc. Such a state is indicated by the
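The record lifecycle described above can be illustrated with a minimal sketch. It is not the A-REX implementation: the table layout (`aars`, `events`), the column names, the status placeholders `"open"` and `"closed"`, and the helper `on_job_event` are all hypothetical and far simpler than the real ARC accounting database schema. Only the Python standard library `sqlite3` module is used.

```python
import sqlite3

def open_db():
    """Create an in-memory database with a hypothetical, simplified layout."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE aars (jobid TEXT PRIMARY KEY, owner TEXT, "
        "submittime INTEGER, endtime INTEGER, usedwalltime INTEGER, status TEXT)"
    )
    db.execute("CREATE TABLE events (jobid TEXT, event TEXT, eventtime INTEGER)")
    return db

def on_job_event(db, jobid, event, when, owner=None, usage=None):
    """Record one job state change (hypothetical helper, not an ARC API)."""
    if event == "ACCEPTED":
        # First event: create the initial record with ID, owner, submit time.
        db.execute(
            "INSERT INTO aars (jobid, owner, submittime, status) VALUES (?, ?, ?, ?)",
            (jobid, owner, when, "open"),  # "open" is a placeholder status
        )
    elif event == "FINISHED":
        # Final event: store the end time and resource usage in the record.
        usage = usage or {}
        db.execute(
            "UPDATE aars SET endtime = ?, usedwalltime = ?, status = ? WHERE jobid = ?",
            (when, usage.get("walltime"), "closed", jobid),  # placeholder status
        )
    else:
        # Any other state change is stored as a timestamped event.
        db.execute("INSERT INTO events VALUES (?, ?, ?)", (jobid, event, when))
    db.commit()
```

With such a layout, the time spent in an intermediate state can be derived later by comparing the timestamps of consecutive events for the same job.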
Using the local accounting database¶
Using the accounting data for statistics lookup and/or publishing to external services is accomplished via the developed
arc.control Python modules.
The AccountingDBSQLite module is responsible for handling all low-level database operations. It hides the SQL queries behind the API needed by the other workflows.
The records publishing is carried out by the
AccountingPublishing Python module that includes:
- classes for generating usage records in OGF.98 UR, EMI CAR 1.2, APEL Summaries and APEL Sync formats
- classes that handle the records POST-ing to SGAS server
- classes that trigger the APEL SSM usage for publishing data to APEL
- general wrapping classes to handle regular publishing and republishing of the data
The arcctl accounting republish tool and the jura-ng tool (which is run regularly by A-REX) use the same AccountingPublishing Python module.
The regular publishing process stores the endtime of the last published record in the dedicated Publishing database. The next round of regular publishing reads the stored time and queries the records that ended since then.
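The bookkeeping behind regular publishing can be sketched as follows. This is an illustration under stated assumptions, not the AccountingPublishing code: the tables `aars` and `published`, their columns, and the functions `make_demo_dbs` and `publish_new_records` are hypothetical names, and `send` stands in for whatever actually delivers the records to a target.

```python
import sqlite3

def make_demo_dbs():
    """Create two in-memory databases with a hypothetical, simplified layout."""
    acc = sqlite3.connect(":memory:")
    acc.execute("CREATE TABLE aars (jobid TEXT PRIMARY KEY, endtime INTEGER)")
    acc.executemany("INSERT INTO aars VALUES (?, ?)", [("job1", 100), ("job2", 200)])
    pub = sqlite3.connect(":memory:")
    pub.execute("CREATE TABLE published (target TEXT PRIMARY KEY, lastend INTEGER)")
    return acc, pub

def publish_new_records(acc_db, pub_db, target, send):
    """Send records that ended after the last published endtime for a target."""
    row = pub_db.execute(
        "SELECT lastend FROM published WHERE target = ?", (target,)
    ).fetchone()
    last_end = row[0] if row else 0  # nothing published yet: start from 0
    records = acc_db.execute(
        "SELECT jobid, endtime FROM aars WHERE endtime > ? ORDER BY endtime",
        (last_end,),
    ).fetchall()
    if not records:
        return 0
    send(records)
    # Remember the newest endtime only after the send call has returned.
    pub_db.execute(
        "INSERT OR REPLACE INTO published (target, lastend) VALUES (?, ?)",
        (target, records[-1][1]),
    )
    pub_db.commit()
    return len(records)
```

Calling `publish_new_records` twice in a row sends the records once; the second call finds nothing newer than the stored time. The strict `>` comparison in this sketch would skip a record that shares its endtime with the last published one; how the real module treats that case is not covered here.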
Accounting data publishing details¶
Reporting to SGAS¶
SGAS has a simple custom web service interface loosely based on WS-ResourceProperties.
The AccountingPublishing Python module uses the insertion method of this interface to report URs directly, using the Python httplib library with SSL context wrapping.
To increase communication efficiency the
SGASSender class sends URs in batches. SGAS accepts a batch of URs in a single request. The batch is an XML element called
UsageRecords, containing elements representing URs.
The maximum number of URs in a batch can be set with the urbatchsize configuration parameter of the SGAS target.
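Batching can be sketched in a few lines. The example below is not the SGASSender code: the helpers `batches` and `build_batch_xml` are hypothetical, and the per-job records are reduced to a bare `JobUsageRecord` element with an illustrative `jobid` child, without the XML namespaces and the many fields a real OGF UR carries. Only the outer `UsageRecords` element and the `urbatchsize` parameter name come from the text above.

```python
import xml.etree.ElementTree as ET

def batches(items, urbatchsize):
    """Yield consecutive slices of at most urbatchsize items."""
    for start in range(0, len(items), urbatchsize):
        yield items[start:start + urbatchsize]

def build_batch_xml(job_ids):
    """Wrap simplified per-job records in a single UsageRecords element."""
    root = ET.Element("UsageRecords")
    for job_id in job_ids:
        # Simplified stand-in for a full usage record (no namespaces).
        record = ET.SubElement(root, "JobUsageRecord")
        ET.SubElement(record, "jobid").text = job_id  # illustrative field
    return ET.tostring(root, encoding="unicode")

def build_requests(job_ids, urbatchsize):
    """Return one XML document per batch, each ready for a single request."""
    return [build_batch_xml(chunk) for chunk in batches(job_ids, urbatchsize)]
```

For five records and a batch size of two, `build_requests` produces three documents, so three requests are needed instead of five.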
Reporting to APEL¶
APEL uses the SSM framework for communication.
The APELSender class implements integration with the SSM Python libraries developed by APEL.
ARC ships a minimal set of SSM libraries along with the A-REX binary packages to allow SSM usage. If SSM binary packages from APEL are available for your OS (e.g. EL6), you can install them and they will automatically be used instead of those shipped with ARC.
Communicating messages with SSM relies on Directory Queue handled by the
dirq.QueueSimple Python module.
APELSender class has a set of
enqueue_* methods that put the generated records to SSM DirQ in the
send method then triggers the SSM workflow that gets the enqueued messages and sends them to the APEL broker.
Reporting to APEL also sends records in batches. The default urbatchsize value is set to 1000 according to APEL recommendations.
Republishing simply triggers the same
AccountingPublishing classes for the defined timeframe that comes from the command line.
All records are regenerated from accounting database data and sent to the target.
The accounting directory
<controldir>/accounting is by default accessible only by the user running A-REX (root in most cases).
All usage records are submitted using the X.509 credentials specified by the value of x509_ set of configuration options of
No proxies are used for communication with accounting services.
The only access restriction made by an SGAS service is matching the Distinguished Name of the client (in this context the ARC CE) against a set of trusted DNs. When access is granted, policies are then applied by SGAS, allowing publishing and/or querying rights. Clients with publishing rights can insert any UR, regardless of content. By default, querying rights only allow retrieving URs pertaining to jobs submitted by the querying entity.
Publishing records to APEL requires that the glite-APEL endpoint is defined for the grid-site in the GOCDB.
The ARC CE certificate DN should be added to the
Third-party accounting queries¶
If you want to get a specific report or integrate the ARC accounting database with third-party software, you can of course use SQLite directly.
The SQLite database file location is:
It is worth being familiar with the ARC Accounting Database Schema when developing third-party queries.
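Before writing queries, it can help to inspect which tables and columns the database file actually contains. The sketch below does this with the Python standard library only. The function name `describe_database` is hypothetical, and the database path has to be supplied by the caller. Opening the file in read-only mode is a precaution chosen for this example, since A-REX writes to the same database.

```python
import sqlite3
from pathlib import Path

def describe_database(db_path):
    """Return {table name: [column names]} for an SQLite file, opened read-only."""
    # SQLite URI syntax: mode=ro prevents any modification through this connection.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    db = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in db.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )]
        schema = {}
        for table in tables:
            quoted = table.replace('"', '""')
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
            columns = db.execute('PRAGMA table_info("{}")'.format(quoted)).fetchall()
            schema[table] = [column[1] for column in columns]
        return schema
    finally:
        db.close()
```

The returned dictionary can be compared against the documented schema before any report query is written against specific tables.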
Definition of the A-REX Accounting Record including attribute mappings to SGAS and APEL¶
ARC CE measures and collects a lot of accounting information, including but not limited to the data required by the common aggregated accounting services SGAS and APEL.
All accounting information stored about a job is defined by what we call the A-REX Accounting Record (AAR).
AARs have a representation inside the local accounting database according to the schema, as well as representations inside A-REX and the Python modules.
Local stats are generated from the stored AAR information and provide a way to analyse on-site CE operations.
The following tables include a flat list of the properties (NOT the database rendering) included in the AAR:
|A-REX Accounting Record (AAR)||SGAS OGF-UR||APEL CAR||Content description|
||The globally unique job ID assigned by A-REX.|
||LRMS job ID|
||User specified job name|
||The A-REX job submission endpoint URL used for this job|
|endpointtype||not used||not used||The A-REX job submission endpoint type used for this job|
||The LRMS behind A-REX|
||The name of the LRMS queue of the job|
||WN name(s) as given by LRMS separated by :|
||not used||Client connection socket from the client to A-REX|
||The global user identity, at the moment it is the SN from the certificate|
||The mapped local userid|
||Contains the attributes of the auth token (VOMS FQANs in the current implementation)|
||User-defined name of the project the job belongs to|
||The terminal state of an A-REX job: aborted, failed, completed|
||The exit code of the payload in the LRMS|
||The timestamp of job acceptance at A-REX|
||The timestamp when the job reached the terminal state in A-REX|
||Number of allocated worker nodes|
||not used||Details of downloaded input file: url, size, transfer start, transfer end, downloaded from cache|
||not used||Details of uploaded output file: url, size, transfer start, transfer end|
||Maximum virtual memory used by the job|
||Maximum resident memory used by the job|
||To be dropped from the AAR schema|
||The measured clock time elapsed during the execution of the job in the LRMS, regardless of how many cores, processors or nodes the user job ran on.|
||The total CPU time consumed by the job. If the job ran on many cores/processors/nodes, all separate consumptions shall be aggregated in this value.|
||The user part of the usedcputime|
||The kernel part of the usedcputime|
||The number of cores allocated to the job|
||The used size of scratch dir at the end of the job termination in the LRMS.|
|systemsoftware||The type and version of the system software (i.e. opsys, glibc, compiler, or the entire container wrapping the system software)|
||Coarse-grain characterization tag for the WorkerNode, e.g. BigMemory or t2.micro (aka Amazon instance type)|
|RTEs||List of used RTEs, including default ones as well.|
||The total volume of downloaded job input data in GBs|
|data-stagein-time||The time spent by the DTR system to download input data for the job|
||The total volume of uploaded job output data in GBs|
|data-stageout-time||The time spent by the DTR system to upload output data of the job|
|lrms-submission-time||The timestamp when the job was handed over to the LRMS system|
|lrmstarttime||The timestamp when the payload starts in the LRMS|
|lrmsendtime||The timestamp when the payload completed in the LRMS|
||The type and the corresponding benchmark value of the assigned WN|
|SGAS OGF-UR||APEL CAR|