The ARC Cache Index (ACIX)

The ARC Cache Index (ACIX) is a catalog of locations of cached files.

It consists of two components: the ACIX Scanner, which runs on the computing resource, and the ACIX Index, which indexes the cache locations retrieved from the ACIX Scanners. These components are provided by the packages nordugrid-arc-acix-scanner and nordugrid-arc-acix-index respectively. Both depend on a third package, nordugrid-arc-acix-core.

ACIX Scanner

The ACIX Scanner periodically scans the A-REX cache and constructs a Bloom filter of the cache content. This filter represents the cache content in an extremely compressed format, which allows fast checks of whether a given file is in the cache and efficient upload of the content to an index server.

However, this type of compression can produce false positives, i.e. a file may appear to be present in the cache according to the filter when it is not. The ACIX Scanner runs in an HTTPS server and the filter is accessible at the endpoint https://hostname:5443/data/cache.
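To make the false-positive behaviour concrete, here is a minimal Bloom filter sketch in Python. It is not the ACIX implementation: the class name, the filter size, the number of hashes and the use of SHA-256 are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; sizes and hashing are illustrative, not ACIX's."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # Present only if every derived bit is set; other items may have set them.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


cache = BloomFilter()
cache.add("http://www.nordugrid.org:80/data/echo.sh")
print("http://www.nordugrid.org:80/data/echo.sh" in cache)  # added items always match

# A deliberately undersized filter: every item maps to the same single bit,
# so a file that was never added still appears to be cached.
tiny = BloomFilter(size_bits=1, num_hashes=1)
tiny.add("http://my.host/data1")
print("http://my.host/never-cached" in tiny)  # a false positive
```

A file that was added is never reported as missing, so the filter has no false negatives. The false-positive rate drops as the filter gets larger relative to the number of cached files.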

It scans the caches specified in the A-REX arc.conf. No configuration is required, although some options can be changed. Make sure the ACIX Scanner port (default 5443) is open in the firewall.

ACIX Index

The ACIX Index server runs independently of the ACIX Scanner and A-REX, but can be deployed on the same host as both of them. It is configured with a list of ACIX Scanners and periodically pulls the cache filter from each one. It runs within an HTTPS server through which users can query the cached locations of files. It is configured in the [acix-index] block of the regular arc.conf file, where each ACIX Scanner is specified with a cachescanner option. For example:

[acix-index]
cachescanner = https://my.host:5443/data/cache
cachescanner = https://another.host:5443/data/cache

The ACIX Index server can be queried at the endpoint https://hostname:6443/data/index. The list of URLs to check is given as comma-separated values in the “url” option of this URL, e.g.:

https://hostname:6443/data/index?url=http://www.nordugrid.org:80/data/echo.sh,\
 http://my.host/data1

A JSON-formatted response is returned, consisting of a dictionary mapping each URL to a list of locations. If remote access to the cache is configured as described above, the location is the endpoint at which to access the cached file, for example https://a-rex.host/a-rex/cache. Otherwise only the hostname is returned.
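The query format and response shape described above can be sketched in Python as follows. The helper names build_index_query and parse_index_response are hypothetical. The sample response body is made up to match the described structure and is not real server output. The HTTPS request itself is left out so that the sketch runs offline.

```python
import json
from urllib.parse import urlencode


def build_index_query(index_endpoint, file_urls):
    """Build a query URL with the file URLs comma-separated in the "url" option."""
    # Keep ':', '/' and ',' unescaped so the result matches the documented form.
    query = urlencode({"url": ",".join(file_urls)}, safe=":/,")
    return f"{index_endpoint}?{query}"


def parse_index_response(body):
    """Decode the JSON dictionary that maps each URL to a list of locations."""
    return {url: list(locations) for url, locations in json.loads(body).items()}


query = build_index_query(
    "https://hostname:6443/data/index",
    ["http://www.nordugrid.org:80/data/echo.sh", "http://my.host/data1"],
)
print(query)

# Hypothetical response body, invented to illustrate the described structure.
sample_body = json.dumps(
    {
        "http://www.nordugrid.org:80/data/echo.sh": ["https://a-rex.host/a-rex/cache"],
        "http://my.host/data1": [],
    }
)
for url, locations in parse_index_response(sample_body).items():
    print(url, "->", locations or "not cached anywhere")
```

An empty list is used here to represent a file with no known cached location. That is an assumption about the response, not something the description above specifies.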

Using ACIX with A-REX Data Staging

ACIX can be used as a fallback mechanism for A-REX downloads of input files required by jobs by specifying use_remote_acix in the [arex/data-staging] block of arc.conf, e.g.:

[arex/data-staging]
use_remote_acix = https://cacheindex.ndgf.org:6443/data/index

If a download from the primary source fails, A-REX can try any cached locations provided by ACIX, as long as the cache is exposed at those locations. In some cases it may even be preferable to download from a nearby cache rather than from Grid storage. This can be configured using the preferredpattern configuration option, which tells A-REX in which order to try downloading replicas of a file.
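The effect of ordering replicas by preference can be sketched in Python. This is a simplified illustration, not A-REX code. It assumes the preference is a list of substrings separated by "|", with earlier patterns ranked first and non-matching replicas tried last. The exact preferredpattern syntax is defined in the arc.conf reference. The function name order_replicas and all hostnames are hypothetical.

```python
def order_replicas(replicas, preferredpattern):
    """Sort replica URLs by the earliest preference pattern each one contains."""
    patterns = [p for p in preferredpattern.split("|") if p]

    def rank(replica):
        for position, pattern in enumerate(patterns):
            if pattern in replica:
                return position
        # Replicas matching no pattern are tried after all preferred ones.
        return len(patterns)

    # sorted() is stable, so replicas with equal rank keep their original order.
    return sorted(replicas, key=rank)


replicas = [
    "srm://grid.storage.example.org/data/file1",
    "https://ce1b.example.org/a-rex/cache/file1",
]
print(order_replicas(replicas, "ce1b.example.org|storage.example.org"))
```

With this preference, the replica on the cache host ce1b.example.org is tried before the Grid storage copy.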

Using ACIX for ARC client brokering

ACIX can also be used for data-based brokering for ARC jobs. An ACIX-based broker plugin written in Python comes packaged with the ARC client tools (in $ARC_LOCATION/share/arc/examples/PythonBroker/ACIXBroker.py) and can be used for example with:

[user ~]$ arcsub -b PythonBroker:ACIXBroker.ACIXBroker:https://cacheindex.ndgf.org:6443/data/index

Target sites for job submission are ranked in order of how many input files required by the job are cached there. See the comments inside this Python file for more information.
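The ranking idea can be sketched in a few lines of Python. This is not the code in ACIXBroker.py. The function name rank_targets, the hostnames and the cached_locations dictionary (shaped like an ACIX Index response) are assumptions made for illustration.

```python
def rank_targets(job_inputs, cached_locations, targets):
    """Order targets by how many of the job's input files each one has cached."""

    def cached_count(target):
        return sum(
            1 for url in job_inputs if target in cached_locations.get(url, [])
        )

    # Highest count first; ties keep the original target order (stable sort).
    return sorted(targets, key=cached_count, reverse=True)


job_inputs = ["http://my.host/data1", "http://my.host/data2"]
cached_locations = {
    "http://my.host/data1": ["ce1.example.org", "ce2.example.org"],
    "http://my.host/data2": ["ce2.example.org"],
}
targets = ["ce1.example.org", "ce2.example.org", "ce3.example.org"]
print(rank_targets(job_inputs, cached_locations, targets))
```

Here ce2.example.org is ranked first because it holds both input files, followed by ce1.example.org with one file and ce3.example.org with none.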

Deployment use-case

Fig. 21 ACIX deployment scenario, with one global ACIX Index and a local ACIX Index for CE 1a and CE 1b.

Fig. 21 shows an example ACIX setup. Each CE runs an ACIX Scanner and there is a central ACIX Index server which pulls content from all CEs. In addition there is one site with two CEs, CE 1a and CE 1b.

In order to do data-based brokering on just those two CEs (and ease the load on the global ACIX Index server), a local ACIX Index is running which pulls content from only these two CEs. In such a setup it may be desirable for CE 1b to download data from the cache on CE 1a, and vice versa. To achieve this, each of those CEs could be configured with the local ACIX Index server as its use_remote_acix and with the other CE’s hostname first in preferredpattern.