The ARC Cache Index (ACIX)
The ARC Cache Index (ACIX) is a catalog of locations of cached files.
It consists of two components: the ACIX Scanner, which runs on the computing
resource, and the ACIX Index, which indexes the cache locations retrieved
from the ACIX Scanners. These components are provided respectively by the
packages nordugrid-arc-acix-scanner and nordugrid-arc-acix-index. Both
depend on a third package, nordugrid-arc-acix-core.
ACIX Scanner
The ACIX Scanner periodically scans the A-REX cache and constructs a Bloom filter of the cache content. This filter represents the cache content in an extremely compressed format, which makes it fast to check whether any given element is in the filter and efficient to upload the content to an index server.
However, this type of compression can give false positives, i.e. a file
may appear to be present in the cache according to the filter when it is
not. The ACIX Scanner runs as an HTTPS server, and the filter is accessible
at the endpoint https://hostname:5443/data/cache.
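The following minimal Bloom filter sketch in Python shows why the filter can report false positives but never false negatives. The bit-array size, the number of hash functions and the SHA-256-based hashing are assumptions chosen for illustration. They are not taken from the ACIX Scanner's implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k hash functions.

    Illustrative only: sizes and hashing scheme are assumptions, not the
    ACIX Scanner's actual implementation.
    """

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by salting a SHA-256 hash with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present" (false positives are possible);
        # False means "definitely absent" (no false negatives).
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


cache = BloomFilter()
cache.add("http://www.nordugrid.org:80/data/echo.sh")
print("http://www.nordugrid.org:80/data/echo.sh" in cache)  # True
```

Every file that was added is always reported as present. Once enough bits are set, an unrelated URL can also be reported as present, which is the false-positive case described above. A larger bit array lowers that probability at the cost of a bigger filter to upload.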
It scans the caches specified in the A-REX arc.conf. It does not require
any configuration, although some options can be changed. Make sure that
the ACIX Scanner port (default 5443) is open in the firewall.
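To check from the index side that the scanner port is not blocked, a plain TCP connection attempt is often enough. This Python sketch uses only the standard library. The function name port_is_reachable is hypothetical, and a successful connection does not prove that the ACIX Scanner itself is answering.

```python
import socket


def port_is_reachable(host: str, port: int = 5443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    A successful connect only shows that the port is not blocked on the path
    from this machine. It does not check TLS, or that the ACIX Scanner is
    the service listening there.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, running port_is_reachable("my.host") on the ACIX Index host tests the default port 5443 of a scanner configured as https://my.host:5443/data/cache.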
ACIX Index
The ACIX Index server runs independently of the ACIX Scanner and A-REX, but
can be deployed on the same host as both of them. It is configured with
a list of ACIX Scanners and periodically pulls the cache filter from
each one. It runs as an HTTPS server through which users can query
the cached locations of files. It is configured in the [acix-index] block
of the regular arc.conf file, where ACIX Scanners are specified with the
cachescanner option. For example:
[acix-index]
cachescanner = https://my.host:5443/data/cache
cachescanner = https://another.host:5443/data/cache
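The pull-and-query cycle described above can be modelled in a few lines of Python. In this sketch a plain set stands in for each scanner's Bloom filter. The fetch parameter is a hypothetical caller-supplied function that returns one scanner's cache content, and the class name is also hypothetical. Hosts are reported by taking the hostname from each cachescanner URL.

```python
from typing import Callable, Dict, Iterable, List, Set
from urllib.parse import urlparse


class CacheIndexSketch:
    """Toy model of the ACIX Index: pull content from each scanner endpoint
    and answer "which hosts have this URL cached?".

    Assumption: a plain set replaces each scanner's Bloom filter, and
    `fetch` (hypothetical) returns the cached URLs for one endpoint.
    """

    def __init__(self, scanners: Iterable[str],
                 fetch: Callable[[str], Set[str]]) -> None:
        self.scanners = list(scanners)
        self.fetch = fetch
        self.filters: Dict[str, Set[str]] = {}

    def refresh(self) -> None:
        # One pull cycle over all configured scanners; the real service
        # repeats this periodically.
        for endpoint in self.scanners:
            self.filters[endpoint] = self.fetch(endpoint)

    def query(self, urls: Iterable[str]) -> Dict[str, List[str]]:
        # Map each URL to the hostnames whose content contains it.
        return {url: [urlparse(ep).hostname
                      for ep, content in self.filters.items()
                      if url in content]
                for url in urls}


scanners = ["https://my.host:5443/data/cache",
            "https://another.host:5443/data/cache"]
fake_content = {scanners[0]: {"http://my.host/data1"}, scanners[1]: set()}
index = CacheIndexSketch(scanners, fake_content.__getitem__)
index.refresh()
print(index.query(["http://my.host/data1"]))  # {'http://my.host/data1': ['my.host']}
```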
The ACIX Index server can be queried at the endpoint
https://hostname:6443/data/index, and the URLs to check are
given as comma-separated values of the option “url” of this URL, e.g.:
https://hostname:6443/data/index?url=http://www.nordugrid.org:80/data/echo.sh,\
http://my.host/data1
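A minimal Python client for this query could look as follows. It rests on several assumptions:

* The server accepts the URLs unencoded, as in the example above.
* Its certificate can be verified with the default SSL context, optionally given a CA directory as capath.
* No client certificate is required. If one is, it could be loaded with context.load_cert_chain().

The function names are hypothetical.

```python
import json
import ssl
import urllib.request
from typing import Dict, Iterable, List, Optional


def acix_query_url(index_endpoint: str, urls: Iterable[str]) -> str:
    # Join the file URLs with commas and pass them, unencoded, as the
    # "url" option, in the same form as the documented example.
    return index_endpoint + "?url=" + ",".join(urls)


def query_acix(index_endpoint: str, urls: Iterable[str],
               capath: Optional[str] = None,
               timeout: float = 30.0) -> Dict[str, List[str]]:
    # capath may point at a directory of CA certificates if the server
    # certificate is not trusted by the system store.
    context = ssl.create_default_context(capath=capath)
    request_url = acix_query_url(index_endpoint, urls)
    with urllib.request.urlopen(request_url, context=context,
                                timeout=timeout) as resp:
        # The response body is a JSON dictionary: URL -> list of locations.
        return json.load(resp)


# Example (needs network access to a real index):
# locations = query_acix("https://cacheindex.ndgf.org:6443/data/index",
#                        ["http://www.nordugrid.org:80/data/echo.sh"])
```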
A JSON-formatted response is returned, consisting of a dictionary
mapping each URL to a list of locations. If remote access to the cache is
configured as described above, each location is the endpoint at which to
access the cached file, for example https://a-rex.host/a-rex/cache.
Otherwise, only the hostname is returned.
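A location may be either an endpoint URL or a bare hostname, so a client may want to reduce both forms to hostnames. The following Python sketch does that for a response body. The example response and its hostnames are made up. Because the scanners' filters can give false positives, a listed location is a hint rather than a guarantee that the file is there.

```python
import json
from typing import Dict, List
from urllib.parse import urlparse


def location_host(location: str) -> str:
    # A location is either an endpoint URL (remote cache access configured)
    # or a bare hostname (not configured); reduce both to a hostname.
    if "://" in location:
        return urlparse(location).hostname or location
    return location


def summarize(response_text: str) -> Dict[str, List[str]]:
    """Map each queried URL to the hostnames whose cache reportedly holds it."""
    locations: Dict[str, List[str]] = json.loads(response_text)
    return {url: [location_host(loc) for loc in locs]
            for url, locs in locations.items()}


# Illustrative response body; the hostnames are hypothetical.
example = ('{"http://www.nordugrid.org:80/data/echo.sh": '
           '["https://a-rex.host/a-rex/cache", "ce2.example.org"], '
           '"http://my.host/data1": []}')
print(summarize(example))
```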
Using ACIX with A-REX Data Staging
ACIX can be used as a fallback mechanism for A-REX downloads of input files required by jobs by specifying use_remote_acix in the [arex/data-staging] block of arc.conf, e.g.:
[arex/data-staging]
use_remote_acix = https://cacheindex.ndgf.org:6443/data/index
If a download from the primary source fails, A-REX can try to use any cached locations provided by ACIX, provided the cache is exposed at those locations. In some cases it may even be preferable to download from a nearby cache rather than from Grid storage. This can be configured using the preferredpattern configuration option, which tells A-REX in which order to try to download replicas of a file.
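The effect of preferredpattern on replica order can be sketched in Python. The matching rules here are an assumption: patterns are separated by |, listed in order of preference, and a trailing $ anchors the pattern to the end of the hostname. Consult the arc.conf reference for the exact syntax, including any features this sketch does not model. The hostnames and the function name are hypothetical.

```python
from typing import List
from urllib.parse import urlparse


def order_replicas(replicas: List[str], preferredpattern: str) -> List[str]:
    """Sort replica URLs by the first pattern they match.

    Assumed semantics (check the arc.conf reference for the real rules):
    patterns separated by '|' in order of preference; a pattern ending in
    '$' must match the end of the replica's hostname, any other pattern is
    a plain substring match on the URL. Unmatched replicas go last.
    """
    patterns = [p for p in preferredpattern.split("|") if p]

    def rank(url: str) -> int:
        host = urlparse(url).hostname or ""
        for i, pattern in enumerate(patterns):
            if pattern.endswith("$"):
                if host.endswith(pattern[:-1]):
                    return i
            elif pattern in url:
                return i
        return len(patterns)

    # sorted() is stable, so replicas with equal rank keep their order.
    return sorted(replicas, key=rank)


replicas = ["srm://storage.example.org/file1",
            "https://ce1a.example.org/a-rex/cache/file1",
            "gsiftp://se.example.net/file1"]
# ce1a first, then se.example.net, then storage.example.org
print(order_replicas(replicas, "ce1a.example.org$|example.net$"))
```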
Using ACIX for ARC client brokering
ACIX can also be used for data-based brokering of ARC jobs. An
ACIX-based broker plugin written in Python comes packaged with the ARC
client tools (in $ARC_LOCATION/share/arc/examples/PythonBroker/ACIXBroker.py)
and can be used, for example, with:
[user ~]$ arcsub -b PythonBroker:ACIXBroker.ACIXBroker:https://cacheindex.ndgf.org:6443/data/index
Target sites for job submission are ranked in order of how many input files required by the job are cached there. See the comments inside this Python file for more information.
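The ranking idea can be illustrated with a short Python sketch that counts, for each candidate host, how many input files ACIX reports as cached there. This is not the code of ACIXBroker.py. The input format (a response already reduced to hostnames), the function name and the hostnames are assumptions.

```python
from collections import Counter
from typing import Dict, List


def rank_sites(candidates: List[str],
               acix_result: Dict[str, List[str]]) -> List[str]:
    """Order candidate hostnames by how many of the job's input files
    ACIX reports as cached there (most first).

    Sketch of the ranking idea only, not the ACIXBroker.py implementation.
    """
    counts: Counter = Counter()
    for hosts in acix_result.values():
        # Count each host at most once per input file.
        counts.update(set(hosts))
    # sorted() is stable, so ties keep the original candidate order.
    return sorted(candidates, key=lambda host: -counts[host])


inputs = {
    "http://www.nordugrid.org:80/data/echo.sh": ["ce1a.example.org", "ce2.example.org"],
    "http://my.host/data1": ["ce2.example.org"],
}
print(rank_sites(["ce1a.example.org", "ce1b.example.org", "ce2.example.org"], inputs))
# ['ce2.example.org', 'ce1a.example.org', 'ce1b.example.org']
```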
Deployment use-case

Fig. 21 ACIX deployment scenario, with one global ACIX Index and a local ACIX Index for CE 1a and CE 1b.
Fig. 21 shows an example ACIX setup. Each CE runs an ACIX Scanner, and there is a central ACIX Index server which pulls content from all CEs. In addition, there is one site with two CEs, CE 1a and CE 1b.
In order to do data-based brokering on just those two CEs
(and ease the load on the global ACIX Index server), a local ACIX Index
runs there and pulls content from only these two CEs. In such a setup
it may be desirable for CE 1b to prefer downloading data from the cache
on CE 1a, and vice versa. Each of these CEs could then be configured with
the local ACIX Index server as use_remote_acix and the other CE's hostname
first in preferredpattern.