A-REX Data Cache technical description

Structure of the cache directory

Cached files are stored in sub-directories under the data directory in each main cache directory. Filenames are constructed from an SHA-1 hash of the URL of the file and split into subdirectories based on the two initial characters of the hash. In the extremely unlikely event of a collision between two URLs having the same SHA-1 hash, caching will not be used for the second file.

When multiple caches are used, a new cache file goes to a randomly selected cache, where each cache is weighted according to the size of the file system on which it is located.

For example: if there are two caches of 1TB and 9TB then on average 10% of input files will go to the first cache and 90% will go to the second cache.

Some associated metadata including the corresponding URL and an expiry time, if available, are stored in a file with the same name as the cache file, with a .meta suffix.

For example, with a cache directory /cache the file srm://srm.nordugrid.org/grid/atlas/file1:

is mapped to /cache/data/37/b19accc950c37876a61d2de6e238d38c9e94c0,

the file /cache/data/37/b19accc950c37876a61d2de6e238d38c9e94c0.meta contains the original URL and an expiry time if one is available.

At the start of a file download, the cache file is locked, so that it cannot be deleted and so that another download process cannot write the same file simultaneously. This is done by creating a file with the same name as the cache filename but with a .lock suffix. This file contains the process ID of the process and the hostname of the host holding the lock. If this file is present, another process cannot do anything with the cache file and must wait until the cache file is unlocked (i.e. the .lock file no longer exists). The lock is continually updated during the transfer, and is considered stale if 15 minutes have passed since the last update. These stale locks, caused for example by a download process exiting abnormally, will therefore automatically be cleaned up. Also, if the process corresponding to the process ID stored inside the lock is no longer running on the host specified in the lock, it is safe to assume that the lock file can be deleted. If a file is requested which already exists in the cache (and is not locked), the cache file is not locked, but checks are done at the end of cache processing to ensure the file was not modified during the processing.

How the cache works

If a job requests an input file which can be cached or is allowed to be cached, it is stored in the selected cache directory, then a hard link is created in a per-job directory, under the joblinks subdirectory of the main cache directory. Then depending on the configuration, either the hard-link is copied or soft-linked to the SD. The former option is advised if the cache is on a file system which will suffer poor performance from a large number of jobs reading files on it, or the file system containing the cache is not accessible from worker nodes. The latter option is the default option. Files marked as executable in the job will be stored in the cache without executable permissions, but they will be copied to the SD and the appropriate permissions applied to the copy.

The per-job directory is only readable by the local user running the job, and the cache directory is readable only by the A-REX user. This means that the local user cannot access any other users’ cache files. It also means that cache files can be removed without needing to know whether they are in use by a currently running job. However, as deleting a file which has hard links does not free space on the disk, cache files are not deleted until all per-job hard links are deleted.

Warning

If a cache is mounted from an NFS server and the A-REX is run by the root user, the server must have the no_root_squash option set for the A-REX host in the /etc/exports file, otherwise the A-REX will not be able to create the required directories.

Note

Note that when running A-REX under a non-privileged user account, all cache files will be owned and accessible by the same user, and therefore modifiable by running jobs. This is potentially dangerous and so caching should be used with caution in this case.

If the file system containing the cache is full and it is impossible to free any space, the download fails and is retried without using caching.

Before giving access to a file already in the cache, the A-REX contacts the initial file source to check if the user has read permission on the file. In order to prevent repeated checks on source files, this authentication information is cached for a limited time. On passing the check for a cached file, the user’s DN is stored in the .meta file, with an expiry time equivalent to the lifetime remaining for the user’s proxy certificate. This means that the permission check is not performed for this user for this file until this time is up (usually several hours). File creation and validity times from the original source are also checked to make sure the cached file is fresh enough. If the modification time of the source is later than that of the cached file, the file will be downloaded again. The file will also be downloaded again if the modification date of the source is not available, as it is assumed the cache file is out of date. These checks are not performed if the DN is cached and is still valid.

The A-REX checks the cache periodically if it is configured to do automatic cleaning. If the used space on the file system containing the cache exceeds the high water-mark given in the configuration file it tries to remove the least-recently accessed files to reduce size to the low water-mark.

Cache cleaning

When [arex/cache/cleaner] block is defined the cache is cleaned automatically periodically (every 5 minutes) by the A-REX to keep the size of each cache within the configured limits. Files are removed from the cache if the total size of the cache is greater than the configured limit. Files which are not locked are removed in order of access time, starting with the earliest, until the size is lower than the configured lower limit. If the lower limit cannot be reached (because too many files are locked, or other files outside the cache are taking up space on the file system), the cleaning will stop before the lower limit is reached.

Since the limits on cache size are given as a percentage of space used on the filesystem on which the cache is located, it is recommended that each cache has its own dedicated file system.

If the cache shares space with other data on a file system, the option calculatesize=cachedir should be set in arc.conf so that the cache limits are applied on the size of the cache rather than the file system.

With large caches mounted over NFS and an A-REX heavily loaded with data transfer processes, cache cleaning can become slow, leading to caches filling up beyond their configured limits. For performance reasons it may be advantageous to disable cache cleaning by the A-REX, and run the cache-clean tool (usually /usr/libexec/arc/cache-clean) independently on the machine hosting the file system.

Caches can be added to and removed from the configuration as required without affecting any cached data, but after changing the configuration file, the A-REX should be restarted. If a cache is to be removed and all data erased, it is recommended that the cache be put in a draining state until all currently running jobs possibly accessing files in this cache have finished. In this state the cache will not be used by any new jobs, but the hard links in the joblinks directory will be cleaned up as each job finishes. Once this directory is empty it is safe to delete the entire cache.

Caches may also be marked as read-only, so that data cached there can be used by new jobs, but no new data will be written there. This feature is available from ARC version 6.7. Note that read-only caches are not cleaned by A-REX.

Exposing the Cache

Normally the ARC cache is internal to the CE and is not exposed to the outside. However it may be beneficial to allow reading cache files, if for example the file is lost from Grid storage or as a fallback when Grid storage is down. This can be done via HTTPS through the A-REX web services interface.

Specifying [arex/ws/cache] block opens remote read access to certain cache files for certain credential properties. When configured this allows cached files to be read from the A-REX WS endpoint, for example if file gsiftp://my.host/file1 is cached at CE a-rex.host the file is accessible (if credentials allow) at:

https://a-rex.host/arex/cache/gsiftp://my.host/file1

Since remote reading can increase the load on A-REX, the number of concurrent requests should be limited. This can be done using the max_data_transfer_requests configuration option.

Indexing the Cache content

The ARC Cache Index (ACIX) provides a way to discover locations of cached files.