ARC SDK
Arc::FileCache Class Reference

FileCache provides an interface to all cache operations.

#include <arc/data/FileCache.h>

Public Member Functions

FileCache (const std::string &cache_path, const std::string &id, uid_t job_uid, gid_t job_gid)
 Create a new FileCache instance with one cache directory.

FileCache (const std::vector< std::string > &caches, const std::string &id, uid_t job_uid, gid_t job_gid)
 Create a new FileCache instance with multiple cache directories.

FileCache (const std::vector< std::string > &caches, const std::vector< std::string > &remote_caches, const std::vector< std::string > &draining_caches, const std::string &id, uid_t job_uid, gid_t job_gid)
 Create a new FileCache instance with multiple cache directories, remote caches and draining cache directories.

FileCache ()
 Default constructor. Creates an invalid cache.

bool Start (const std::string &url, bool &available, bool &is_locked, bool use_remote=true, bool delete_first=false)
 Start preparing to cache the file specified by url.

bool Stop (const std::string &url)
 Stop the cache after a file was downloaded, releasing the lock.

bool StopAndDelete (const std::string &url)
 Stop the cache after a file was downloaded and delete the cache file.

std::string File (const std::string &url)
 Get the cache filename for the given URL.

bool Link (const std::string &link_path, const std::string &url, bool copy, bool executable, bool holding_lock, bool &try_again)
 Link a cache file to the place where it will be used.

bool Release () const
 Release the job's claims on cache files.

bool AddDN (const std::string &url, const std::string &DN, const Time &expiry_time)
 Store a DN in the permissions cache for the given url.

bool CheckDN (const std::string &url, const std::string &DN)
 Check whether a DN exists in the permissions cache and is still valid for the given url.

bool CheckCreated (const std::string &url)
 Check whether the creation time of a cache file can be obtained.

Time GetCreated (const std::string &url)
 Get the creation time of a cached file.

bool CheckValid (const std::string &url)
 Check whether an expiry time for the given url is stored in the cache.

Time GetValid (const std::string &url)
 Get the expiry time of a cached file.

bool SetValid (const std::string &url, const Time &val)
 Set the expiry time of a cache file.

operator bool ()
 Returns true if the object is usable.

bool operator== (const FileCache &a)
 Returns true if all attributes are equal.
 

Detailed Description

FileCache provides an interface to all cache operations.

When it is decided that a file should be downloaded to the cache, Start() should be called so that the cache file can be prepared and, if necessary, locked. If the file is already available, it is not locked and Link() can be called immediately. Link() creates a hard link to the cache file in a per-job directory in the cache, and then either soft-links or copies the file into the session directory so that the user's job can access it. If the file is not available, Start() locks it; after downloading, Link() can be called, and Stop() must then be called to release the lock. If the transfer failed, StopAndDelete() can be called instead to release the lock and clean up the cache file. After a job has finished, Release() should be called to remove the hard links created for that job.
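
The calling sequence above can be sketched as a C++ function template, so that it accepts an Arc::FileCache or any type with the same member signatures. The names cache_and_link and StubCache and the download callback are hypothetical: StubCache is an in-memory stand-in that only lets the sketch run without the ARC libraries, and giving up as soon as Start() fails is an assumed policy, not ARC's recommended handling.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Sketch of the Start() -> [download] -> Link() -> Stop() sequence.
// Returns true if link_path is ready for the job to use.
template <typename Cache>
bool cache_and_link(Cache& cache, const std::string& url,
                    const std::string& link_path,
                    const std::function<bool(const std::string&)>& download) {
  bool available = false, is_locked = false;
  if (!cache.Start(url, available, is_locked)) {
    return false;  // locked by another process (is_locked) or an error
  }
  if (available) {
    // Already cached and not locked for us: link now, and do NOT call Stop().
    bool try_again = false;
    return cache.Link(link_path, url, false, false, false, try_again);
  }
  // Start() locked the file for us: download into the cache file.
  if (!download(cache.File(url))) {
    cache.StopAndDelete(url);  // release the lock and remove the partial copy
    return false;
  }
  bool try_again = false;
  // holding_lock = true because this process downloaded the file itself.
  const bool linked = cache.Link(link_path, url, false, false, true, try_again);
  const bool stopped = cache.Stop(url);  // always release our lock
  return linked && stopped;
}

// Hypothetical stand-in with the documented member signatures (not ARC code).
struct StubCache {
  bool cached = false, stopped = false, deleted = false;
  bool Start(const std::string&, bool& available, bool& is_locked,
             bool = true, bool = false) {
    available = cached;
    is_locked = false;
    return true;
  }
  std::string File(const std::string&) { return "stub-cache-file"; }
  bool Link(const std::string&, const std::string&, bool, bool, bool,
            bool& try_again) {
    try_again = false;
    return true;
  }
  bool Stop(const std::string&) { stopped = true; cached = true; return true; }
  bool StopAndDelete(const std::string&) { deleted = true; return true; }
};
```

With a real cache, the first argument would be an Arc::FileCache constructed as described below; a production caller would also act on is_locked and on the try_again flag of Link(), both documented further down.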

Cache files are locked for writing using the FileLock class, which creates a lock file with the '.lock' suffix next to the cache file. If Start() is called and the cache file is not already available, it creates this lock and Stop() must be called to release it. All processes calling Start() must wait until they successfully obtain the lock before downloading can begin.
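
The lock-file protocol described here, together with the conditions under which Stop() is documented to fail, can be illustrated with POSIX open() and O_CREAT | O_EXCL, which creates the file atomically and fails if it already exists. This is a simplified sketch, not the FileLock class: acquire_lock and release_lock are hypothetical names, storing only the process id in the lock file is an assumption, and stale-lock recovery is not modelled.

```cpp
#include <cassert>
#include <fcntl.h>
#include <fstream>
#include <string>
#include <unistd.h>

// Atomically create "<cache_file>.lock" holding our pid.
// Fails if another process already created it.
bool acquire_lock(const std::string& cache_file) {
  const std::string lock = cache_file + ".lock";
  int fd = ::open(lock.c_str(), O_WRONLY | O_CREAT | O_EXCL, 0644);
  if (fd == -1) return false;  // EEXIST: already locked (or another error)
  const std::string pid = std::to_string(::getpid());
  const bool ok =
      ::write(fd, pid.data(), pid.size()) == static_cast<ssize_t>(pid.size());
  ::close(fd);
  if (!ok) ::unlink(lock.c_str());  // do not leave a half-written lock
  return ok;
}

// Release the lock only if it still names this process, mirroring the
// three documented failure cases of Stop().
bool release_lock(const std::string& cache_file) {
  const std::string lock = cache_file + ".lock";
  std::ifstream in(lock);
  if (!in) return false;  // lock file does not exist
  std::string owner;
  in >> owner;
  in.close();
  if (owner != std::to_string(::getpid())) return false;  // lock taken over
  return ::unlink(lock.c_str()) == 0;  // false if deletion fails
}
```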

The cache directories, and for each one an optional path to use when the soft links are made, are set in the constructor. The names of cache files are formed from an SHA-1 hash of the URL to cache. To ease the load on the file system, the cache files are split into subdirectories based on the first two characters of the hash. For example, the file with hash 76f11edda169848038efbd9fa3df5693 is stored in 76/f11edda169848038efbd9fa3df5693. The cache filename for a URL can be found by passing the URL to File(). For more information on the structure of the cache, see the ARC Computing Element System Administrator Guide (NORDUGRID-MANUAL-20).
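
The two-character split can be sketched as follows. Computing SHA-1 needs a crypto library, so these hypothetical helpers start from an already computed hex digest; they show only the split described above, and any further directory levels the real cache may use under the cache directory are not modelled. For an actual cache file name, File() is the authoritative source.

```cpp
#include <cassert>
#include <string>

// "76f11e..." -> "76/f11e...": the first two hex characters name the
// subdirectory, the remainder is the file name.
std::string hash_to_relpath(const std::string& hex_hash) {
  if (hex_hash.size() < 3) return "";  // too short to split meaningfully
  return hex_hash.substr(0, 2) + "/" + hex_hash.substr(2);
}

// Join the relative path onto a cache directory (trailing '/' tolerated).
std::string cache_file_path(const std::string& cache_dir,
                            const std::string& hex_hash) {
  std::string dir = cache_dir;
  if (!dir.empty() && dir.back() == '/') dir.pop_back();
  return dir + "/" + hash_to_relpath(hex_hash);
}
```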

Constructor & Destructor Documentation

Arc::FileCache::FileCache (const std::string &cache_path, const std::string &id, uid_t job_uid, gid_t job_gid)

Create a new FileCache instance with one cache directory.

Parameters
cache_path: A string of the form "cache_dir[ link_path]". cache_dir is the path to the cache directory, and the optional link_path is used to create links when the cache directory is visible under a different path during actual usage. When linking from the session directory, link_path is used instead of cache_dir.
id: The job id. It is used to create the per-job directory from which the job's cache files are hard-linked.
job_uid: The owner of the job. The per-job directory is readable only by this user.
job_gid: The owner group of the job.
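
A minimal sketch of splitting a "cache_dir[ link_path]" description into its two fields. parse_cache_spec is a hypothetical helper that assumes neither path contains spaces; ARC's own parsing may treat whitespace or quoting differently.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Split "cache_dir[ link_path]" at the first space.
// Returns {cache_dir, link_path}; link_path is empty when absent.
std::pair<std::string, std::string> parse_cache_spec(const std::string& spec) {
  const auto sp = spec.find(' ');
  if (sp == std::string::npos) return {spec, ""};
  std::string link = spec.substr(sp + 1);
  // Tolerate extra spaces between the two fields.
  const auto start = link.find_first_not_of(' ');
  link = (start == std::string::npos) ? "" : link.substr(start);
  return {spec.substr(0, sp), link};
}
```

A link_path of "." matters later: Link() then copies files into the session directory instead of soft-linking them.
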
Arc::FileCache::FileCache (const std::vector< std::string > &caches, const std::string &id, uid_t job_uid, gid_t job_gid)

Create a new FileCache instance with multiple cache dirs.

Parameters
caches: A vector of strings describing caches. The format of each string is "cache_dir[ link_path]".
id: The job id. It is used to create the per-job directory from which the job's cache files are hard-linked.
job_uid: The owner of the job. The per-job directory is readable only by this user.
job_gid: The owner group of the job.

Arc::FileCache::FileCache (const std::vector< std::string > &caches, const std::vector< std::string > &remote_caches, const std::vector< std::string > &draining_caches, const std::string &id, uid_t job_uid, gid_t job_gid)

Create a new FileCache instance with multiple cache dirs, remote caches and draining cache directories.

Parameters
caches: A vector of strings describing caches. The format of each string is "cache_dir[ link_path]".
remote_caches: Same format as caches. These are paths to caches which are under the control of other Grid Managers and are read-only for this process.
draining_caches: Same format as caches. These are paths to caches which are to be drained.
id: The job id. It is used to create the per-job directory from which the job's cache files are hard-linked.
job_uid: The owner of the job. The per-job directory is readable only by this user.
job_gid: The owner group of the job.

Arc::FileCache::FileCache ()  [inline]

Default constructor. Creates an invalid cache, for which operator bool() returns false.

Member Function Documentation

bool Arc::FileCache::AddDN (const std::string &url, const std::string &DN, const Time &expiry_time)

Store a DN in the permissions cache for the given url.

Add the given DN to the list of cached DNs with the given expiry time.

Parameters
url: the url corresponding to the cache file to which the cached DN should be added
DN: the DN of the user
expiry_time: the expiry time of this DN in the DN cache
Returns
true if the DN was successfully added

bool Arc::FileCache::CheckCreated ( const std::string &  url)

Check if it is possible to obtain the creation time of a cache file.

Parameters
url: the url corresponding to the cache file whose creation time is wanted
Returns
true if the file exists in the cache, since the creation time is taken from the cache file itself

bool Arc::FileCache::CheckDN (const std::string &url, const std::string &DN)

Check if a DN exists in the permission cache and is still valid for the given url.

Check if the given DN is cached for authorisation and it is still valid.

Parameters
url: the url corresponding to the cache file for which the cached DN should be checked
DN: the DN of the user
Returns
true if the DN exists and is still valid
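
The semantics of AddDN() and CheckDN() can be modelled with an in-memory map from URL to DN to expiry time. DNCacheModel is hypothetical: it uses plain time_t seconds instead of Arc::Time, treats an entry as valid only while its expiry lies strictly in the future (an assumption about the boundary case), and says nothing about where the real FileCache stores this information.

```cpp
#include <cassert>
#include <ctime>
#include <map>
#include <string>

// Hypothetical in-memory model of the per-URL DN permissions cache.
class DNCacheModel {
 public:
  // Like AddDN(): remember (or refresh) dn for url until expiry.
  void add(const std::string& url, const std::string& dn, std::time_t expiry) {
    entries_[url][dn] = expiry;
  }
  // Like CheckDN(): true only if dn is cached for url and not yet expired.
  bool check(const std::string& url, const std::string& dn,
             std::time_t now) const {
    const auto u = entries_.find(url);
    if (u == entries_.end()) return false;
    const auto d = u->second.find(dn);
    return d != u->second.end() && d->second > now;
  }

 private:
  std::map<std::string, std::map<std::string, std::time_t>> entries_;
};
```

Entries are keyed per URL, so authorising a DN for one cached file says nothing about any other file.
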
bool Arc::FileCache::CheckValid ( const std::string &  url)

Check whether an expiry time for the given url is stored in the cache.

Parameters
url: the url corresponding to the cache file for which we want to know whether an expiry time exists
Returns
true if an expiry time exists

std::string Arc::FileCache::File ( const std::string &  url)

Get the cache filename for the given URL.

Parameters
url: the URL to look for in the cache
Returns
the full pathname of the file in the cache which corresponds to the given url

Time Arc::FileCache::GetCreated ( const std::string &  url)

Get the creation time of a cached file.

Parameters
url: the url corresponding to the cache file for which we want to know the creation time
Returns
the creation time of the file, or 0 if the cache file does not exist

Time Arc::FileCache::GetValid ( const std::string &  url)

Get expiry time of a cached file.

Parameters
url: the url corresponding to the cache file for which we want to know the expiry time
Returns
the expiry time, or 0 if none is available
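
One way a caller might combine CheckValid() and GetValid() (with SetValid(), documented below) is to decide whether a cached file has expired. The is_expired template and the StubValidity stand-in are hypothetical; treating a file with no recorded expiry time as not expired is an assumed policy, and the template assumes the time type supports operator<.

```cpp
#include <cassert>
#include <ctime>
#include <map>
#include <string>

// Assumed policy: a file with no recorded expiry time is never expired.
template <typename Cache, typename TimeT>
bool is_expired(Cache& cache, const std::string& url, const TimeT& now) {
  if (!cache.CheckValid(url)) return false;
  return cache.GetValid(url) < now;
}

// Hypothetical stand-in for the SetValid/CheckValid/GetValid trio,
// using time_t in place of Arc::Time. Not part of ARC.
struct StubValidity {
  std::map<std::string, std::time_t> valid;
  bool SetValid(const std::string& url, const std::time_t& t) {
    valid[url] = t;
    return true;
  }
  bool CheckValid(const std::string& url) { return valid.count(url) > 0; }
  std::time_t GetValid(const std::string& url) {
    const auto it = valid.find(url);
    return it == valid.end() ? 0 : it->second;  // 0 when none, as documented
  }
};
```

Calling CheckValid() first matters because GetValid() returns 0 when no expiry time is stored, and 0 would otherwise look like a long-expired time.
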
bool Arc::FileCache::Link (const std::string &link_path, const std::string &url, bool copy, bool executable, bool holding_lock, bool &try_again)

Link a cache file to the place it will be used.

Create a hard link to the cache file in the per-job directory, and then a soft link from the session directory to that hard link. This effectively 'claims' the file for the job: even if the original cache file is deleted, e.g. by some external process, the hard link continues to exist until it is explicitly released by calling Release().
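
The hard-link-then-soft-link scheme can be sketched with std::filesystem (C++17). claim_and_link is a hypothetical helper and the directory names are illustrative only; the sketch runs as the current user rather than switching to job_uid/job_gid, and it omits the lock and modification-time checks that Link() performs.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hard link the cache file into job_dir (same inode), then point a
// symbolic link in the session directory at that hard link.
void claim_and_link(const fs::path& cache_file, const fs::path& job_dir,
                    const fs::path& session_file) {
  fs::create_directories(job_dir);
  const fs::path claimed = job_dir / cache_file.filename();
  fs::create_hard_link(cache_file, claimed);  // the job's "claim"
  fs::create_directories(session_file.parent_path());
  fs::create_symlink(claimed, session_file);  // what the job sees
}
```

Because the per-job hard link shares the cache file's inode, deleting the cache file afterwards leaves the data reachable through the session-directory symlink until the per-job link itself is removed, which is what Release() does.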

If the link_path given for the cache in the constructor is ".", or if copy or executable is true, the file is copied directly to the session directory rather than linked.

After linking or copying, the cache file is checked for the presence of a write lock and for a change in its modification time since linking started (in case the file was locked, modified and then released during linking). If either check fails, the links created during Link() are deleted, try_again is set to true and Link() returns false; the caller should then go back to Start(). If the caller obtained a write lock from Start() and then downloaded the file, it should set holding_lock to true, in which case none of these checks are performed.
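
The post-link check can be sketched with std::filesystem. link_still_valid is a hypothetical helper, and comparing last_write_time values is a simplification of whatever ARC actually records; a false result corresponds to the case in which Link() removes its links, sets try_again and the caller returns to Start().

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// True only if no "<cache_file>.lock" exists and the cache file still
// exists with the modification time recorded before linking started.
bool link_still_valid(const fs::path& cache_file,
                      fs::file_time_type mtime_before) {
  std::error_code ec;
  fs::path lock = cache_file;
  lock += ".lock";
  if (fs::exists(lock, ec)) return false;  // a writer holds the lock
  const auto mtime_now = fs::last_write_time(cache_file, ec);
  if (ec) return false;                    // file deleted meanwhile
  return mtime_now == mtime_before;        // false if modified meanwhile
}
```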

The session directory is accessed under the uid and gid passed in the constructor.

Parameters
link_path: path in the session directory for the soft link or new file
url: url of the file to link or copy
copy: if true, the file is copied rather than soft-linked to the session directory
executable: if true, the file is copied and given execute permissions in the session directory
holding_lock: should be set to true if the caller already holds the lock
try_again: set to true if, after linking, the cache file was found to be locked, deleted or modified
Returns
true if linking succeeded; false if an error occurred or the file was locked or modified by another process during linking

Arc::FileCache::operator bool (void)  [inline]

Returns true if the object is usable.

bool Arc::FileCache::operator== (const FileCache &a)

Returns true if all attributes are equal.

bool Arc::FileCache::Release ( ) const

Release cache files used in this cache.

Release the claims on input files held by the job whose id was passed to the constructor. For each cache directory, the job's per-job directory containing the hard links is deleted.

Returns
false if any directory fails to be deleted

bool Arc::FileCache::SetValid (const std::string &url, const Time &val)

Set expiry time of a cache file.

Parameters
url: the url corresponding to the cache file for which we want to set the expiry time
val: the expiry time
Returns
true if the expiry time was successfully set

bool Arc::FileCache::Start (const std::string &url, bool &available, bool &is_locked, bool use_remote = true, bool delete_first = false)

Start preparing to cache the file specified by url.

Start() returns true if the file was successfully prepared. The available parameter is set to true if the file already exists and in this case Link() can be called immediately. If available is false the caller should write the file and then call Link() followed by Stop(). Start() returns false if it was unable to prepare the cache file for any reason. In this case the is_locked parameter should be checked and if it is true the file is locked by another process and the caller should try again later.
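
The possible results of Start() map onto four caller actions. The StartOutcome enum and classify_start() are hypothetical names that only restate the rules in the paragraph above.

```cpp
#include <cassert>

// Caller actions implied by Start()'s return value and output flags.
enum class StartOutcome {
  LinkNow,     // true,  available:  already cached; call Link(), no Stop()
  Download,    // true,  !available: we hold the lock; download, Link(), Stop()
  RetryLater,  // false, is_locked:  another process holds the lock
  Failed       // false, !is_locked: preparing the cache file failed
};

StartOutcome classify_start(bool ok, bool available, bool is_locked) {
  if (ok) return available ? StartOutcome::LinkNow : StartOutcome::Download;
  return is_locked ? StartOutcome::RetryLater : StartOutcome::Failed;
}
```

Store Start()'s return value in a variable before calling classify_start(ok, available, is_locked); passing the Start() call directly as an argument would leave the evaluation order of available and is_locked unspecified.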

Parameters
url: url that is being downloaded
available: true on exit if the file is already in the cache
is_locked: true on exit if the file is already locked, i.e. cannot be used by this process
use_remote: whether to check if the file exists in a remote cache. Can be set to false if, for example, a forced download to the cache is desired.
delete_first: if true, any existing cache file is deleted first
Returns
true if the file is available or ready to be downloaded; false if the file is already locked or preparing the cache failed

bool Arc::FileCache::Stop ( const std::string &  url)

Stop the cache after a file was downloaded.

This method (or StopAndDelete()) must be called after the file was downloaded, or after the download failed, to release the lock on the cache file. Stop() does not delete the cache file. It returns false if the lock file does not exist, if another pid was found inside the lock file (meaning another process took over the lock, so this process must go back to Start()), or if it fails to delete the lock file. It must only be called if the caller actually downloaded the file; it must not be called if the file was already available.

Parameters
url: the url of the file that was downloaded
Returns
true if the lock was successfully released

bool Arc::FileCache::StopAndDelete ( const std::string &  url)

Stop the cache after a file was downloaded and delete the cache file.

Release the cache file and delete it, for example because a failed download left an incomplete copy. This method also deletes the meta file, which contains the url corresponding to the cache file. The return value follows the same logic as for Stop(). It must only be called if the caller downloaded the file.

Parameters
url: the url corresponding to the cache file that has to be released and deleted
Returns
true if the cache file and lock were successfully removed
