ARC data staging (libarcdatastaging)

Detailed Description

ARC data staging components form a complete data transfer management system. Whereas the ARC data library (libarcdata) is a library for data access, enabling several types of operation on data files on the Grid using a variety of access protocols, ARC data staging (libarcdatastaging) is a framework for managed data transfer to and from the Grid. The data staging system is designed to run as a persistent process which executes data transfers on demand. Data transfers are defined and fed into the system, and notification is given when they complete. No knowledge of the internal workings of the Grid is required; a user only needs to specify URLs representing the source and destination of the transfer.

The system is highly configurable and features intelligent priority, fair-share and error-handling mechanisms, as well as the ability to spread data transfer across multiple hosts using ARC's DataDelivery service. It is used by ARC's Computing Element (A-REX) for pre- and post-job data transfer of input and output files. Note that this system is primarily for data transfer to and from local files and that third-party transfer is not supported. It is designed for pulling or pushing data between the Grid and a local file system, rather than as a service for transfer between two Grid storage elements. It is possible to transfer data between two remote endpoints, but all data flows through the client.
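The fair-share mechanism divides transfer slots among configured shares according to their weights. The function below is a rough sketch of that idea only; it is not ARC's TransferShares implementation and its name and signature are illustrative assumptions:

```cpp
#include <map>
#include <string>

// Illustrative sketch of proportional fair-share slot allocation: each share
// receives a fraction of the available transfer slots proportional to its
// configured weight (rounded down). ARC's TransferShares implements a more
// sophisticated weighted scheme; this only conveys the principle.
std::map<std::string, int> allocate_slots(const std::map<std::string, int>& weights,
                                          int slots) {
  int total_weight = 0;
  for (const auto& w : weights) total_weight += w.second;
  std::map<std::string, int> alloc;
  if (total_weight <= 0) return alloc;
  for (const auto& w : weights)
    alloc[w.first] = (slots * w.second) / total_weight;  // proportional share
  return alloc;
}
```

For example, with shares weighted 3:1 and 8 available slots, the first share would be allocated 6 slots and the second 2.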

The following code snippet shows a very simple example of how to use libarcdatastaging. The Generator class takes a source and a destination as input and creates a DTR which describes the data transfer. The DTR is then passed to the Scheduler, and the Generator defines a receiveDTR() method which the Scheduler calls to notify the Generator that the transfer has finished. A main() program is also shown, as an example of how to use the Generator as a basic copy tool from the command line.


#ifndef GENERATOR_H_
#define GENERATOR_H_

#include <arc/Thread.h>
#include <arc/Logger.h>
#include <arc/data-staging/Scheduler.h>

// This basic Generator implementation shows how a Generator can
// be written. It has one method, run(), which creates a single DTR
// and submits it to the Scheduler.
class Generator : public DataStaging::DTRCallback {

 private:
  // Condition to wait on until DTR has finished
  static Arc::SimpleCondition cond;
  // DTR Scheduler
  DataStaging::Scheduler scheduler;
  // Logger object
  static Arc::Logger logger;
  // Root LogDestinations to be used in receiveDTR
  std::list<Arc::LogDestination*> root_destinations;

 public:
  // Counter for main to know how many DTRs are in the system
  Arc::SimpleCounter counter;
  // Create a new Generator. start() must be called to start DTR threads.
  Generator();
  // Stop Generator and DTR threads
  ~Generator();
  // Implementation of callback from DTRCallback - the callback method used
  // when DTR processing is complete to pass the DTR back to the generator.
  // It decrements counter.
  virtual void receiveDTR(DataStaging::DTR_ptr dtr);
  // Start Generator and DTR threads
  void start();
  // Submit a DTR with given source and destination. Increments counter.
  void run(const std::string& source, const std::string& destination);
};

#endif /* GENERATOR_H_ */


#include <arc/GUID.h>
#include <arc/credential/Credential.h>
#include "Generator.h"

Arc::Logger Generator::logger(Arc::Logger::getRootLogger(), "Generator");
Arc::SimpleCondition Generator::cond;

Generator::Generator() {
  // Set up logging
  root_destinations = Arc::Logger::getRootLogger().getDestinations();
}

Generator::~Generator() {
  logger.msg(Arc::INFO, "Shutting down scheduler");
  scheduler.stop();
  logger.msg(Arc::INFO, "Scheduler stopped, exiting");
}

void Generator::receiveDTR(DataStaging::DTR_ptr dtr) {
  // root logger is disabled in Scheduler thread so need to add it here
  Arc::Logger::getRootLogger().addDestinations(root_destinations);
  logger.msg(Arc::INFO, "Received DTR %s back from scheduler in state %s",
             dtr->get_id(), dtr->get_status().str());
  Arc::Logger::getRootLogger().removeDestinations();
  // DTR logger destinations can be destroyed when DTR has finished
  dtr->get_logger()->deleteDestinations();
  counter.dec();
}

void Generator::start() {
  // Starting scheduler with default configuration
  logger.msg(Arc::INFO, "Generator started");
  logger.msg(Arc::INFO, "Starting DTR threads");
  scheduler.start();
}

void Generator::run(const std::string& source, const std::string& destination) {
  std::string job_id = Arc::UUID();
  Arc::initializeCredentialsType cred_type(Arc::initializeCredentialsType::TryCredentials);
  Arc::UserConfig cfg(cred_type);
  // check credentials
  if (!Arc::Credential::IsCredentialsValid(cfg)) {
    logger.msg(Arc::ERROR, "No valid credentials found, exiting");
    return;
  }
  // Logger for this DTR, sending output to stderr
  DataStaging::DTRLogger log(new Arc::Logger(Arc::Logger::getRootLogger(), "DataStaging"));
  Arc::LogDestination * dest = new Arc::LogStream(std::cerr);
  log->addDestination(*dest);
  DataStaging::DTR_ptr dtr(new DataStaging::DTR(source, destination, cfg, job_id, Arc::User().get_uid(), log));
  if (!(*dtr)) {
    logger.msg(Arc::ERROR, "Problem creating dtr (source %s, destination %s)", source, destination);
    return;
  }
  // register callback with DTR
  dtr->registerCallback(this, DataStaging::GENERATOR);
  dtr->registerCallback(&scheduler, DataStaging::SCHEDULER);
  DataStaging::DTR::push(dtr, DataStaging::SCHEDULER);
  counter.inc();
}


#include <config.h>
#include <signal.h>
#include <unistd.h>
#include <iostream>
#include <arc/StringConv.h>
#include "Generator.h"

static bool run = true;

static void do_shutdown(int) {
  run = false;
}

static void usage() {
  std::cout << "Usage: generator [num mock transfers]" << std::endl;
  std::cout << "       generator source destination" << std::endl;
  std::cout << "To use mock transfers ARC must be built with configure --enable-mock-dmc" << std::endl;
  std::cout << "The default number of mock transfers is 10" << std::endl;
}

int main(int argc, char** argv) {
  signal(SIGINT, do_shutdown);
  // Log to stderr
  Arc::LogStream logcerr(std::cerr);
  Arc::Logger::getRootLogger().addDestination(logcerr);

  Generator generator;
  int num = 10;
  if (argc == 1 || argc == 2) { // run mock a number of times
    if (argc == 2 && (std::string(argv[1]) == "-h" || !Arc::stringto(argv[1], num))) {
      usage();
      return 1;
    }
    generator.start();
    for (int i = 0; i < num; ++i) {
      std::string source = "mock://mocksrc/mock." + Arc::tostring(i);
      std::string destination = "mock://mockdest/mock." + Arc::tostring(i);
      generator.run(source, destination);
    }
  }
  else if (argc == 3) { // run with given source and destination
    generator.start();
    generator.run(argv[1], argv[2]);
  }
  else {
    usage();
    return 1;
  }
  // wait until all DTRs have finished
  while (generator.counter.get() > 0 && run) {
    sleep(1);
  }
  return 0;
}
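Stripped of the ARC types, the control flow of the example above can be modelled with standard C++ threads. All names below (MiniGenerator, MiniScheduler) are illustrative only, not part of libarcdatastaging: submitting a request increments a counter, a worker performs the "transfer" and fires a completion callback, and the callback (the receiveDTR analogue) decrements the counter so main() knows when everything is done.

```cpp
#include <atomic>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Model of the Scheduler: runs each request on a worker thread and invokes
// the completion callback when the "transfer" is done.
class MiniScheduler {
 public:
  // Launch a worker for the request; the worker calls on_done when finished
  void submit(std::function<void()> on_done) {
    workers.emplace_back(std::move(on_done));
  }
  // Wait for all workers to finish, like Scheduler::stop()
  void stop() {
    for (auto& t : workers) t.join();
    workers.clear();
  }
 private:
  std::vector<std::thread> workers;
};

// Model of the Generator: submits requests and tracks how many are in flight.
class MiniGenerator {
 public:
  std::atomic<int> counter{0};
  // Analogue of Generator::run(): register a completion callback and submit
  void run(const std::string& source, const std::string& destination) {
    (void)source; (void)destination;  // a real DTR would carry these URLs
    ++counter;
    scheduler.submit([this]() { --counter; });  // receiveDTR analogue
  }
  void stop() { scheduler.stop(); }
 private:
  MiniScheduler scheduler;
};
```

After submitting several transfers and calling stop(), the counter returns to zero, mirroring the `while (generator.counter.get() > 0 ...)` wait loop in main() above.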

For more information see

For more examples on using libarcdatastaging in several languages, see

Data Structures

class  DataStaging::DataDelivery
 DataDelivery transfers data between specified physical locations. More...
class  DataStaging::DataDeliveryComm
 This class provides an abstract interface for the Delivery layer. More...
struct  DataStaging::DataDeliveryComm::Status
 Plain C struct to pass information from executing process back to main thread. More...
class  DataStaging::DataDeliveryCommHandler
 Singleton class handling all active DataDeliveryComm objects. More...
class  DataStaging::DataDeliveryLocalComm
 This class starts, monitors and controls a local Delivery process. More...
class  DataStaging::DataDeliveryRemoteComm
 This class contacts a remote service to make a Delivery request. More...
class  DataStaging::TransferParameters
 Represents limits and properties of a DTR transfer. These generally apply to all DTRs. More...
class  DataStaging::DTRCacheParameters
 The configured cache directories. More...
class  DataStaging::DTRCallback
 The base class from which all callback-enabled classes should be derived. More...
class  DataStaging::DTR
 Data Transfer Request. More...
class  DataStaging::DTRList
 Global list of all active DTRs in the system. More...
class  DataStaging::DTRStatus
 Class representing the status of a DTR. More...
class  DataStaging::DTRErrorStatus
 A class to represent error states reported by various components. More...
class  DataStaging::Processor
 The Processor performs pre- and post-transfer operations. More...
class  DataStaging::Scheduler
 The Scheduler is the control centre of the data staging framework. More...
class  DataStaging::TransferSharesConf
 TransferSharesConf describes the configuration of TransferShares. More...
class  DataStaging::TransferShares
 TransferShares is used to implement fair-sharing and priorities. More...


typedef Arc::ThreadedPointer< DTR > DataStaging::DTR_ptr
 Provides automatic memory management of DTRs and thread-safe destruction. More...
typedef Arc::ThreadedPointer< Arc::Logger > DataStaging::DTRLogger
 The DTR's Logger object can be used outside the DTR object with DTRLogger. More...


enum  DataStaging::StagingProcesses {
  DataStaging::GENERATOR, DataStaging::SCHEDULER, DataStaging::PRE_PROCESSOR,
  DataStaging::DELIVERY, DataStaging::POST_PROCESSOR
}
 Components of the data staging framework. More...
enum  DataStaging::ProcessState { DataStaging::INITIATED, DataStaging::RUNNING, DataStaging::TO_STOP, DataStaging::STOPPED }
 Internal state of StagingProcesses. More...
enum  DataStaging::CacheState {
  DataStaging::CACHEABLE, DataStaging::NON_CACHEABLE, DataStaging::CACHE_ALREADY_PRESENT,
  DataStaging::CACHE_DOWNLOADED, DataStaging::CACHE_LOCKED, DataStaging::CACHE_SKIP,
  DataStaging::CACHE_NOT_USED
}
 Represents possible cache states of this DTR. More...

Typedef Documentation

Provides automatic memory management of DTRs and thread-safe destruction.

The DTR's Logger object can be used outside the DTR object with DTRLogger.
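Both typedefs wrap their objects in Arc::ThreadedPointer because a DTR and its Logger are passed between the Generator, Scheduler, Processor and Delivery threads; the object must survive until the last holder releases it. The ownership behaviour is conceptually similar to std::shared_ptr, as this sketch (an analogy using a made-up FakeDTR type, not ARC code) shows:

```cpp
#include <memory>

// Stand-in object that records when it is destroyed
struct FakeDTR {
  bool* destroyed;
  explicit FakeDTR(bool* d) : destroyed(d) {}
  ~FakeDTR() { *destroyed = true; }
};

// Demonstrates reference-counted ownership: the object outlives any single
// holder and is destroyed only when the last holder releases it.
bool last_holder_destroys() {
  bool destroyed = false;
  std::shared_ptr<FakeDTR> generator_copy = std::make_shared<FakeDTR>(&destroyed);
  std::shared_ptr<FakeDTR> scheduler_copy = generator_copy;  // second holder
  generator_copy.reset();                // generator done; object survives
  bool alive_with_one_holder = !destroyed;
  scheduler_copy.reset();                // last holder gone; destructor runs
  return alive_with_one_holder && destroyed;
}
```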

Enumeration Type Documentation

Represents possible cache states of this DTR.

CACHEABLE 	Source should be cached.
NON_CACHEABLE 	Source should not be cached.
CACHE_ALREADY_PRESENT 	Source is available in cache from before.
CACHE_DOWNLOADED 	Source has just been downloaded and put in cache.
CACHE_LOCKED 	Cache file is locked.
CACHE_SKIP 	Source is cacheable but due to some problem should not be cached.
CACHE_NOT_USED 	Cache was started but was not used.

Internal state of StagingProcesses.

INITIATED 	Process is ready to start.
RUNNING 	Process is running.
TO_STOP 	Process has been instructed to stop.
STOPPED 	Process has stopped.

Components of the data staging framework.

GENERATOR 	Creator of new DTRs and receiver of completed DTRs.
SCHEDULER 	Controls queues and moves DTRs between other components when necessary.
PRE_PROCESSOR 	Performs all pre-transfer operations.
DELIVERY 	Performs physical transfer.
POST_PROCESSOR 	Performs all post-transfer operations.