Detailed description of DTRs

DTR stands for Data Transfer Request. It is a structure whose fields fully describe a file transfer to be performed. The generator creates one DTR per file transfer.

Fields of the DTR

More or less required:

  • DTR ID

  • source endpoint

  • destination endpoint

    • for source and destination, a list of metadata such as file size, checksum, creation date etc

    • for source and destination (if applicable) a list of replicas

    • for source and destination (if applicable) current replica

    • for source and destination (if applicable) TURL or delivery-level URL used for transfer

    • for source and destination (if applicable) request ID (in the case of asynchronous requests to remote storage services)

  • credentials

  • cache information

    • if the file is cacheable, the filename in cache

    • cache directories configuration

    • caching state (already in cache, cache currently locked etc)

  • local user information (uid/gid)

  • Job ID this transfer belongs to

  • priority of the transfer - a number set by the generator which flattens priorities

  • transfer share this DTR belongs to

  • sub-share the DTR belongs to - may be set by the Generator

  • tries left

  • flags to handle properties and strategies when dealing with index servers

    • flag to say whether DTR is replicating inside the same logical filename

    • flag to say whether DTR should force registration to an existing logical filename, if the source is different

  • mapping info - local files to which remote files may be mapped in the configuration (copyurl/linkurl)

  • status of the DTR

  • error status

    • type of error

    • location of error

    • text description of error detail

  • number of bytes transferred/offset

  • timing properties

    • timeout - time which DTR is allowed to remain in current state

    • creation time

    • last modification time

    • process time - wait until this time to do further processing

  • cancel (set to true if request is to be cancelled)

  • bulk operation flags to combine several DTRs in a bulk request

  • delivery endpoint, whether Delivery is to be carried out by a local process or remote service

  • current owner - who is in charge of this DTR right now

  • logger object, so each DTR can have its own log

  • lock, to avoid write collisions, since DTRs can be modified by several processes

Possible

  • affiliation (if affiliation of multiple DTRs is used, see below)

  • history of states
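As a rough illustration of how the fields listed above could fit together, the following C++ sketch groups a subset of them into a plain struct. It is not the framework's actual class definition: all type, member and function names (Endpoint, ErrorStatus, Owner, make_dtr) are assumptions made for this example, and several fields (credentials, cache information, logger, lock) are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <string>
#include <vector>

// Hypothetical sketch: names and types are illustrative only.
enum class Owner { Generator, Scheduler, PreProcessor, Delivery, PostProcessor };

struct Endpoint {
    std::string url;                    // source or destination endpoint
    std::uint64_t size = 0;             // metadata: file size in bytes
    std::string checksum;               // metadata: checksum
    std::vector<std::string> replicas;  // list of replicas, if applicable
    std::size_t current_replica = 0;    // index of the replica in use
    std::string turl;                   // TURL used for the transfer
    std::string request_id;             // ID of an asynchronous remote request
};

struct ErrorStatus {
    std::string type;         // type of error
    std::string location;     // location of error
    std::string description;  // text description of error detail
};

struct DTR {
    std::string id;
    Endpoint source;
    Endpoint destination;
    std::string job_id;                   // job this transfer belongs to
    int priority = 0;                     // set by the generator
    std::string transfer_share;
    std::string sub_share;
    unsigned tries_left = 0;
    std::string status = "NEW";           // see the status codes in this document
    ErrorStatus error;
    std::uint64_t bytes_transferred = 0;
    std::time_t timeout = 0;              // how long the DTR may stay in its state
    std::time_t created = 0;
    std::time_t last_modified = 0;
    std::time_t process_time = 0;         // do no further processing before this
    bool cancel = false;                  // true if cancellation was requested
    Owner owner = Owner::Generator;       // who is in charge right now
};

// Builds a new DTR the way a generator might: status NEW, owned by the generator.
DTR make_dtr(const std::string& id, const std::string& source_url,
             const std::string& destination_url, unsigned tries) {
    assert(!id.empty());  // the DTR ID is a required field
    DTR dtr;
    dtr.id = id;
    dtr.source.url = source_url;
    dtr.destination.url = destination_url;
    dtr.tries_left = tries;
    dtr.created = dtr.last_modified = std::time(nullptr);
    return dtr;
}
```

A call such as make_dtr("dtr-1", "http://example.org/file1", "file:///tmp/file1", 3), with placeholder URLs, would yield a DTR in status NEW with three tries left and the generator as current owner.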

Multiple DTRs may be affiliated together. Possible reasons and uses:

  • Belong to the same job

  • Belong to a group of jobs which the user indicated should preferably be processed together

  • Belong to the same VO, with assigned priorities to be applied within the group

  • Failure of one DTR in group may cancel processing of other DTRs (should be implemented in Generator)
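The last point, cancelling the rest of a group when one DTR fails, could be sketched as below. This is only an assumption about how a generator might do it: the affiliation member is modelled as a simple string label, and cancel_affiliated is a hypothetical helper, not an existing function of the framework.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily reduced DTR with only the fields this sketch needs.
struct DTR {
    std::string id;
    std::string affiliation;  // assumed group label, e.g. a job ID or VO name
    bool cancel = false;      // set to true if the request is to be cancelled
};

// Flags every other DTR in the failed DTR's group for cancellation.
// Returns the number of DTRs newly flagged.
std::size_t cancel_affiliated(std::vector<DTR>& dtrs, const DTR& failed) {
    assert(!failed.id.empty());  // the ID tells the failed DTR apart from its siblings
    if (failed.affiliation.empty()) return 0;  // not part of any group
    std::size_t flagged = 0;
    for (auto& dtr : dtrs) {
        if (dtr.id != failed.id && dtr.affiliation == failed.affiliation && !dtr.cancel) {
            dtr.cancel = true;
            ++flagged;
        }
    }
    return flagged;
}
```

The sketch only sets the cancel flag; acting on it is left to whichever component next handles each DTR.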

State transitions of DTR

The diagram below shows all possible states of a DTR, with arrows indicating the normal flow of DTRs between states. Each state is explained in detail below. Error conditions are not included here but are shown in another diagram further down.

DTR_state_diagram.png

Fig. 16 DTR State Diagram

Status codes

The following table describes all non-error status codes, together with the action taken if a cancellation request is received while the DTR is in that state. In general, if the data transfer has already completed when the cancellation request is received, the destination file is not deleted. The main reason for this is to preserve cache files, as the user may wish to run the same job soon after cancelling it.

Table 3 Statuses of the DTR

| Status Code | Text Description | Action on cancel |
|---|---|---|
| Statuses set by the generator | | |
| NEW | The DTR has just been built by the generator | Return to generator |
| CANCEL | A request has been made to cancel the DTR | n/a |
| Statuses set by the scheduler | | |
| CHECK_CACHE | The DTR destination is cacheable and the cache should be checked for the file’s existence | Return to generator |
| RESOLVE | The DTR source is a meta-protocol and should be resolved | Set to PROCESS_CACHE to remove any cache locks |
| QUERY_REPLICA | The DTR source should be queried to check existence, check file size, checksum etc. | Set to REGISTER_REPLICA to remove pre-registered destination |
| PRE_CLEAN | The destination in the DTR should be deleted before writing | Set to REGISTER_REPLICA to remove pre-registered destination |
| STAGE_PREPARE_SOURCE | The DTR source is a meta-protocol which must be prepared or staged | Set to REGISTER_REPLICA to remove pre-registered destination |
| STAGE_PREPARE_DESTINATION | The DTR destination is a meta-protocol which must be prepared or staged | Set to REGISTER_REPLICA to remove pre-registered destination |
| TRANSFER_WAIT | The DTR is ready to be sent to delivery but must wait due to transfer limits or priority settings | Set to RELEASE_REQUEST |
| TRANSFER | The DTR should be transferred immediately | Set to RELEASE_REQUEST |
| RELEASE_REQUEST | The DTR transfer has finished and any requests made on remote storage should be released | Abort request and delete destination, set to REGISTER_REPLICA |
| REGISTER_REPLICA | The DTR destination is a meta-protocol and the new replica should be registered | Delete destination and set to PROCESS_CACHE |
| PROCESS_CACHE | The DTR destination is cacheable and the cached file should be unlocked and linked/copied to the session dir | Delete cache file |
| DONE | The DTR completed successfully | Do nothing |
| CANCELLED | The DTR has been cancelled successfully | n/a |
| ERROR | An error occurred with the DTR | Do nothing |
| Statuses set by the pre-processor | | |
| CHECKING_CACHE | The pre-processor is checking the cache | Wait until complete, then set to CACHE_CHECKED. The scheduler will then set to PROCESS_CACHE |
| CACHE_WAIT | The cache file is locked and the scheduler should wait before trying to obtain the lock | Scheduler will return to generator |
| CACHE_CHECKED | The cache check is complete | Scheduler will set to PROCESS_CACHE |
| RESOLVING | The pre-processor is resolving replicas | Wait until complete, then set to RESOLVED. The scheduler will then set to REGISTER_REPLICA |
| RESOLVED | The replica resolution is complete | Scheduler will set to REGISTER_REPLICA |
| QUERYING_REPLICA | The pre-processor is querying a replica | Wait until complete, then set to REPLICA_QUERIED. The scheduler will then set to REGISTER_REPLICA |
| REPLICA_QUERIED | The replica querying is complete | Scheduler will set to REGISTER_REPLICA |
| PRE_CLEANING | The pre-processor is deleting the destination file | Wait until complete, then set to PRE_CLEANED. The scheduler will set to REGISTER_REPLICA |
| PRE_CLEANED | The destination file has been deleted | The scheduler will set to REGISTER_REPLICA |
| STAGING_PREPARING | The pre-processor is making a staging or preparing request | Wait until complete, then scheduler will set to RELEASE_REQUEST so it can be aborted |
| STAGING_PREPARING_WAIT | The staging or preparing request is not ready and the scheduler should wait before polling the status of the request | Scheduler will set to RELEASE_REQUEST so it can be aborted |
| STAGED_PREPARED | The staging or preparing request is complete | Scheduler will set to RELEASE_REQUEST so it can be aborted |
| Statuses set by the delivery | | |
| TRANSFERRING | The transfer of the DTR is on-going | Stop transfer and set to RELEASE_REQUEST. Delivery will delete the incomplete file and the request will be aborted |
| TRANSFERRED | The transfer completed successfully | Scheduler will abort the request |
| Statuses set by the post-processor | | |
| RELEASING_REQUEST | The post-processor is releasing a stage or prepare request | Wait until finished, then set to REGISTER_REPLICA to unregister the file |
| REQUEST_RELEASED | The release of stage or prepare request is complete | Set to REGISTER_REPLICA to unregister the file |
| REGISTERING_REPLICA | The post-processor is registering a replica in an index service | Continue as normal |
| REPLICA_REGISTERED | Replica registration is complete | Continue as normal |
| PROCESSING_CACHE | The post-processor is releasing locks and copying/linking the cached file to the session dir | Continue as normal |
| CACHE_PROCESSED | Cache processing is complete | Continue as normal |
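For the statuses set by the scheduler, the "Action on cancel" entries of the form "Set to ..." can be read as a lookup from the current status to the next clean-up status. The sketch below transcribes those entries and follows them as a chain. The function names (cancel_table, status_on_cancel, cancel_path) are hypothetical, the side effects (aborting requests, deleting files) are omitted, and the assumption that the entries chain in this way is an interpretation of the table rather than a description of the scheduler's code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Follow-up status on cancel, transcribed from the "Action on cancel" column
// for scheduler statuses that say "Set to ...". Statuses with other actions
// (e.g. "Return to generator", "Do nothing") are deliberately absent.
const std::map<std::string, std::string>& cancel_table() {
    static const std::map<std::string, std::string> table = {
        {"RESOLVE", "PROCESS_CACHE"},
        {"QUERY_REPLICA", "REGISTER_REPLICA"},
        {"PRE_CLEAN", "REGISTER_REPLICA"},
        {"STAGE_PREPARE_SOURCE", "REGISTER_REPLICA"},
        {"STAGE_PREPARE_DESTINATION", "REGISTER_REPLICA"},
        {"TRANSFER_WAIT", "RELEASE_REQUEST"},
        {"TRANSFER", "RELEASE_REQUEST"},
        {"RELEASE_REQUEST", "REGISTER_REPLICA"},
        {"REGISTER_REPLICA", "PROCESS_CACHE"},
    };
    return table;
}

// Returns the status to set on cancel, or an empty string if none is listed.
std::string status_on_cancel(const std::string& status) {
    const auto& table = cancel_table();
    const auto it = table.find(status);
    return it == table.end() ? std::string() : it->second;
}

// Follows the chain of clean-up statuses a cancelled DTR would pass through.
std::vector<std::string> cancel_path(const std::string& status) {
    std::vector<std::string> path;
    for (std::string next = status_on_cancel(status); !next.empty();
         next = status_on_cancel(next)) {
        path.push_back(next);
        assert(path.size() <= cancel_table().size());  // guards against cycles
    }
    return path;
}
```

Under this reading, a DTR cancelled while in TRANSFER passes through RELEASE_REQUEST, REGISTER_REPLICA and PROCESS_CACHE, so that the remote request, any pre-registered replica and the cache file are each cleaned up in turn.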

Error Conditions of DTRs

The following diagram shows possible error conditions and actions taken. For simplicity and because all error handling logic takes place within the scheduler, the pre- and post-processor and the delivery layers are not shown.

DTR_error_state_diagram.png

Fig. 17 DTR Error State Diagram

Errors are categorised into the following types:

| Error | Explanation | Retryable? | Action |
|---|---|---|---|
| INTERNAL_LOGIC_ERROR | Internal error in data staging logic | No | Stop processing and report back to generator |
| INTERNAL_PROCESS_ERROR | Internal error like losing contact with an external process | Yes | Clean if necessary and retry |
| SELF_REPLICATION_ERROR | Attempt to copy a file to itself | No | Return to generator |
| CACHE_ERROR | A problem occurred in cache handling | Yes | Retry without caching |
| TEMPORARY_REMOTE_ERROR | Error such as connection timeout on remote service | Yes | Retry with an increasing back-off |
| PERMANENT_REMOTE_ERROR | Error such as file not existing, permission denied etc on remote service | No | Follow cancellation steps and return failed DTR to generator |
| LOCAL_FILE_ERROR | Error with a local file | No | Follow cancellation steps and return to generator |
| TRANSFER_SPEED_ERROR | Transfer rate was below specified limits | Yes | Retry transfer. If all retries fail, report back to generator - it will make the decision on whether to cancel other related DTRs. (Future work: make decision on whether other transfers caused slow transfer and whether cancelling others would help or should be done) |
| STAGING_TIMEOUT_ERROR | The staging process took too long | No | Try a different replica - if none available, cancel and report back to generator |
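The "Retryable?" column above can be expressed as a small classification function. The following is a minimal sketch: the enumerator names follow the error names in the table, while the function names is_retryable and should_retry, and the rule of combining the flag with the "tries left" field, are assumptions made for illustration.

```cpp
#include <cassert>

// Error categories as named in the table above.
enum class ErrorType {
    INTERNAL_LOGIC_ERROR,
    INTERNAL_PROCESS_ERROR,
    SELF_REPLICATION_ERROR,
    CACHE_ERROR,
    TEMPORARY_REMOTE_ERROR,
    PERMANENT_REMOTE_ERROR,
    LOCAL_FILE_ERROR,
    TRANSFER_SPEED_ERROR,
    STAGING_TIMEOUT_ERROR
};

// Mirrors the "Retryable?" column of the table.
bool is_retryable(ErrorType type) {
    switch (type) {
        case ErrorType::INTERNAL_PROCESS_ERROR:
        case ErrorType::CACHE_ERROR:
        case ErrorType::TEMPORARY_REMOTE_ERROR:
        case ErrorType::TRANSFER_SPEED_ERROR:
            return true;
        case ErrorType::INTERNAL_LOGIC_ERROR:
        case ErrorType::SELF_REPLICATION_ERROR:
        case ErrorType::PERMANENT_REMOTE_ERROR:
        case ErrorType::LOCAL_FILE_ERROR:
        case ErrorType::STAGING_TIMEOUT_ERROR:
            return false;
    }
    assert(false && "unhandled error type");
    return false;
}

// Assumed rule: retry only if the error is retryable and tries remain.
bool should_retry(ErrorType type, unsigned tries_left) {
    return is_retryable(type) && tries_left > 0;
}
```

Note that STAGING_TIMEOUT_ERROR is listed as not retryable even though its action is to try a different replica; the sketch models only the column value, not that replica fallback.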

Methods of DTRs

DTR::push (DTR, receiver) – pass the DTR from one process to another, e.g. DTR::push (dtr, preprocessor)
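One way to picture the push method is as a hand-over that records the new owner and then delivers the DTR to the receiving component. The sketch below assumes a Receiver interface with name() and receive() members and a string-valued owner field; none of these are taken from the framework, and the real signature of DTR::push beyond what is stated above is not known here.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct DTR;
using DTRPtr = std::shared_ptr<DTR>;

// Hypothetical interface for any component that can be handed a DTR.
class Receiver {
public:
    virtual ~Receiver() = default;
    virtual std::string name() const = 0;
    virtual void receive(const DTRPtr& dtr) = 0;
};

struct DTR {
    std::string id;
    std::string owner;  // current owner, kept as a plain name in this sketch

    // Passes the DTR to the receiver and records it as the new owner.
    static void push(const DTRPtr& dtr, Receiver& receiver) {
        assert(dtr != nullptr);
        dtr->owner = receiver.name();
        receiver.receive(dtr);
    }
};

// Example receiver that simply queues incoming DTRs.
class PreProcessor : public Receiver {
public:
    std::string name() const override { return "preprocessor"; }
    void receive(const DTRPtr& dtr) override { queue.push_back(dtr); }
    std::vector<DTRPtr> queue;
};
```

With a PreProcessor instance named preprocessor, the call DTR::push(dtr, preprocessor) leaves the same DTR object in the pre-processor's queue, since only the pointer is handed over and the DTR itself is not copied.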

Implementation

Within the Data Staging framework there is a global list of DTRs. Pointers to the DTRs are passed around between components, which can modify them directly and push them between each other.
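A minimal sketch of such a list is shown below, assuming shared pointers and a per-DTR mutex standing in for the lock field described earlier. The names DTRList, global_dtr_list and set_status are hypothetical and do not correspond to the framework's real classes.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Reduced DTR carrying only what this sketch needs.
struct DTR {
    std::string id;
    std::string status;
    std::mutex lock;  // guards against concurrent modification
    explicit DTR(std::string id_) : id(std::move(id_)), status("NEW") {}
};
using DTRPtr = std::shared_ptr<DTR>;

// Owns all DTRs; components only ever hold pointers into it.
class DTRList {
public:
    DTRPtr add(const std::string& id) {
        std::lock_guard<std::mutex> guard(list_lock_);
        dtrs_.push_back(std::make_shared<DTR>(id));
        return dtrs_.back();
    }
    DTRPtr find(const std::string& id) {
        std::lock_guard<std::mutex> guard(list_lock_);
        for (const auto& dtr : dtrs_) {
            if (dtr->id == id) return dtr;
        }
        return nullptr;
    }
private:
    std::mutex list_lock_;
    std::list<DTRPtr> dtrs_;
};

// Single process-wide list, created on first use.
DTRList& global_dtr_list() {
    static DTRList list;
    return list;
}

// A component modifies the DTR in place through its pointer.
void set_status(const DTRPtr& dtr, const std::string& status) {
    assert(dtr != nullptr);
    std::lock_guard<std::mutex> guard(dtr->lock);
    dtr->status = status;
}
```

Because every component works on the same object, a status change made through one pointer is immediately visible through any other pointer to that DTR, which is why the sketch takes the per-DTR lock before writing.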