ARC CE REST interface specification¶
Note
The current interface version is 1.0
Warning
This is a new REST interface specification whose implementation is under development! An earlier ARC CE REST interface implementation, called the “REST technical preview” and released with the ARC 6.x series, does not correspond to this specification.
The REST API endpoint¶
The various functionalities of the service are accessible through HTTP(S) URLs built upon the following pattern:
<service endpoint URL>/rest/<version>/<functionality>
- <service endpoint URL> represents the mounting point of the service and may look like https://arc.example.org:443/arex.
- <version> is a two-part number separated by a dot. The current version is 1.0.
- <functionality> is one of the keywords defined below.
In the rest of this document the part <service endpoint URL>/rest/<version> is referred to as <base URL>.
All parts of the URL to the right of the hostname are case-sensitive.
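The URL pattern above can be sketched in a few lines of Python. This is only an illustration: the helper names `base_url` and `functionality_url` are made up for this example and are not part of the specification.

```python
def base_url(service_endpoint: str, version: str = "1.0") -> str:
    """Return <service endpoint URL>/rest/<version> (the <base URL>)."""
    # The path is case-sensitive, so nothing is lower-cased here.
    return f"{service_endpoint.rstrip('/')}/rest/{version}"


def functionality_url(service_endpoint: str, functionality: str,
                      version: str = "1.0") -> str:
    """Return <base URL>/<functionality>, e.g. .../rest/1.0/jobs."""
    return f"{base_url(service_endpoint, version)}/{functionality.lstrip('/')}"


print(functionality_url("https://arc.example.org:443/arex", "jobs"))
```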
Depending on the Accept header in the HTTP request (Accept: application/json, Accept: text/xml or Accept: application/xml), the information in the response is rendered in either JSON or XML format. If the header is not specified, the response defaults to text/html and the output is compatible with an ordinary web browser.
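As a rough sketch of how a client might select the response format, the snippet below prepares (but does not send) a request with the standard library. The `make_request` helper and the `fmt` keywords are assumptions made for this example.

```python
from urllib.request import Request

# Media types named in the specification for machine-readable output.
ACCEPT = {"json": "application/json", "xml": "application/xml"}


def make_request(url, fmt=None, method="GET", data=None):
    """Prepare a request; without fmt no Accept header is set (text/html default)."""
    headers = {}
    if fmt is not None:
        headers["Accept"] = ACCEPT[fmt]
    return Request(url, data=data, headers=headers, method=method)


req = make_request("https://arc.example.org:443/arex/rest/1.0/info", "json")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the prepared object to `urllib.request.urlopen` would perform the call; authentication (for example a client certificate) is left out of this sketch.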
In the HTTP response headers the HTTP Status-Code (RFC7231) indicates the status of the overall request (e.g. 403 corresponds to Forbidden).
For the operations that support multiple (bulk) requests per single API call, the per-request Status-Codes are returned in addition to the Status-Code in the HTTP header. They are included as part of the response array in the HTTP body, using the same RFC7231 values and following the syntax defined below.
Description of functionalities and operations¶
Requesting supported versions¶
GET <service endpoint URL>/rest
Operations:
- GET: returns the list of supported REST API versions
- POST, PUT, DELETE: not supported
Example response:
The XML response is like:
<versions> <version>1.0</version> <version>1.1</version> <version>1.2</version> </versions>
The JSON is:
{"version": [ "1.0", "1.1", "1.2" ]}
or
{"version": "1.0"}
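Because the JSON rendering may carry either an array or a single string, a client probably wants to normalise both shapes. The sketch below assumes the server emits valid JSON with a quoted `version` key; `supported_versions` is a name invented for this example.

```python
import json


def supported_versions(body: str) -> list:
    """Return the supported versions as a list, whatever shape was sent."""
    value = json.loads(body)["version"]
    # A single supported version is rendered as a bare string.
    return [value] if isinstance(value, str) else list(value)


print(supported_versions('{"version": ["1.0", "1.1", "1.2"]}'))
print(supported_versions('{"version": "1.0"}'))
```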
Obtaining CE resource information¶
GET <base URL>/info[?schema=glue2]
Operations:
- GET: retrieves generic information about cluster properties. It accepts the optional schema parameter with the following values: glue2, crr. By default the information is served as a GLUE2 document. The CRR rendering will be added in future ARC releases. XML or JSON is returned according to the request headers.
- HEAD: supported
- PUT, POST, DELETE: not supported.
Example query:
GET https://host.domain.org:443/arex/rest/1.0/info?schema=glue2 HTTP/1.1
Accept: application/xml
The XML response is:
<InfoRoot> <Domains xmlns="http://schemas.ogf.org/glue/2009/03/spec_2.0_r1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://raw.github.com/OGF-GLUE/XSD/master/schema/GLUE2.xsd"> <AdminDomain BaseType="Domain" CreationTime="2018-11-06T20:26:46Z" Validity="10800"> <ID>urn:ad:UNDEFINEDVALUE</ID> <Name>UNDEFINEDVALUE</Name> <Distributed>false</Distributed> <Services> <ComputingService BaseType="Service" CreationTime="2018-11-06T20:26:46Z" Validity="10800"> <ID>urn:ogf:ComputingService:arc.zero:arex</ID> <Capability>data.transfer.cepush.srm</Capability> <Capability>executionmanagement.jobmanager</Capability> ... output omitted ...
Operating jobs¶
GET <base URL>/jobs[?state=<state1>[,<state2>[…]]]
POST <base URL>/jobs?action=new
POST <base URL>/jobs?action={info|status|kill|clean|restart|delegations}
Operations:
- GET: get the list of jobs
- HEAD: supported
- POST: job submission and management
- PUT, DELETE: not supported
Get list of jobs¶
GET <base URL>/jobs retrieves the list of jobs belonging to the authenticated user as application/xml or application/json. The returned document contains a list of job IDs.
It accepts the optional state parameter. When it is defined, the returned document contains only jobs in the requested state(s).
Example query:
GET https://host.domain.org:443/arex/rest/1.0/jobs HTTP/1.1
Accept: application/xml
The XML response is:
<jobs> <job> <id>1234567890abcdef</id> </job> <job> <id>fedcba0987654321</id> </job> </jobs>
The JSON is:
{ "job":[ {"id":"1234567890abcdef"}, {"id":"fedcba0987654321"} ] }
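A minimal sketch of the job listing call follows: one helper builds the URL with the optional state filter, the other extracts job IDs from the JSON rendering shown above. The function names are illustrative assumptions, and the state values in the usage line come from the job state table of this specification.

```python
import json
from urllib.parse import urlencode


def jobs_url(base: str, states=()) -> str:
    """Build <base URL>/jobs[?state=<state1>[,<state2>...]]."""
    if not states:
        return f"{base}/jobs"
    # Keep the comma literal, as in the URL pattern of the specification.
    return f"{base}/jobs?" + urlencode({"state": ",".join(states)}, safe=",")


def job_ids(body: str) -> list:
    """Extract job IDs from the JSON job list."""
    return [job["id"] for job in json.loads(body)["job"]]


print(jobs_url("https://host.domain.org:443/arex/rest/1.0", ["FINISHED", "FAILED"]))
```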
Job submission (create a new job)¶
POST <base URL>/jobs?action=new initiates the creation of a new job instance or of multiple jobs.
The request body contains the job description(s) in one of the supported formats: ADL as Content-type: application/xml or XRSL as Content-type: application/rsl.
To pass multiple job descriptions of the same type in the document body:
- ADL descriptions are enclosed in the <ActivityDescriptions> element
- XRSL uses + to merge multiple jobs.
The response contains the 201 code. The response body contains an array of elements corresponding to the job descriptions in the request, in the same order. Each element of the array contains:
- status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231)
- reason: a short textual description of the Status-Code
- id: job UUID, or None if not assigned (unsuccessful submission)
- state: the job state according to the state model, or None if not available (unsuccessful submission)
The XML response is:
<jobs> <job> <status-code>201</status-code> <reason>Created</reason> <id>1234567890abcdef</id> <state>ACCEPTING</state> </job> <job> <status-code>500</status-code> <reason>Requested RTE is missing</reason> </job> </jobs>
The JSON is:
{ "job":[ { "status-code":"201", "reason":"Created", "id":"1234567890abcdef", "state":"ACCEPTING" }, { "status-code":"500", "reason":"Requested RTE is missing" } ] }
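Since the overall HTTP status does not tell which individual descriptions were accepted, a client has to inspect the per-job status codes. The sketch below assumes the JSON rendering shown above (with valid JSON syntax) and treats 201 as the per-job success code, as in the example; `split_submission` is a hypothetical helper name.

```python
import json


def split_submission(body: str):
    """Return (accepted job IDs, rejected entries) from a submission response."""
    accepted, rejected = [], []
    for index, job in enumerate(json.loads(body)["job"]):
        # status-code may arrive as a string, so convert before comparing.
        if int(job["status-code"]) == 201:
            accepted.append(job["id"])
        else:
            # The index identifies the job description in the request body.
            rejected.append((index, int(job["status-code"]), job.get("reason")))
    return accepted, rejected


response = ('{"job": [{"status-code": "201", "reason": "Created", '
            '"id": "1234567890abcdef", "state": "ACCEPTING"}, '
            '{"status-code": "500", "reason": "Requested RTE is missing"}]}')
print(split_submission(response))
```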
Jobs management¶
POST <base URL>/jobs?action={info|status|kill|clean|restart|delegations} performs job management operations and supports arrays of jobs.
The request body contains the list of job IDs as JSON/XML (e.g. the output of GET <base URL>/jobs can be reused).
Example of the body in XML:
<jobs> <job> <id>1234567890abcdef</id> </job> <job> <id>fedcba0987654321</id> </job> </jobs>
And in JSON:
{ "job":[ {"id":"1234567890abcdef"}, {"id":"fedcba0987654321"} ] }
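The request bodies above can be generated from a plain list of job IDs. The following is a sketch using only the standard library; the helper names are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET


def jobs_body_json(ids) -> str:
    """Render job IDs as the JSON body of a job management request."""
    return json.dumps({"job": [{"id": job_id} for job_id in ids]})


def jobs_body_xml(ids) -> str:
    """Render job IDs as the XML body of a job management request."""
    root = ET.Element("jobs")
    for job_id in ids:
        ET.SubElement(ET.SubElement(root, "job"), "id").text = job_id
    return ET.tostring(root, encoding="unicode")


ids = ["1234567890abcdef", "fedcba0987654321"]
print(jobs_body_json(ids))
print(jobs_body_xml(ids))
```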
Response depends on the requested action:
- Job info
POST <base URL>/jobs?action=info retrieves full information about the job(s) according to the GLUE2 activity information XML document, or in JSON format.
The response contains the 201 code. The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:
  - status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231). 200 is the only positive response.
  - reason: a short textual description of the Status-Code
  - id: job UUID
  - info_document: GLUE2 activity information about the job, or an empty document if not available (the request is not satisfiable)
- Job status
POST <base URL>/jobs?action=status retrieves information about the current state of the job(s).
The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:
  - status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231). 200 is the only positive response.
  - reason: a short textual description of the Status-Code
  - id: job UUID
  - state: the job state according to the state model, or None if not available
- Killing jobs
POST <base URL>/jobs?action=kill sends a request to kill the job(s).
The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:
  - status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231). The response code 202 indicates that the request is queued for later execution; it is the only positive response.
  - reason: a short textual description of the Status-Code
  - id: job UUID
- Clean job files
POST <base URL>/jobs?action=clean sends a request to clean the files of the job(s).
The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:
  - status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231). The response code 202 indicates that the request is queued for later execution; it is the only positive response.
  - reason: a short textual description of the Status-Code
  - id: job UUID
- Restart job
POST <base URL>/jobs?action=restart sends a request to restart the job(s).
The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:
  - status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231). The response code 202 indicates that the request is queued for later execution.
  - reason: a short textual description of the Status-Code
  - id: job UUID
- Job delegations
POST <base URL>/jobs?action=delegations retrieves the list of delegations associated with the job(s).
The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:
  - status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231). 200 is the only positive response.
  - reason: a short textual description of the Status-Code
  - id: job UUID
  - delegation_id: an array of assigned delegation IDs
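The per-action positive status codes listed above can be collected in a small lookup table. This sketch assumes the response array has already been parsed into a list of dictionaries, and it treats 202 as the positive code for restart as well, which the text implies but does not state as firmly as for kill and clean. The names `POSITIVE_CODE` and `failed_jobs` are invented for this example.

```python
# Positive per-job status code for each management action.
POSITIVE_CODE = {
    "info": 200, "status": 200, "delegations": 200,
    "kill": 202, "clean": 202, "restart": 202,  # queued for later execution
}


def failed_jobs(action: str, items: list) -> list:
    """Return IDs of jobs whose per-job status-code is not the positive one."""
    expected = POSITIVE_CODE[action]
    return [item["id"] for item in items if int(item["status-code"]) != expected]


items = [
    {"status-code": "202", "reason": "Queued for killing", "id": "1234567890abcdef"},
    {"status-code": "404", "reason": "No such job", "id": "fedcba0987654321"},
]
print(failed_jobs("kill", items))
```

The reason strings in the sample data are placeholders, not values defined by the specification.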
File operations¶
Files belonging to a specific job are operated on using the <base URL>/jobs/<job id> URL.
Working with session directory¶
GET <base URL>/jobs/<job id>/session/<path>
DELETE <base URL>/jobs/<job id>/session/<path>
PUT <base URL>/jobs/<job id>/session/<path>
Operations:
- GET, HEAD, PUT, DELETE: supported for files stored in the job’s session directory and perform the usual actions.
- GET, HEAD: for directories, retrieve the list of stored files (consider WebDAV for format)
- DELETE: for directories, removes the whole directory
- PUT: not supported for directories.
- POST: not supported.
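File names in a session directory may contain characters that need escaping in a URL. A possible way to build the session URL, with `session_url` as a hypothetical helper name, is:

```python
from urllib.parse import quote


def session_url(base: str, job_id: str, path: str = "") -> str:
    """Build <base URL>/jobs/<job id>/session/<path> with percent-encoding."""
    # quote() keeps "/" by default, so sub-directories stay intact.
    return f"{base}/jobs/{quote(job_id, safe='')}/session/{quote(path.lstrip('/'))}"


print(session_url("https://host.domain.org:443/arex/rest/1.0",
                  "1234567890abcdef", "out dir/result.txt"))
```

An empty path addresses the session directory itself, which per the list above can be listed with GET or removed with DELETE.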
Delegation functionality¶
GET <base URL>/delegations
POST <base URL>/delegations?action=new
Operations:
- GET: retrieves the list of delegations belonging to the authenticated user
- HEAD: supported
- POST: creates a new delegation
- PUT, DELETE: not supported
POST <base URL>/delegations/<delegation id>?action=get,renew,delete
PUT <base URL>/delegations/<delegation id>
Operations:
- GET, HEAD: not supported
- POST: manages a particular delegation ID
- PUT: stores the public part of the delegation for a particular delegation ID
Get list of delegations¶
GET <base URL>/delegations retrieves the list of delegations belonging to the authenticated user.
Example query:
GET https://host.domain.org:443/arex/rest/1.0/delegations HTTP/1.1
Accept: application/xml
The XML response is:
<delegations> <delegation> <id>1234567890abcdef</id> </delegation> <delegation> <id>fedcba0987654321</id> </delegation> </delegations>
The JSON formatted response (make consistent across specification):
{ "delegation": [ { "id":"1234567890abcdef"}, { "id":"fedcba0987654321"} ] }
New delegation¶
Delegation is a 2-step process. Step 1 generates a private/public key pair on the server side and communicates an X.509 certificate request to the client. The client signs the public key and stores the delegated certificate to finish the delegation procedure.
- Step 1
POST <base URL>/delegations?action=new starts a new delegation process (1st step). The response is 201 and contains the certificate request of application/x-pem-file type, and the URL of the delegation, with the assigned delegation ID, in the Location HTTP header.
- Step 2
PUT <base URL>/delegations/<delegation id> stores the public part (2nd step). The request body contains the signed certificate (Content-type: application/x-pem-file). The response is 200 on success.
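The two steps can be outlined as follows. This is a sketch under several assumptions: the transport function `send` and the signing function `sign_csr` are hypothetical callables supplied by the caller (the specification defines neither), and the delegation ID is assumed to be the last path segment of the Location URL.

```python
from urllib.parse import urlsplit


def delegation_id_from_location(location: str) -> str:
    """Take the last path segment of the Location header as the delegation ID."""
    return urlsplit(location).path.rstrip("/").rsplit("/", 1)[-1]


def delegate(base, send, sign_csr):
    """Run both delegation steps and return the delegation ID.

    send(method, url, body, content_type) -> (status, headers, body)
    sign_csr(csr_pem) -> signed certificate in PEM form
    Both callables are placeholders for real HTTP and X.509 code.
    """
    # Step 1: the server creates the key pair and returns a certificate request.
    status, headers, csr = send("POST", f"{base}/delegations?action=new", None, None)
    if status != 201:
        raise RuntimeError(f"delegation step 1 failed with status {status}")
    delegation_id = delegation_id_from_location(headers["Location"])
    # Step 2: upload the signed certificate for the assigned delegation ID.
    status, _, _ = send("PUT", f"{base}/delegations/{delegation_id}",
                        sign_csr(csr), "application/x-pem-file")
    if status != 200:
        raise RuntimeError(f"delegation step 2 failed with status {status}")
    return delegation_id
```

Keeping the transport and the signing injectable makes the flow easy to exercise without a live service or real credentials.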
Delegations management¶
Delegations are managed one-by-one. The same delegation ID can be re-used for multiple jobs (submitted separately or in batch).
The delegation ID to be used in the job context must be explicitly specified as a part of the job description, in the way defined by the description language (e.g. DelegationID in ADL).
POST <base URL>/delegations/<delegation id>?action=get,renew,delete is used to manage a delegation.
The request body is empty and the action is defined by the action value.
Response is structured depending on the action:
- Get delegation
POST <base URL>/delegations/<delegation id>?action=get returns the public part of the stored delegation as application/x-pem-file
- Renew delegation
POST <base URL>/delegations/<delegation id>?action=renew initiates renewal of the delegation. The response is 200 with a certificate request of application/x-pem-file type.
- Delete delegation
POST <base URL>/delegations/<delegation id>?action=delete removes the delegation. The response is 200 with no body expected.
A-REX control directory files access for debugging purposes¶
GET <base URL>/jobs/<job id>/diagnose/<file type>
Operations:
- GET: returns the content of the file in the A-REX control directory for the requested job ID
- HEAD: supported
- POST, PUT, DELETE: not supported
The <file type> matches the controldir file suffix and can be one of the following:
- failed
- local
- errors
- description
- diag
- comment
- status
- acl
- xml
- input
- output
- input_status
- output_status
- statistics
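A client can check the requested file type against this list before issuing the call. The sketch below is illustrative; `diagnose_url` is not a name defined by the specification.

```python
# File types accepted by <base URL>/jobs/<job id>/diagnose/<file type>.
DIAGNOSE_FILE_TYPES = frozenset({
    "failed", "local", "errors", "description", "diag", "comment", "status",
    "acl", "xml", "input", "output", "input_status", "output_status",
    "statistics",
})


def diagnose_url(base: str, job_id: str, file_type: str) -> str:
    """Build the diagnose URL, rejecting file types outside the list."""
    if file_type not in DIAGNOSE_FILE_TYPES:
        raise ValueError(f"unsupported control directory file type: {file_type!r}")
    return f"{base}/jobs/{job_id}/diagnose/{file_type}"


print(diagnose_url("https://host.domain.org:443/arex/rest/1.0",
                   "1234567890abcdef", "errors"))
```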
REST Interface Job States¶
REST API State Name | Description | A-REX Internal State |
---|---|---|
ACCEPTING | This is the initial job state. The job has reached the cluster, a session directory was created, the submission client can optionally upload files to the sessiondir. The job waits to be detected by the A-REX, the job processing on the CE hasn’t started yet | ACCEPTED |
ACCEPTED | In the ACCEPTED state the newly created job has been detected by A-REX but can’t go to the next state due to an internal A-REX limit. The submission client can optionally upload files to the sessiondir. | PENDING:ACCEPTED |
PREPARING | The job is undergoing the data stage-in process, input data is being gathered into the session directory (via external downloads or making cached copies available). During this state the submission client still can upload files to the session directory. This is an I/O heavy job state. | PREPARING |
PREPARED | The job successfully completed the data stage-in process and is being held waiting in A-REX’s internal queue before it can be passed over to the batch system | PENDING:PREPARING |
SUBMITTING | The job environment (using RTEs) and the job batch submission script are being prepared; this is followed by the submission to the batch system using the available batch submission client interface | SUBMIT |
QUEUING | The job is under the control of the local batch system and is “queuing in the batch system”, waiting for a node/available slot | INLRMS |
RUNNING | The job is under the control of the local batch system and is “running in the batch system”, executing on an allocated node under the control of the batch system | INLRMS |
HELD | The job is under the control of the local batch system and is being put on hold or being suspended, for some reason the job is in a “pending state” of the batch system | INLRMS |
EXITINGLRMS | The job is under the control of the local batch system and is finishing its execution on the worker node, the job is “exiting” from the batch system either because the job is completed or because it was terminated | INLRMS |
OTHER | The job is under the control of the local batch system and is in some “other” native batch system state which cannot be mapped to any of the previously described batch system states. | INLRMS |
EXECUTED | The job has successfully completed in the batch system. The job is waiting to be picked up by the A-REX for further processing or waiting for an available data stage-out slot. | PENDING:INLRMS |
FINISHING | The job is undergoing the data stage-out process, A-REX is moving output data to the specified output file locations, the session directory is being cleaned up. Note that failed or terminated jobs can also undergo the FINISHING state. This is an I/O heavy job state | FINISHING |
FINISHED | Successful completion of the job on the cluster. The job has finished ALL its activity on the cluster AND no errors occurred during the job’s lifetime. | FINISHED |
FAILED | Unsuccessful completion of the job. The job failed during one of the processing stages. The job has finished ALL its activity on the cluster and some problems occurred during the lifetime of the job. | FINISHED |
KILLING | The job was requested to be terminated by an authorized user and as a result it is being killed. A-REX is terminating any active process related to the job, e.g. it interacts with the LRMS by running the job-cancel script or stops data staging processes. Once the job has finished ALL its activity on the cluster it will be moved to the KILLED state. | CANCELLING |
KILLED | The job was terminated as a result of an authorized user request. The job has finished ALL its activity on the cluster. | FINISHED |
WIPED | The generated results of jobs are kept available in the session directory on the cluster for a while after the job reaches its final state (FINISHED, FAILED or KILLED). Later, when the expiration time is exceeded, the job’s session directory and most of the job related data are deleted from the cluster. Jobs with an expired session directory lifetime are “deleted” from the cluster in the sense that only a minimal set of information is kept about such a job and its state is changed to WIPED | DELETED |
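For client-side bookkeeping, the table above can be transcribed into a lookup structure. The sketch below copies the state names from the table; the `FINAL_STATES` set follows the WIPED row, which names FINISHED, FAILED and KILLED as final states. The variable and function names are assumptions of this example.

```python
# REST API state name -> A-REX internal state, in table order.
INTERNAL_STATE = {
    "ACCEPTING": "ACCEPTED",
    "ACCEPTED": "PENDING:ACCEPTED",
    "PREPARING": "PREPARING",
    "PREPARED": "PENDING:PREPARING",
    "SUBMITTING": "SUBMIT",
    "QUEUING": "INLRMS",
    "RUNNING": "INLRMS",
    "HELD": "INLRMS",
    "EXITINGLRMS": "INLRMS",
    "OTHER": "INLRMS",
    "EXECUTED": "PENDING:INLRMS",
    "FINISHING": "FINISHING",
    "FINISHED": "FINISHED",
    "FAILED": "FINISHED",
    "KILLING": "CANCELLING",
    "KILLED": "FINISHED",
    "WIPED": "DELETED",
}

# Final states as named in the description of WIPED.
FINAL_STATES = frozenset({"FINISHED", "FAILED", "KILLED"})


def rest_states_for(internal: str) -> list:
    """Return the REST states that map onto one A-REX internal state."""
    return [rest for rest, arex in INTERNAL_STATE.items() if arex == internal]


print(rest_states_for("INLRMS"))
```

The reverse lookup shows that several REST states share one internal state, so the REST state carries more detail than the internal one.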
Status of This Document¶
This document provides the normative specification for the ARC REST Interface version 1.0.
Note that during the testing of the implementation the exact rendering of the responses (especially JSON) might be adjusted.
In order to complement the finalization of this specification, the following actions need to be completed:
- provide more detailed request/response examples
- DelegationID should be added to xRSL. A per-URL delegation ID can be specified as a URL option. A job-wide delegation ID requires a new delegationId option
This specification was designed according to the requirements listed below:
- Support for versioning: via URL paths like https://arc.zero:443/arex/rest/1.0/jobs
- Usable with simple tools (wget, curl)
- Friendly to common HTTP REST frameworks
- Interactive access to session directory content
- Machine readable error/result codes/messages
- No drastic changes to information representation and jobs handling
- Support for different response formats: xml, json
Plans for functionality extension post version 1.0:
- More effective bulk operations: with HTTP v2, will require HTTP v2 development for HED, this feature is postponed till next versions
- Resource information functionality: consider filtering through URL options, consider supporting references (relative URLs) to underlying resources.
- Scalability for many jobs and delegations: consider filtering through URL options
- Jobs: consider a way to provide list of all jobs per site or per VO to special monitoring agents
- Add a hold action for jobs management once it is implemented
- For sessiondir access, add PATCH for files to modify parts of files. The body format needs to be defined and all files are treated as binary; currently only the non-standard PUT with ranges is supported.