ARC CE REST interface specification

Note

The current interface version is 1.0 (draft)

Warning

This is a new REST Interface specification whose implementation is under development! An earlier ARC CE REST Interface implementation, called “The REST technical preview” and released with the ARC 6.x series, does not correspond to this specification.

The REST API endpoint

The various functionalities of the service are accessible through HTTP(S) URLs built on the following pattern:

<service endpoint URL>/rest/<version>/<functionality>

  • <service endpoint URL> represents the mounting point of the service and may look like https://arc.example.org:443/arex.
  • <version> is a two-part number separated by a dot. The current version is 1.0.
  • <functionality> is one of the keywords defined below.

In the rest of this document, the part <service endpoint URL>/rest/<version> is referred to as <base URL>.

All parts of the URL to the right of the hostname are case-sensitive.
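The URL pattern above can be sketched as a pair of small helpers. This is only an illustration: the function names `base_url` and `functionality_url` are hypothetical and not part of the specification.

```python
def base_url(service_endpoint: str, version: str = "1.0") -> str:
    """Return <service endpoint URL>/rest/<version>, i.e. the <base URL>."""
    # Strip a trailing slash so the result never contains "//rest".
    return f"{service_endpoint.rstrip('/')}/rest/{version}"


def functionality_url(service_endpoint: str, functionality: str,
                      version: str = "1.0") -> str:
    """Return <base URL>/<functionality>.

    The path is case-sensitive, so the keyword is passed through unchanged.
    """
    return f"{base_url(service_endpoint, version)}/{functionality}"


print(functionality_url("https://arc.example.org:443/arex", "jobs"))
# prints: https://arc.example.org:443/arex/rest/1.0/jobs
```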

Depending on the Accept header in the HTTP request (Accept: application/json or Accept: application/xml), the information in the response is rendered in either JSON or XML format.

In the HTTP response headers, the HTTP Status-Code (RFC7231) indicates the status of the overall request (e.g. 403 means Forbidden).

For operations that support multiple (bulk) requests per single API call, per-request Status-Codes are returned in addition to the Status-Code in the HTTP header. They are included as part of the response array in the HTTP body, using the same RFC7231 values and following the syntax defined below.

Description of functionalities and operations

Requesting supported versions

GET <service endpoint URL>/rest

Operations:

  • GET - returns list of supported REST API versions
  • POST, PUT, DELETE - not supported

Example response:

The XML response is like:

<versions>
  <version>1.0</version>
  <version>1.1</version>
  <version>1.2</version>
</versions>

The JSON is:

[ "1.0", "1.1", "1.2" ]
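As a rough sketch, a client could parse either rendering of the version list shown above and pick the newest version it also understands. The helper names `parse_versions` and `newest_common` are hypothetical, and the client-side version list is an assumption made for the example.

```python
import json
import xml.etree.ElementTree as ET


def parse_versions(body: str, content_type: str) -> list:
    """Parse the version list from a JSON or XML response body."""
    if "json" in content_type:
        return [str(v) for v in json.loads(body)]
    # XML rendering: <versions><version>1.0</version>...</versions>
    return [v.text for v in ET.fromstring(body).findall("version")]


def newest_common(server_versions, client_versions=("1.0",)):
    """Return the highest version supported by both sides, or None."""
    common = set(server_versions) & set(client_versions)
    if not common:
        return None
    # Compare "1.10" and "1.9" numerically, not as strings.
    return max(common, key=lambda v: tuple(int(p) for p in v.split(".")))


versions = parse_versions('[ "1.0", "1.1", "1.2" ]', "application/json")
print(newest_common(versions, client_versions=("1.0", "1.1")))
# prints: 1.1
```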

Obtaining CE resource information

GET <base URL>/info[?schema=glue2]

Operations:

  • GET - retrieves generic information about cluster properties. It accepts the optional schema parameter with the following values: glue2, crr. By default the information is served as a GLUE2 document. The CRR rendering will be added in future ARC releases. XML or JSON is returned according to the request headers.
  • HEAD - supported
  • PUT, POST, DELETE - not supported.

Example QUERY:

GET https://host.domain.org:443/arex/rest/1.0/info?schema=glue2 HTTP/1.1
Accept: application/xml

The XML response is:

<InfoRoot>
  <Domains xmlns="http://schemas.ogf.org/glue/2009/03/spec_2.0_r1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  xsi:schemaLocation="https://raw.github.com/OGF-GLUE/XSD/master/schema/GLUE2.xsd">
    <AdminDomain BaseType="Domain" CreationTime="2018-11-06T20:26:46Z" Validity="10800">
      <ID>urn:ad:UNDEFINEDVALUE</ID>
      <Name>UNDEFINEDVALUE</Name>
      <Distributed>false</Distributed>
      <Services>
        <ComputingService BaseType="Service" CreationTime="2018-11-06T20:26:46Z" Validity="10800">
          <ID>urn:ogf:ComputingService:arc.zero:arex</ID>
          <Capability>data.transfer.cepush.srm</Capability>
          <Capability>executionmanagement.jobmanager</Capability>
 ... output omitted ...
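The example query above could be prepared with the Python standard library roughly as follows. This is a sketch only: `info_request` is a hypothetical helper, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def info_request(base: str, schema: str = "glue2",
                 accept: str = "application/xml") -> Request:
    """Build (but do not send) a GET request for <base URL>/info."""
    # The specification lists glue2 and crr as the accepted schema values.
    if schema not in ("glue2", "crr"):
        raise ValueError(f"unsupported schema: {schema}")
    url = f"{base}/info?{urlencode({'schema': schema})}"
    # The Accept header selects the XML or JSON rendering.
    return Request(url, headers={"Accept": accept}, method="GET")


req = info_request("https://host.domain.org:443/arex/rest/1.0")
print(req.get_method(), req.full_url)
# prints: GET https://host.domain.org:443/arex/rest/1.0/info?schema=glue2
```

Passing the returned object to `urllib.request.urlopen` would perform the call; authentication setup is outside the scope of this sketch.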

Operating jobs

GET <base URL>/jobs[?state=<state1>[&state=<state2>[…]]]

POST <base URL>/jobs?action=new

POST <base URL>/jobs?action={info|status|kill|clean|restart|delegations}

Operations:

  • GET - get list of jobs
  • HEAD - supported
  • POST - job submission and management
  • PUT, DELETE - not supported

Get list of jobs

GET <base URL>/jobs retrieves the list of jobs belonging to the authenticated user as application/xml or application/json. The returned document contains a list of job IDs.

It accepts optional state parameters. When they are given, the returned document contains only jobs in the requested state(s).

Example QUERY:

GET https://host.domain.org:443/arex/rest/1.0/jobs HTTP/1.1
Accept: application/xml

The XML response is:

<jobs>
  <job>
    <id>1234567890abcdef</id>
  </job>
  <job>
    <id>fedcba0987654321</id>
  </job>
</jobs>

The JSON is:

[
  "1234567890abcdef",
  "fedcba0987654321"
]
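A minimal sketch of the job listing, assuming the XML and JSON renderings shown in the examples above: one helper builds the URL with repeated state parameters, the other extracts the job IDs. Both helper names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def jobs_url(base: str, states=()) -> str:
    """Build <base URL>/jobs with zero or more repeated state parameters."""
    query = urlencode([("state", s) for s in states])
    return f"{base}/jobs" + (f"?{query}" if query else "")


def parse_job_ids(body: str, content_type: str) -> list:
    """Extract job IDs from the JSON or XML rendering of the job list."""
    if "json" in content_type:
        return list(json.loads(body))
    # XML rendering: <jobs><job><id>...</id></job>...</jobs>
    return [job.findtext("id") for job in ET.fromstring(body).findall("job")]


print(jobs_url("https://host.domain.org:443/arex/rest/1.0",
               ["RUNNING", "QUEUING"]))
# prints: https://host.domain.org:443/arex/rest/1.0/jobs?state=RUNNING&state=QUEUING
```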

Job submission (create a new job)

POST <base URL>/jobs?action=new initiates creation of a new job instance or multiple jobs.

The request body contains the job description(s) in one of the supported formats: ADL as Content-type: application/xml or xRSL as Content-type: application/rsl.

To pass multiple job descriptions of the same type in the document body:

  • ADL descriptions are enclosed in an <ActivityDescriptions> element
  • xRSL uses + to merge multiple jobs.

The response body contains an array of elements corresponding to the job descriptions in the request, in the same order. Each element of the array contains:

  • status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231)
  • reason: a short textual description of the Status-Code
  • id: job UUID or None if not assigned (unsuccessful submission)
  • state: the job state according to the state model or None if not available (unsuccessful submission)
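The per-request result codes can be handled along these lines. The sketch assumes a JSON rendering in which the response is a list of objects keyed by the field names above, with None rendered as JSON null. That layout, the 201 and 400 codes in the sample, and the helper name are assumptions, since the exact rendering may still change.

```python
import json


def split_submission_results(body: str):
    """Split a bulk submission response into accepted and rejected jobs.

    Assumes (hypothetically) a JSON list of objects with the keys
    status-code, reason, id and state, in request order.
    """
    accepted, rejected = [], []
    for index, item in enumerate(json.loads(body)):
        code = int(item["status-code"])
        if 200 <= code < 300 and item.get("id"):
            accepted.append((index, item["id"], item.get("state")))
        else:
            # index identifies which submitted description failed
            rejected.append((index, code, item.get("reason")))
    return accepted, rejected


# Illustrative response body, not taken from a real service.
sample = json.dumps([
    {"status-code": 201, "reason": "Created",
     "id": "1234567890abcdef", "state": "ACCEPTING"},
    {"status-code": 400, "reason": "Bad Request", "id": None, "state": None},
])
ok, failed = split_submission_results(sample)
print(ok)      # [(0, '1234567890abcdef', 'ACCEPTING')]
print(failed)  # [(1, 400, 'Bad Request')]
```

Keeping the array index in each result makes it possible to map a failure back to the job description at the same position in the request.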

Jobs management

POST <base URL>/jobs?action={info|status|kill|clean|restart|delegations} - job management operations supporting arrays of jobs.

The request body contains a list of job IDs as JSON/XML (e.g. the output of GET <base URL>/jobs can be reused).

Response depends on the requested action:

Job info
POST <base URL>/jobs?action=info retrieves full information about the job(s) as a GLUE2 activity information XML document, or in JSON format.

The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:

  • status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231)
  • reason: a short textual description of the Status-Code
  • id: job UUID
  • info_document: GLUE2 activity information about the job, or an empty document if not available (the request could not be satisfied)

Job status
POST <base URL>/jobs?action=status retrieves information about the current state of the job(s).

The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:

  • status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231)
  • reason: a short textual description of the Status-Code
  • id: job UUID
  • state: the job state according to the state model or None if not available

Killing jobs
POST <base URL>/jobs?action=kill sends a request to kill the job(s).

The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:

  • status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231). The response code is 202 to indicate that the request is queued for later execution.
  • reason: a short textual description of the Status-Code
  • id: job UUID

Clean job files
POST <base URL>/jobs?action=clean sends a request to clean the files of the job(s).

The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:

  • status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231). The response code is 202 to indicate that the request is queued for later execution.
  • reason: a short textual description of the Status-Code
  • id: job UUID

Restart job
POST <base URL>/jobs?action=restart sends a request to restart the job(s).

The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:

  • status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231). The response code is 202 to indicate that the request is queued for later execution.
  • reason: a short textual description of the Status-Code
  • id: job UUID

Job delegations
POST <base URL>/jobs?action=delegations retrieves the list of delegations associated with the job(s).

The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:

  • status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231)
  • reason: a short textual description of the Status-Code
  • id: job UUID
  • delegation_id: an array of assigned delegation IDs
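A bulk management call might be prepared as in the sketch below. It assumes the request body is the plain JSON list of job IDs, as returned by GET <base URL>/jobs, and that response elements are JSON objects keyed by the field names above. The Content-Type header value and both helper names are assumptions.

```python
import json
from urllib.request import Request

# Action keywords taken from the sections above.
JOB_ACTIONS = {"info", "status", "kill", "clean", "restart", "delegations"}


def job_action_request(base: str, action: str, job_ids) -> Request:
    """Build (but do not send) a bulk POST <base URL>/jobs?action=<action>."""
    if action not in JOB_ACTIONS:
        raise ValueError(f"unknown job action: {action}")
    body = json.dumps(list(job_ids)).encode("utf-8")
    headers = {"Content-Type": "application/json",  # assumed value
               "Accept": "application/json"}
    return Request(f"{base}/jobs?action={action}", data=body,
                   headers=headers, method="POST")


def queued_ids(results) -> list:
    """Return IDs whose per-request code is 202 (queued for later execution)."""
    return [r["id"] for r in results if int(r["status-code"]) == 202]


req = job_action_request("https://host.domain.org:443/arex/rest/1.0",
                         "kill", ["1234567890abcdef", "fedcba0987654321"])
print(req.get_method(), req.full_url)
# prints: POST https://host.domain.org:443/arex/rest/1.0/jobs?action=kill
```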

File operations

Files belonging to a specific job are accessed through the <base URL>/jobs/<job id> URL.

Working with session directory

GET <base URL>/jobs/<job id>/session/<path>

DELETE <base URL>/jobs/<job id>/session/<path>

PUT <base URL>/jobs/<job id>/session/<path>

Operations:

  • GET, HEAD, PUT, DELETE - supported for files stored in the job’s session directory, with their usual meaning.
  • GET, HEAD - for directories, retrieves the list of stored files (consider WebDAV for the format)
  • DELETE - for directories, removes the whole directory
  • PUT - not supported for directories.
  • POST - not supported.
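Session directory URLs could be assembled as in the following sketch, which percent-encodes the path while keeping its slashes. The helper names are hypothetical, and the request is only constructed, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def session_url(base: str, job_id: str, path: str) -> str:
    """Build <base URL>/jobs/<job id>/session/<path>."""
    # quote() keeps "/" by default, so sub-directories survive encoding.
    return (f"{base}/jobs/{quote(job_id, safe='')}"
            f"/session/{quote(path.lstrip('/'))}")


def upload_request(base: str, job_id: str, path: str,
                   content: bytes) -> Request:
    """Build a PUT request that stores one file in the session directory."""
    return Request(session_url(base, job_id, path), data=content, method="PUT")


print(session_url("https://host.domain.org:443/arex/rest/1.0",
                  "1234567890abcdef", "input files/data.txt"))
# prints: https://host.domain.org:443/arex/rest/1.0/jobs/1234567890abcdef/session/input%20files/data.txt
```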

Delegation functionality

GET <base URL>/delegations

POST <base URL>/delegations?action=new

Operations:

  • GET - retrieves list of delegations belonging to authenticated user
  • HEAD - supported
  • POST - create new delegation
  • PUT, DELETE - not supported

POST <base URL>/delegations/<delegation id>?action={get|renew|delete}

PUT <base URL>/delegations/<delegation id>

Operations:

  • GET, HEAD - not supported
  • POST - manage particular delegation ID
  • PUT - store delegation public part for particular delegation ID

Get list of delegations

GET <base URL>/delegations retrieves the list of delegations belonging to the authenticated user.

Example QUERY:

GET https://host.domain.org:443/arex/rest/1.0/delegations HTTP/1.1
Accept: application/xml

The XML response is:

<delegations>
  <delegation>
    <id>1234567890abcdef</id>
  </delegation>
  <delegation>
    <id>fedcba0987654321</id>
  </delegation>
</delegations>

The JSON is:

[
  "1234567890abcdef",
  "fedcba0987654321"
]

New delegation

Delegation is a 2-step process. Step 1 generates a private/public key pair on the server side and communicates an X.509 certificate request to the client. The client then signs the public key and stores the delegated certificate to finish the delegation procedure.

Step 1
POST <base URL>/delegations?action=new starts a new delegation process (1st step). The response is 201 and contains a certificate request of application/x-pem-file type, plus the URL of the delegation, with the assigned delegation ID, in the Location HTTP header.
Step 2
PUT <base URL>/delegations/<delegation id> stores the public part (2nd step). The request body contains the signed certificate (Content-type: application/x-pem-file). The response is 200 on success.
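The two steps can be sketched as a single function with the HTTP transport and the signing operation injected as callables, so that no particular HTTP or X.509 library is implied. The names `delegate`, `send` and `sign_csr` are hypothetical, and the sketch assumes the transport exposes the response header under the exact key Location.

```python
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urljoin

# send(method, url, headers, body) -> (status, response headers, body)
Response = Tuple[int, Dict[str, str], bytes]
Send = Callable[[str, str, Dict[str, str], Optional[bytes]], Response]
PEM = "application/x-pem-file"


def delegate(base: str, send: Send,
             sign_csr: Callable[[bytes], bytes]) -> str:
    """Run both delegation steps and return the assigned delegation ID."""
    # Step 1: the server creates a key pair and returns a certificate request.
    new_url = f"{base}/delegations?action=new"
    status, headers, csr_pem = send("POST", new_url, {}, None)
    if status != 201:
        raise RuntimeError(f"step 1 failed with HTTP {status}")
    # Location may be absolute or relative; urljoin handles both.
    delegation_url = urljoin(new_url, headers["Location"])

    # Step 2: the client signs the request and stores the certificate.
    signed_pem = sign_csr(csr_pem)
    status, _, _ = send("PUT", delegation_url,
                        {"Content-Type": PEM}, signed_pem)
    if status != 200:
        raise RuntimeError(f"step 2 failed with HTTP {status}")
    # The delegation ID is the last path segment of the delegation URL.
    return delegation_url.rstrip("/").rsplit("/", 1)[-1]
```

Injecting `send` and `sign_csr` keeps the sketch independent of how the client authenticates and how it signs the request, neither of which is defined in this section.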

Delegations management

Delegations are managed one by one. The same delegation ID can be reused for multiple jobs (submitted separately or in a batch).

The delegation ID to be used in the job context must be explicitly specified as part of the job description, in the way defined by the description language (e.g. DelegationID in ADL).

POST <base URL>/delegations/<delegation id>?action={get|renew|delete} is used to manage a delegation.

The request body is empty and the operation is selected by the action value.

Response is structured depending on the action:

Get delegation
POST <base URL>/delegations/<delegation id>?action=get returns public part of the stored delegation as application/x-pem-file
Renew delegation
POST <base URL>/delegations/<delegation id>?action=renew initiates renewal of delegation. Response is 200 with certificate request of application/x-pem-file type.
Delete delegation
POST <base URL>/delegations/<delegation id>?action=delete removes delegation. Response is 200 with no body expected.

A-REX control directory files access for debugging purposes

GET <base URL>/jobs/<job id>/diagnose/<file type>

Operations:

  • GET - returns the content of a file in the A-REX control directory for the requested job ID
  • HEAD - supported
  • POST, PUT, DELETE - not supported

The <file type> matches the controldir file suffix and can be one of the following:

  • failed
  • local
  • errors
  • description
  • diag
  • comment
  • status
  • acl
  • xml
  • input
  • output
  • input_status
  • output_status
  • statistics
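A client may want to validate the file type before issuing the request. The sketch below does this against the list above; the helper name `diagnose_url` is hypothetical.

```python
# File types copied from the list above.
DIAGNOSE_FILE_TYPES = frozenset({
    "failed", "local", "errors", "description", "diag", "comment", "status",
    "acl", "xml", "input", "output", "input_status", "output_status",
    "statistics",
})


def diagnose_url(base: str, job_id: str, file_type: str) -> str:
    """Build <base URL>/jobs/<job id>/diagnose/<file type>."""
    if file_type not in DIAGNOSE_FILE_TYPES:
        raise ValueError(f"unknown control directory file type: {file_type}")
    return f"{base}/jobs/{job_id}/diagnose/{file_type}"


print(diagnose_url("https://host.domain.org:443/arex/rest/1.0",
                   "1234567890abcdef", "errors"))
# prints: https://host.domain.org:443/arex/rest/1.0/jobs/1234567890abcdef/diagnose/errors
```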

REST Interface Job States

Table 6: State identifiers used with the ARC REST API

| REST API State Name | Description | A-REX Internal State |
|---|---|---|
| ACCEPTING | This is the initial job state. The job has reached the cluster, a session directory was created, and the submission client can optionally upload files to the sessiondir. The job waits to be detected by A-REX; job processing on the CE hasn’t started yet. | ACCEPTED |
| ACCEPTED | In the ACCEPTED state the newly created job has been detected by A-REX but can’t go to the next state due to an internal A-REX limit. The submission client can optionally upload files to the sessiondir. | PENDING:ACCEPTED |
| PREPARING | The job is undergoing the data stage-in process; input data is being gathered into the session directory (via external downloads or by making cached copies available). During this state the submission client can still upload files to the session directory. This is an I/O heavy job state. | PREPARING |
| PREPARED | The job successfully completed the data stage-in process and is being held in A-REX’s internal queue before it can be passed over to the batch system. | PENDING:PREPARING |
| SUBMITTING | The job environment (using RTEs) and the batch submission script are being prepared, to be followed by submission to the batch system through the available batch submission client interface. | SUBMIT |
| QUEUING | The job is under the control of the local batch system and is “queuing in the batch system”, waiting for a node/available slot. | INLRMS |
| RUNNING | The job is under the control of the local batch system and is “running in the batch system”, executing on an allocated node under the control of the batch system. | INLRMS |
| HELD | The job is under the control of the local batch system and is on hold or suspended; for some reason the job is in a “pending state” of the batch system. | INLRMS |
| EXITINGLRMS | The job is under the control of the local batch system and is finishing its execution on the worker node; the job is “exiting” from the batch system either because it completed or because it was terminated. | INLRMS |
| OTHER | The job is under the control of the local batch system and is in some “other” native batch system state which cannot be mapped to any of the previously described batch system states. | INLRMS |
| EXECUTED | The job has successfully completed in the batch system. The job is waiting to be picked up by A-REX for further processing or waiting for an available data stage-out slot. | PENDING:INLRMS |
| FINISHING | The job is undergoing the data stage-out process; A-REX is moving output data to the specified output file locations and the session directory is being cleaned up. Note that failed or terminated jobs can also undergo the FINISHING state. This is an I/O heavy job state. | FINISHING |
| FINISHED | Successful completion of the job on the cluster. The job has finished ALL its activity on the cluster AND no errors occurred during the job’s lifetime. | FINISHED |
| FAILED | Unsuccessful completion of the job. The job failed during one of the processing stages. The job has finished ALL its activity on the cluster and some problems occurred during the lifetime of the job. | FINISHED |
| KILLING | The job was requested to be terminated by an authorized user and as a result it is being killed. A-REX is terminating any active process related to the job, e.g. it interacts with the LRMS by running the job-cancel script or stops data staging processes. Once the job has finished ALL its activity on the cluster it will be moved to the KILLED state. | CANCELLING |
| KILLED | The job was terminated as a result of an authorized user request. The job has finished ALL its activity on the cluster. | FINISHED |
| WIPED | The generated results of jobs are kept available in the session directory on the cluster for a while after the job reaches its final state (FINISHED, FAILED or KILLED). Later, the job’s session directory and most of the job-related data are deleted from the cluster when an expiration time is exceeded. Jobs with an expired session directory lifetime are “deleted” from the cluster in the sense that only a minimal set of information is kept about such a job and their state is changed to WIPED. | DELETED |
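For client code, the state table can be captured as a simple lookup. The mapping below is copied from the table; the helper names, and the choice to treat WIPED as terminal alongside the three final states, are assumptions made for this sketch.

```python
# REST API state name -> A-REX internal state, copied from the state table.
REST_TO_AREX = {
    "ACCEPTING": "ACCEPTED",
    "ACCEPTED": "PENDING:ACCEPTED",
    "PREPARING": "PREPARING",
    "PREPARED": "PENDING:PREPARING",
    "SUBMITTING": "SUBMIT",
    "QUEUING": "INLRMS",
    "RUNNING": "INLRMS",
    "HELD": "INLRMS",
    "EXITINGLRMS": "INLRMS",
    "OTHER": "INLRMS",
    "EXECUTED": "PENDING:INLRMS",
    "FINISHING": "FINISHING",
    "FINISHED": "FINISHED",
    "FAILED": "FINISHED",
    "KILLING": "CANCELLING",
    "KILLED": "FINISHED",
    "WIPED": "DELETED",
}

# Final states named in the WIPED description, plus WIPED itself.
TERMINAL_STATES = frozenset({"FINISHED", "FAILED", "KILLED", "WIPED"})


def is_terminal(rest_state: str) -> bool:
    """True once the job has finished all its activity on the cluster."""
    return rest_state in TERMINAL_STATES


def rest_states_for(arex_state: str) -> list:
    """All REST API states that share one A-REX internal state."""
    return [rest for rest, arex in REST_TO_AREX.items() if arex == arex_state]


print(rest_states_for("FINISHED"))
# prints: ['FINISHED', 'FAILED', 'KILLED']
```

The reverse lookup shows why the REST state names are more informative than the internal ones: several REST states collapse onto INLRMS and FINISHED.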

Status of This Document

This document provides the normative specification for the ARC REST Interface version 1.0.

Note that during the implementation the exact rendering of the responses (especially JSON) can be adjusted.

In order to finalize this specification, the following actions need to be completed:

  • provide request/response examples
  • DelegationID should be added to xRSL. A per-URL delegation ID can be specified as a URL option. A job-wide delegation ID requires a new delegationId option

This specification was designed to meet the requirements listed below:

  1. Support for versioning: via URL paths like https://arc.zero:443/arex/rest/1.0/jobs
  2. Usable with simple tools (wget, curl)
  3. Friendly to common HTTP REST frameworks
  4. Interactive access to session directory content
  5. Machine readable error/result codes/messages
  6. No drastic changes to information representation and jobs handling
  7. Support for different response formats: xml, json

Plans for functionality extension post version 1.0:

  1. More effective bulk operations with HTTP v2. This requires HTTP v2 development for HED, so the feature is postponed to later versions.
  2. Resource information functionality: consider filtering through URL options and supporting references (relative URLs) to underlying resources.
  3. Scalability for many jobs and delegations: consider filtering through URL options.
  4. Jobs: consider a way to provide a list of all jobs per site or per VO to special monitoring agents.
  5. Add a hold action for job management once it is implemented.
  6. For sessiondir access, add PATCH to modify parts of files. The body format needs to be defined; all files are treated as binary. Currently only a non-standard PUT with ranges is supported.