ARC v12.05 Release Announcement

May 21, 2012


The Advanced Resource Connector (ARC) middleware, introduced by NorduGrid (www.nordugrid.org), is an open source software solution that has been enabling production-quality computational and data Grids since May 2002. The previous production ARC release, version 11.05u3, was released on March 22, 2012.

The 12.05 release of the ARC software has a number of substantial changes. Many of them are not backwards compatible, particularly in the client library. For a detailed technical overview of the API changes, see the dedicated Wiki page. Users and developers who used the ARC command line or libraries in their tools will have to rewrite their code.

In addition to many new features, ARC 12.05 comes with numerous bug fixes and improved documentation.

New feature highlights

The ARC Compute Element now supports the EMI-ES interface for job submission, manipulation and status query. The EMI-ES interface is discoverable only via information endpoints that use the GLUE2 schema. Note that this breaks compatibility of ARC WS endpoints with older ARC 11.05 clients, which relied on an incomplete GLUE2 implementation to communicate with pre-production WS endpoints. Upgrading to the new ARC 12.05 client is highly recommended.

The ARC job ID is now a truly globally unique identifier.

The new data staging framework (codenamed DTR), which replaces the downloaders and uploaders, was previously released as a prototype in 11.05 and now sees its first production-level release. In addition to numerous bug fixes and scalability improvements, data staging over multiple hosts is now supported and several new configuration options are available. For more information, see the NorduGrid Wiki.

The JURA accounting component on the ARC CE now supports generation of reports in the Computing Accounting Record (CAR 1.0) format. It is also now able to send accounting records to the APEL server via the SSM transfer protocol.

The ARC client library and the command line interface underwent a major rewrite, making them more modular and extensible to any combination of interfaces. The currently supported information query interfaces are the LDAP-based ARC, GLUE1.2 and GLUE2 ones, as well as WS-based GLUE2 and EMI-ES. For job submission, the legacy ARC GridFTP interface is supported, as well as OGSA-BES and its ARC extension, and EMI-ES. Upgrading to the new ARC client is highly recommended, because older ARC 11.05 clients are not forward compatible with ARC 12.05 WS endpoints that use the completed GLUE2 information schema.

The ARC command line interface gains a new configuration file that allows preferences to be configured with a granularity as fine as each individual Grid site. New command line options are available as well, and some old ones have a new format.

All ARC components now support the GLUE2 information standard. GLUE2 information can be published either via LDAP or as an XML document. This includes publication of ComputingActivities (jobs) per submission endpoint in the LDAP rendering, and GLUE2 HealthState for each endpoint in both LDAP and XML. ARC service configuration now supports specification of the GLUE2 AdminDomain and Policies, and has a debugging option for BDII to reduce slapd log overhead. Note that this completed GLUE2 implementation breaks backwards compatibility with ARC 11.05 clients that relied on an incomplete implementation to interact with pre-production WS interfaces.

Security tools have been improved and now support proxy generation from a PKCS12 certificate.

A dedicated library for credential manipulation and secure communication, CaNL++, is now available. It is not yet used by any tool; as such, it is a technology preview and a prototype for a common authentication library in EMI.

The 12.05 release comes complete with a set of Nagios probes that can be deployed against ARC Compute Element and other services.

ARC components: detailed new features and deployment notes

ARC components in release 12.05 are:

ARC Core

The X.509 credentials handling code underwent a significant cleanup, which improved its stability in corner cases. Its functionality was also extended.

Globus code handling was enhanced to work properly with dynamically loaded plugins.

Thread safety with respect to the use of environment variables was improved.

Support for the GridSite Delegation interface (both versions 1 and 2) was added.

The Argus server can now be used as a source of authorization and user mapping decisions.

Numerous enhancements and fixes were made to the data protocols, especially the SRM protocol. Preliminary support was added for the GFAL2 library, and hence for the protocols it supports. Initial, read-only support for the Xrootd data protocol was implemented.

Support for the old authorization configuration, including VOMS, LCAS and LCMAPS, was ported into the new framework. This makes it possible to use WS services with [vo] and [group] based authorization configuration.

Configurable VOMS error processing makes it possible to adapt to different VOMS error tolerance policies.

The SWIG bindings were significantly cleaned up, and a few enhancements were made for Python.

ARC Clients

The ARC client library is now capable of querying LDAP-based GLUE2 information and of submitting and managing jobs on services with the EMI-ES interface.

The concept of grid middleware flavours (e.g. "ARC0", "ARC1", "BES", etc.) has been removed in this release. The library does its best to discover which protocol has to be used to connect to a given service; when the protocol still needs to be specified, the GLUE2 concept of InterfaceName is used. The InterfaceName is a string naming the interface of a service endpoint, e.g. org.ogf.emies or org.nordugrid.ldapng. For the possible values, please see the ARC Information System section.

Upgrading to the new ARC client is highly recommended, because older ARC 11.05 clients are not forward compatible with ARC 12.05 WS endpoints that use the completed GLUE2 information schema.

The syntax of some of the command line arguments has changed too:

The client configuration file (client.conf) was also restructured:

A new command, arcmkdir, is introduced, which allows directory creation on grid storage elements and catalogs.

libarcclient

After significant restructuring, the C++ library (and its Python and Java bindings) provides a more developer-friendly API, and its data model is much more closely aligned with the GLUE2 model.

With respect to resource discovery, the TargetGenerator and TargetRetriever classes have been replaced by the ServiceEndpointRetriever, TargetInformationRetriever, ComputingServiceRetriever and JobListRetriever classes. The ServiceEndpointRetriever class should be used for querying index or registry services for any type of service, the TargetInformationRetriever class for querying local information systems, the ComputingServiceRetriever class combines the functionality of the first two, and the JobListRetriever class is used for retrieving information on jobs. The reason for this change was that the TargetGenerator and TargetRetriever classes were used for all of these different functionalities and did not allow very flexible usage. The new classes rely on the same template class, EntityRetriever, which is fully threaded internally so that multiple queries can be spawned in parallel. The actual querying is carried out by specialised plugins, and since the threading is kept within the EntityRetriever class, these plugins do not have to worry about threading issues.

To give access to the retrieved information, be it Endpoint, ComputingServiceType or Job objects, the EntityRetriever class uses the concept of a consumer: a class that implements an addEntity method. Retrieved objects are therefore not kept within the retriever classes but are handed over to the specified consumer object.

The Broker class in libarcclient has also been restructured and is consequently backwards incompatible. It should now be used as a tool for adding matching targets to a set, which can be ordered using the Broker's comparison functionality. Technically, the Broker no longer keeps targets within itself; instead, a suitable external container should be used to hold matching and possibly ordered targets. A separate loader instance is no longer needed in order to use the Broker, as the loading has been encapsulated inside the Broker class itself.

The Submitter class has also been restructured and is now a convenience class rather than an abstract plugin class as previously. It also encapsulates the loading functionality and can be used directly. Additionally, the Submitter plugins have been changed to accept a list of job descriptions, making bulk submission possible.

Previously the libarcclient library had a strict dependence on the job list file (~/.arc/jobs.xml). This dependency has now been removed, which mostly affected the JobController class. That class has been renamed to JobControllerPlugin, reflecting that it is an abstract class meant to be extended by specialised classes. Most of the functionality previously covered by the JobController class has moved to the JobSupervisor class, which also encapsulates the loading of plugins. As with the Submitter plugins, the methods of the JobControllerPlugin class have been changed to allow multiple jobs to be processed by a single method call, allowing for bulk operations.

Since a computing resource can now have multiple endpoints, queues, etc., the ExecutionTarget class had to be adapted to fit the GLUE2 data model, as mentioned above. It now consists of shared objects named after GLUE2 entities, and the classes of these objects contain public member variables corresponding to the associated GLUE2 attributes of each entity, making it completely backwards incompatible. As hinted above, resource discovery now returns ComputingServiceType objects, which consist of multiple shared objects reflecting computing endpoints, queues, etc., whereas an ExecutionTarget object consists of only one of each of these shared objects.

The Job class has also been subject to minor backwards incompatible changes. Most importantly, the Flavour attribute has been removed and its usage replaced by the InterfaceName attribute. The SubmissionEndpoint, InfoEndpoint, ISB, OSB and AuxInfo attributes have been replaced by the IDFromEndpoint attribute, which holds that information.
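The retriever/consumer split can be illustrated with a small Python sketch. It is not the libarcclient API: the names EntityConsumer, ListConsumer, add_entity, add_endpoint and wait are hypothetical stand-ins chosen for illustration, and the "plugin" is a plain function. It only shows the pattern described above: one thread per query inside the retriever, plugins with no threading code, and results handed to a consumer instead of being stored in the retriever.

```python
import threading
from abc import ABC, abstractmethod
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")


class EntityConsumer(ABC, Generic[T]):
    """Hypothetical consumer interface: anything that accepts entities."""

    @abstractmethod
    def add_entity(self, entity: T) -> None: ...


class ListConsumer(EntityConsumer[T]):
    """A simple consumer that keeps every entity it is handed."""

    def __init__(self) -> None:
        self.entities: List[T] = []

    def add_entity(self, entity: T) -> None:
        self.entities.append(entity)


class EntityRetriever(Generic[T]):
    """Runs one query per endpoint in its own thread (hypothetical sketch).

    The plugin is a plain callable and never deals with threads; all
    locking lives here, so consumers are called one at a time.
    """

    def __init__(self, plugin: Callable[[str], Iterable[T]]) -> None:
        self._plugin = plugin
        self._consumers: List[EntityConsumer[T]] = []
        self._threads: List[threading.Thread] = []
        self._lock = threading.Lock()

    def add_consumer(self, consumer: EntityConsumer[T]) -> None:
        self._consumers.append(consumer)

    def add_endpoint(self, endpoint: str) -> None:
        thread = threading.Thread(target=self._query, args=(endpoint,))
        self._threads.append(thread)
        thread.start()

    def wait(self) -> None:
        for thread in self._threads:
            thread.join()

    def _query(self, endpoint: str) -> None:
        for entity in self._plugin(endpoint):
            # Entities are forwarded to consumers, not kept in the retriever.
            with self._lock:
                for consumer in self._consumers:
                    consumer.add_entity(entity)


def fake_job_plugin(endpoint: str) -> List[str]:
    # Hypothetical plugin: pretends each endpoint reports two jobs.
    return [f"{endpoint}/job{i}" for i in range(2)]


retriever = EntityRetriever(fake_job_plugin)
jobs = ListConsumer()
retriever.add_consumer(jobs)
for ep in ("ce1.example.org", "ce2.example.org"):
    retriever.add_endpoint(ep)
retriever.wait()
print(sorted(jobs.entities))
```

Because results arrive from several threads, the order in which the consumer sees them is not fixed; the usage above sorts them before printing.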
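A minimal Python sketch of the "broker as a tool" idea follows. The Target fields, the memory-based match rule and the ranking by free slots are all hypothetical simplifications, not ARC's brokering logic. The point is the division of labour: the broker only decides whether a target matches and how two targets compare, while the caller owns the container. Where a C++ broker would expose a comparison operator, the Python sketch uses a sort key.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Target:
    """Hypothetical, heavily simplified execution target."""
    name: str
    free_slots: int
    max_memory_mb: int


class Broker:
    """Matches and orders targets but stores none of them itself."""

    def __init__(self, required_memory_mb: int) -> None:
        self.required_memory_mb = required_memory_mb

    def match(self, target: Target) -> bool:
        return target.max_memory_mb >= self.required_memory_mb

    def sort_key(self, target: Target) -> Tuple[int, str]:
        # Prefer more free slots; break ties by name for a stable order.
        return (-target.free_slots, target.name)


broker = Broker(required_memory_mb=2000)
discovered = [
    Target("ce-a", free_slots=3, max_memory_mb=4000),
    Target("ce-b", free_slots=10, max_memory_mb=1000),
    Target("ce-c", free_slots=7, max_memory_mb=8000),
]
# The external container (here a list) holds the matching, ordered targets.
matching: List[Target] = sorted(
    (t for t in discovered if broker.match(t)), key=broker.sort_key
)
print([t.name for t in matching])  # ['ce-c', 'ce-a']
```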
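The bulk-operation design can be sketched in Python as follows. Only the roles are taken from the text (an abstract per-interface plugin whose methods accept lists of jobs, and a supervisor that dispatches to plugins); the method names cancel_jobs, add_job and cancel, the EmiesController class and the in-memory plugin dictionary are hypothetical, and the real JobSupervisor loads its plugins itself. Note that no job list file is involved: jobs are passed in memory.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, List


class Job:
    """Hypothetical minimal job record keyed by its InterfaceName."""

    def __init__(self, job_id: str, interface_name: str) -> None:
        self.job_id = job_id
        self.interface_name = interface_name
        self.state = "RUNNING"


class JobControllerPlugin(ABC):
    """Abstract per-interface plugin; every method takes a list of jobs."""

    @abstractmethod
    def cancel_jobs(self, jobs: List[Job]) -> List[str]:
        """Cancel all given jobs in one call; return the IDs cancelled."""


class EmiesController(JobControllerPlugin):
    """Hypothetical plugin for the org.ogf.emies interface."""

    def __init__(self) -> None:
        self.calls = 0

    def cancel_jobs(self, jobs: List[Job]) -> List[str]:
        self.calls += 1  # one call handles the whole batch
        for job in jobs:
            job.state = "KILLED"
        return [job.job_id for job in jobs]


class JobSupervisor:
    """Groups jobs by InterfaceName and sends each group to its plugin."""

    def __init__(self, plugins: Dict[str, JobControllerPlugin]) -> None:
        self.plugins = plugins
        self.jobs: List[Job] = []

    def add_job(self, job: Job) -> bool:
        if job.interface_name not in self.plugins:
            return False  # no plugin can handle this job
        self.jobs.append(job)
        return True

    def cancel(self) -> List[str]:
        groups: Dict[str, List[Job]] = defaultdict(list)
        for job in self.jobs:
            groups[job.interface_name].append(job)
        cancelled: List[str] = []
        for name, group in groups.items():
            cancelled.extend(self.plugins[name].cancel_jobs(group))
        return cancelled


plugin = EmiesController()
supervisor = JobSupervisor({"org.ogf.emies": plugin})
for i in range(3):
    supervisor.add_job(Job(f"job{i}", "org.ogf.emies"))
print(supervisor.cancel(), plugin.calls)  # ['job0', 'job1', 'job2'] 1
```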
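The relationship between a ComputingServiceType and its ExecutionTargets can be pictured with the Python sketch below. It assumes, purely for illustration, that every endpoint is paired with every share (queue). The actual library may filter these combinations, for example by interface. The dataclass names mirror GLUE2 entities but are hypothetical, and Python object references stand in for the shared objects.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List


@dataclass
class ComputingEndpoint:
    url: str


@dataclass
class ComputingShare:
    name: str  # e.g. a batch queue


@dataclass
class ComputingService:
    """One resource may expose several endpoints and several shares."""
    name: str
    endpoints: List[ComputingEndpoint] = field(default_factory=list)
    shares: List[ComputingShare] = field(default_factory=list)


@dataclass
class ExecutionTarget:
    """Exactly one endpoint and one share, shared with the service."""
    service: ComputingService
    endpoint: ComputingEndpoint
    share: ComputingShare


def execution_targets(service: ComputingService) -> List[ExecutionTarget]:
    # Assumption: every endpoint/share pair yields one target.
    return [
        ExecutionTarget(service, ep, sh)
        for ep, sh in product(service.endpoints, service.shares)
    ]


svc = ComputingService(
    "example-ce",
    endpoints=[
        ComputingEndpoint("https://ce.example.org:443/arex"),
        ComputingEndpoint("https://ce.example.org:8443/arex"),
    ],
    shares=[ComputingShare("short"), ComputingShare("long")],
)
targets = execution_targets(svc)
print(len(targets))  # 4
print(targets[0].endpoint is svc.endpoints[0])  # True: shared, not copied
```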

ARC Compute Element

The EMI-ES interface is implemented. It is disabled by default but can be turned on with the option enable_emies_interface=yes in the [grid-manager] section of arc.conf. The interface will be accessible at the endpoint given in the arex_mount_point option, which must also be specified. Clients can request this interface using the interface option org.ogf.emies.

Note that the EMI-ES interface is discoverable only via information endpoints that use the GLUE2 schema, which breaks compatibility of ARC WS endpoints with older ARC 11.05 clients that relied on an incomplete GLUE2 implementation to communicate with pre-production WS endpoints. Upgrading to the new ARC 12.05 client is highly recommended.

In addition to the proprietary NorduGrid rendering, the information interfaces now provide standard GLUE2.

The WS interface now supports the authorization configuration of the GridFTP interface, making the WS and GridFTP interfaces fully interchangeable.
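As a pre-flight check, the two [grid-manager] options named above can be verified together. The Python sketch below is hypothetical: the emies_ready helper and the mount point URL are placeholders, and it assumes the arc.conf excerpt is INI-compatible enough for the standard configparser module (strict=False tolerates repeated keys; surrounding quotes are stripped). It is not an ARC tool.

```python
import configparser

# Hypothetical arc.conf excerpt; the URL is a placeholder.
ARC_CONF = """
[grid-manager]
enable_emies_interface=yes
arex_mount_point=https://ce.example.org:443/arex
"""


def emies_ready(text: str) -> bool:
    """True if EMI-ES is enabled and the required mount point is set."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(text)
    if not parser.has_section("grid-manager"):
        return False
    gm = parser["grid-manager"]
    enabled = gm.get("enable_emies_interface", "no").strip('"') == "yes"
    mount_point = gm.get("arex_mount_point", "").strip('"')
    return enabled and bool(mount_point)


print(emies_ready(ARC_CONF))  # True
print(emies_ready("[grid-manager]\nenable_emies_interface=yes\n"))  # False
```

The second call returns False because arex_mount_point is missing, which reflects the requirement that both options be set.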

The new data staging framework is disabled by default but can be enabled through the option newdatastaging=yes in the [grid-manager] section of arc.conf. No other change in configuration is necessary and data transfer limits will be taken from the maxload option as before. However, it is possible to configure many parameters of the system using options in the new [data-staging] section.

Multi-host data staging allows data transfer to be spread across multiple hosts and can increase data throughput for the site. This system can replace the multi-Grid-Manager setup. It can be enabled by deploying the DataDelivery Service on remote hosts and adding those hosts to the data-staging configuration. A variety of deployment scenarios are possible, depending on the site setup. For more information on configuration and deployment of the new data staging system, see the Wiki page.

Configuration of JURA is described in the ARC CE system administrator manual. Reporting in CAR 1.0 format is enabled by default. APEL reporting is still in beta testing and depends on the SSM version.

The ARC VOMS AC-based queue policy enforcing plugin (arc-vomsac-check) for A-REX is added to the release. The plugin is designed to be used as a handler for the ACCEPTED state in arc.conf, and introduces a new option, ac_policy, for [queue] sections that allows access filters to be configured based on the FQANs in a user's proxy certificate with VOMS extension. See man arc-vomsac-check for more information.

LRMS backends have been updated, especially DGBridge.

ARC Information System

ARIS in release 12.05 supports the latest official GLUE2 schema; this schema is now shipped with EMI packages. While some GLUE2 support existed in previous releases, several new features are introduced, largely completing the implementation.

ARC information system is compliant with the latest GLUE2 XML schema.

LDAP GLUE2 schema has a new tree to accommodate AdminDomain in a separate branch.

Support is added in arc.conf to specify GLUE2 AdminDomain and Policies.

Publication of ComputingActivities (jobs) per submission endpoint is implemented. ComputingActivities are currently published only via LDAP.

GLUE2 HealthState is added for each interface.

The following new GLUE2 compliant names for Services and Endpoints are introduced:

In addition, a debugging option for BDII has been added to arc.conf to reduce slapd log overhead.

Note that this completed GLUE2 implementation breaks backwards compatibility with ARC 11.05 clients that relied on an incomplete implementation to interact with pre-production WS interfaces.

ARC gridftp server

Improved support for IPv6.

Nagios plugins

This is the first official release of Nagios plugins for NorduGrid ARC service monitoring. The probes are primarily meant to monitor ARC services, though some of them are more generic. The package includes probes to:

The probes are packaged for various platforms under the names:

The latter package contains all relevant documentation. The documentation is packaged separately from the rest of ARC documents, as the Nagios probes are typically deployed away from monitored services.

Binary packages are available in EMI and NorduGrid repositories. Source code is available from the NorduGrid repository (see below), and from the original git repository.

Common authentication library CaNL++

CaNL++ provides a set of functionalities for credential manipulation and secure communication, namely:

Fixed bugs

Since ARC 11.05 update 3, the following bugs were fixed:

Known issues

ARC GUI (arcjobtool) is not available yet, pending implementation of client library changes.

Standalone client tar-balls for Linux are not yet available.

Availability

Source

ARC release 12.05 consists of the following source packages:

Source code for main components is available from:
http://svn.nordugrid.org/repos/nordugrid/arc1/tags/2.0.0

Source for the compatibility package (old client) is available from:
http://svn.nordugrid.org/repos/nordugrid/arc0/tags/compat_1.0.1

Documentation source (mostly LaTeX) is available from:
http://svn.nordugrid.org/repos/nordugrid/doc/tags/1.1.2

Source for metapackages is available from:
http://svn.nordugrid.org/repos/packaging/{fedora,debian}/nordugrid-arc-meta/tags/1.0.1

Source for Nagios probes is available from:
http://svn.nordugrid.org/repos/nordugrid/nagios/tags/release-1.3.8

Source for the common authentication library CaNL++ is available from:
http://svn.nordugrid.org/repos/workarea/caNl++/tags/0.2.0

Repositories

See the detailed description at NorduGrid downloads.

These repositories provide binary packages for:

Scientific Linux and CentOS are implicitly supported through the corresponding Red Hat repositories.

Microsoft Windows and Mac OS X versions of the clients and some services are available from the same repositories.

NorduGrid homepage