Workshop on Compute Resource Management Interfaces (Rome, February 17-18 2005)

Workshop on Compute Resource Management Interfaces

Held at INFN headquarters, Rome on February 17-18 2005 and organized by Mirco Mazzucato.

Attendees:

Mirco Mazzucato (INFN, EGEE/LCG)
Bob Jones (CERN, EGEE)
Erwin Laure (CERN, EGEE)
Miron Livny (Condor, EGEE)
Ian Foster (Globus, EGEE)
Francesco Prelz (INFN, EGEE)
Kors Bos (NIKHEF, LCG)
Oxana Smirnova (Univ. Lund, LCG & Nordugrid)
Aleksandr Konstantinov (Univ Oslo, Nordugrid)
Mikko Pitkanen (Hip, Nordugrid)
Ralf Ratering (Intel, unigrids)
Dave Snelling (Fujitsu, unigrids/nextgrids)
Claudio Cacciari (Cineca, unigrids)
Anders Waananen (Niels Bohr Inst., nordugrid)
Balazs Konya (Univ. Lund, nordugrid)
Maarten Litmaath (CERN, LCG)
David Smith (CERN, LCG)
Massimo Sgaravatto (INFN, EGEE)
Luigi Zangrando (INFN Padova, INFN grid)

The agenda for the meeting is here http://www.pd.infn.it/grid/crm/proposed.agenda.html

Some slides presented during the meeting will be uploaded later.

The email list is created: egee-project-crm@cern.ch; the (protected) archive will be available. The list is managed with the SIMBA tool at CERN.

Background:

Various useful systems have been constructed for enabling remote access to compute resources, and there seems to be general consensus that these systems should not evolve in isolation. Thus, we have called this workshop to enable information exchange about experiences and discussions of opportunities for common approaches. Participants are invited based on their experience with the creation and operation of compute resource management systems, with a particular emphasis on those designed to support high-throughput applications. We hope that the workshop will result in a common view of the problems to be solved, the pros and cons of different approaches, and opportunities for convergence in a future work.

One of the main goals of this workshop is to make progress towards the definition of standard "compute resource management" (CRM) interface allowing different implementations to co-exist and interoperate as is happening for the different storage management systems thanks to the SRM interface. In this way the Grid community will make best profit of the funds available and the practical experience will slowly select the best implementation choices.

The whole general problem of CRM is clearly too large to tackle all in one go and should be broken down into sub-problems, relating for example to virtualization, compute resource modeling, file staging, security, job submission, job management, and so forth.

Agenda outline

After being welcomed by Roberto Petronzio, President of INFN, Mirco lead a discussion on the agenda.

The outline of the workshop was to have each of the projects present describe their work in the area, experiences gathered and plans for the future. Based on this initial input the group could then determine what is the scope for a common approach and eventual convergence.

Outcome

The workshop agreed to focus on the specification of a Job Description Language and a Compute Resource Interface.

Working towards a common Job Description Language, the participants agreed to compare their own implementations to the recently reworked Job Submission Description Language being developed within GGF.

The area of Compute Resource Interface is consider more complex and it was necessary to reduce the scope of what is being compared to increase the chances that commonality could be found.

The participants agreed to draft a document comparing the approaches to job description and submission of the different projects by 18th April 2005 so that the results can be discussed at a face-to-face meeting to be held at the 3rd EGEE conference in Athens (http://public.eu-egee.org/conferences/3rd/). A phone conference will also be organised for this meeting.

The details of the document are as follows:

Editors: Ian and Dave
Chapter 1: Compare JSDL to
- Globus JDL - Karl C.
- Unicore AJO - Claudio C. - Done
- NorduGrid xRSL - Oxana S.
- gLite JDL - Massimo S.
- Condor ClassAds - Alain R.
Chapter 2: Overview of Job Management interfaces to
- Scope: Resource description, job creation, status, and basic control
  - Exclude: scheduling, placement, accounting, environment, security, resource description content model, etc.
- Unicore/GS - Ralf R.
  - Others may wait for this template.
- GRAM4 - Ian F.
- gLite CondorC - Francesco P.
- gLite CREAM - Luigi Z.
- NorduGrid - Balazs K.

An email list has been created for discussion purposes and to simplify the work on the document.

The participants also wrote down a number of statements that gave the overall goals and current thinking of the group:

All the participants of this group are committed to constructive evaluation of JSDL in the near future.
All the participants of this group will consider incorporating JSDL into their tools, following this evaluation.
Some members of this group are committed to making some software supporting JSDL. All of these people are committed to an interoperability
All the participants of this group will meet again (Rome in early October?) to assess the possibility of developing an agreed set of interfaces for job submit and management.
All participants are encouraged to interact with others in identifying and resolving differences in opinion with respect to these interfaces. This includes involvement in any working groups formed in this area.
This group does not have a closed membership and its output is public.
There is an open invitation to folks who want to contribute to the Computing Element White Paper.

Detailed notes

Erwin presented the approach of EGEE and plans/timescale for potential standardisation.

In the discussion that followed Ian Foster said he liked the model presented for the gLite CE because it had the potential for standardisation in a number of areas (e.g. Condor-C as a VO-agent can be replaced). In particular Miron explained that this is the current approach but in the future, provided interfaces are respected, components of the CE could be replaced by other implementations deployed on different infrastructures and allow for co-existence on the fabric with existing systems.

The model presented for gLite appears to be consistent with the architecture outlined in the document written by Miron and Ian.

David Smith spoke about CRM issues found during the deployment and usage of LCG-2:

scaling problems on the resource interfaces (instantiation of job mgrs and monitoring process for each job etc.)
network requirements can cause deployment complications (all service machines require inbound connectivity for contacting call-back addresses etc.) but limiting the open port ranges would make sites happier. The number of open ports required depends on the number of jobs running on the WNs and hence has to be configurable. File descriptors are another form of resources. What is really required is secured communication channels and we have to be aware of the limited number of ports available. UI machines (e.g. personnel laptops) used to submit jobs should not require inbound connectivity. Guardian Angels (a process running on the WN with the job) is seen by Francesco as a way of getting around these connectivity limitations and provide interaction with a job (e.g. being able to run a debugger on a job).

It was noted that several user communities have implemented schemes of 'pilot jobs'. i.e. submitting jobs that when executed request a number of specific tasks from a service operated by the user(s). As well as fault tolerance the pilot jobs are enabling the community to implement a more complicated scheduling than was provided by the standard infrastructure. It was noted that multiple communities had decided to undertake this approach - suggesting that the approach is providing flexibility the users find particularly desirable.

Further discussion noted that fault recovery that is implemented by the LCG infrastructure, in the form of job resubmission, is often disabled by the user explicitly. Reasons for this include either the user payload being unable to be safely rerun or a desire on the part of the user to be made aware of failure and to explicitly control resubmission. It was noted that it is generally accepted that is desirable to have applications that are safe to rerun, but that currently the user community sometimes find this not always feasible.

Ian Foster presented the contents of the document by him and Miron distributed previously. See document for details.

An important aspect of the architecture is that CE’s within a CR do not interfere with each other.

For a CE to get permission to use a network connection it must pass by the CE monitor. David suggested that the concept of a CR is virtual and not really required since all interfaces are directed to the CE. Miron prefers to keep it as place-holder for now but would be prepared in the future to drop if is proves unnecessary.

During the discussion, Miron clarified for Aleksandr that he sees the work of EGEE and LCG in deploying of large-scale production services as important inputs for defining CRM and while we looking for consensus at this meeting, nobody will be forced to adapt but he expects people to migrate since they will see the advantages in doing so.

David, Ian and Miron insisted on the approach that we define interfaces for all important parts of CRM that will allow people to provide their own, compatible implementations.

The importance of the CE head node was confirmed as a means of monitoring and controlling the use of the resources by the jobs to ensure they are not abused.

We need to identify a core subset of software to form the basis of the CE that can be trusted by everyone in the community.

Miron insisted that we should work towards establishing social behaviour in our user communities by making the CE capable of detecting unsocial behaviour (monopolizing resources) and giving the site administrators the means to disable access and privileges.

Lunch

Dave Snelling presented the status from a standards point of view.

The subject of this meeting is covered by OGSA-EMS (Execution Mgmt Services) building on elements in the Base Profile and linking to VO Mgmt.

The JSDL (Job Submission description Language) specification is also relevant and has been largely rewritten over the last 6 months. Note this is just a specification language not a job management interface. Most contributions have come from providers of batch systems but the current standard is considered to be a sub-set of what is available in many implementations.

JSDL does support pre-stage, execution & post-stage phases and multiple data transfer protocols. OMII and UNICORE and working with implementations based on early versions of these specifications. A BoF is planned at GGF13.

Dave clarified the relationship to DRMAA.

Erwin asked where the output of this meeting should go. Dave suggested GGF would be the right place since it sets on the borderline between networking systems and management services.

Ian suggested that resource virtualization and security aspects of CRM are not yet mature enough for entering this standardisation process and Dave agreed.

In terms of resource models, the CIM is the most complete and precise but is not abstract. GLUE is more abstract but has a complete implementation and is currently deployed by LCG/EGEE and OSG.

Dave thinks that, after the feedback received at GGF13 has been taken into account, JSDL would be in a state that is considered mature enough to be used as a specification for implementations.

Ian said he expects Globus to work towards this standard as more resources in the Globus team become available to do the migration.

Miron questioned the strength of motivation for implementing any of these standards since the cost of conversion is high for existing systems.

Mirco stated that the major successes of GGF (GLUE, SRM etc.) come from projects that are working on implementations and are committed to adopting the standards they help to define. Dave agreed that having technologies in the field is a strong incentive for pushing forward the standards. Anders added that having these implementations deployed buy the projects helps improve the draft standards.

Luigi Zangrando presented the work of the INFN grid on CEs.

CREAM - Computer Resource Execution And Mgmt service

The interaction between gSoap and Axis implementations for WS-Addressing will need to be followed closely to avoid inter-operability issues.

Francesco stated that there are concrete plans to implement true WS interfaces for the WM and CEMon.

The Information Supermarket can take info from many sources and concerts into a format suitable for match-making as required by the RB.

Francesco said the concept and support of the sandbox was introduced because the SEs were not reliable enough to manage the transfer of all info required for a job to execute. As the SEs improve the need for the sandbox will diminish but users find them very convenient.

There was a discussion about if file transfer should be part of the job and whether a job cannot be completed until such transfer is successful. Opinions did not converge on this point.

Not clear what is the granularity of CREAM (i.e. one CREAM per user, per VO etc.) since it depends on policy decisions.

Not clear what is the relationship between CREAM and the CE architecture diagram for EGEE presented by Erwin.

Coffee

Anders presented the work of ARC.

Advanced Research Connector (ARC) is the grid middleware developed by the NorduGrid collaboration based on Globus pre-WS libraries and APIs.

Stage-in/out from the task area are considered important.

There was a discussion about whether the ARC requirements for the CE are really describing a local batch system but it seems that this is a subset of what is included in the proposed CE architecture presented by Ian.

Ian presented the Globus work on Execution Management Services

GT4 GRAM is a new Java implementation (as in GT3) based on Axis with C and Java clients.

GT4 GRAM has performance and scalability improvements compared to GT3.

GAS has been replaced by GridFTP for data staging (as is the case in ARC)

Ian presented some scalability figures (8000 jobs submitted and 70 jobs per minute)

He also described the Workspace management service that is being developed for use with gLite in EGEE

Xen (open source on Linux) and VMware (commercial) has been tried and compared to understand the VM cost needed to create a job. Ian said he found VM technology more robust and available with moderate costs.

Miron presented the Condor plans. They are not developing WS tool themselves but rather behaving as a user (if a tool does not work they drop it and move on).

He said users liked to be able to include some “rules” with the job (i.e. act on exit, act on re-try, act on failure etc.) since that fits better with their logic and this should be considered for job submission interfaces.

They do not want to rely on DBs for support of status and accounting info since no-one has shown they can remotely deploy mySQL or Postgres with the click of a button needed for “on the fly” deployment.

For scalability, Condor users have up to 10**3 compute resources and 10**5 jobs.

Consistent maintenance of state across the different elements through the lifetime of a job is a major concern for Condor.

Condor has a clean separation between resource allocation and job execution to allow different scheduling points in the submission chain.

Should the sand (contents of the sandbox) be moved on the same route as the job submission or delegated elsewhere and is linked to credentials?

Ralf described the UNICORE grid Programming Environment

UNICORE is developing a GPE with a grid programming library based on OGSA as part of the Unicore project.

The important standards for UNICORE are:

JDSL
CIM
BPEL
WS* (WS-Addressing, WSRF, WSN etc.)
OGSA

A common gridbean can run a variety of applications.

Unigrids has a work package to investigate inter-operability with other OGSA compliant implementations (notably GT4)

The interfaces for atomic services are being defined which include job mgmt service, target system service, storage mgmt service, file import service and file export service.

A target system registry keeps track of static & dynamic information.

The Target system service contains information about which application software is available at a site (and supports multiple versions)

A 3 phase approach to implementing a security independent of the base services has been defined starting with client authenticated https and leading towards WS-Security based scheme for authorized remote file mgmt.

2nd day

At the start of the day it was agreed to take forward the work on JSDL and interfaces for the CE. A document will be produced with Ian as the editor taking input from each project. A basic template will be offered to each project.

Miron added that, besides the document, the important point is to have the sw follow and not just stop at the paper level. He encouraged people to work on a common software element so that within a year we can deliver something to the community with the minimum of effort.

Ian said working towards a CE with common interfaces is an excellent goal and he thought the JDSL is gaining ground so is worth encouraging.

Dave suggested a collection of common jobs that each group could execute in their own representation would be most useful. Translators from JDSL to the different representations used by each of the projects would be a way of getting some form of job inter-operability.

Going in this direction we could then organize an inter-op event to verify the set of JDSL jobs can be run each project’s software stack.

Everyone will look and see if it matches with each project’s JDL.

Miron added that, through this exercise, if we come to common agreement and want to change something then we should go ahead and let the standards catch-up with us later.

Dave then made a presentation on JSDL – Job Submission Description Language

Purpose of JSDL is to standardize job requirements to increase inter-operability between job mgmt systems. Security and scheduling is out of scope for JSDL. JSDL covers job identification, application, resource and data requirements.

JDSL can only define one (and only one) job – what about multiple jobs?

The extensibility facility (represented as “other”) in JDSL allows adding extras from different implementations that are not yet defined by JDSL.

Erwin suggested that JDSL should be used as both input and output for CEs so that CEs can be chained together.