NorduGrid-NDGF-NSC-PDC joint meeting

7-9 July 2003, Copenhagen

(See also the tentative agenda)

Minutes

Present: Anders Wäänänen, Jakob Nielsen, Balázs Kónya, Aleksandr Konstantinov, Mattias Ellert, Leif Nixon, Niclas Andersson, Brian Vinter, Maria Gunther Axelsson, Michael Kustaa Gindonis, Juha Lento, Fredrik Hedman, Johan Danielsson, Gian Luca Volpato, Oxana Smirnova (through video link).

Topics discussed:

Status overview of the projects

For NorduGrid see slides by Balázs.

NDGF postdocs: the Swedish (Maria) and the Finnish (Juha) postdocs have already started, the Danish one (a candidate from Romania) has been decided, and the Norwegian one is still pending.

NDGF's main objective is to coordinate and organize Grid activity; NDGF is not really interested in user support. The No. 1 priority of the NDGF is to "Gridify" the centers.

PDC informed us about their European Grid Support Center application. PDC will host the Nordic Grid Center in the EGEE. PDC organizes a Grid Course (18-26 August), mostly for students of Swedish universities. PDC's main interests are the AFS and Kerberos hooks to the Grid.

Mattias presented the SWEGRID status: hardware specifications for the six clusters have been sent out to vendors; "realistically" the clusters won't be operational before October (?).

The Danish Center for Grid Computing starts on 1st August with 1 postdoc and 4 PhD students. The research topics of the 4 PhD students are AAA, scheduling, control+monitoring, and distributed shared memory access over clusters. The postdoc and the test cluster will be placed at NBI.

John Renner Hansen among others talked about the EGEE and the data transfer tests performed at NBI.

Michael presented the grid activities of HIP. At the moment HIP runs on the Grid with the Hirmu cluster (administered by Michael), and the Kumpula cluster is being bought. There is no special reason why the NorduGrid VO is not authorized on Hirmu; only the config needs to be fixed and updated.

Globus software

Anders talked about the "different flavors" of the Globus Toolkit. At the moment these are the available Globus flavors (or distributions):

The NG-Globus contains important fixes; otherwise it is compatible with the original Globus and is just a repackaging of it. VDT Globus breaks compatibility in the way it distributes the Replica Catalog. Because of politics, EDG/LCG/EGEE decided to use the VDT Globus. NorduGrid sites run the NG-Globus, SWEGRID will presumably use the NG-Globus, NDGF will use the NG-Globus, while PDC might prefer the VDT.

The globus-config is not used everywhere on the NorduGrid; some sites, like the Hirmu cluster, prefer direct configuration via the original Globus files. Mike might change to globus.conf.

We decided to build the NorduGrid releases and the NG Globus binaries on the NorduGrid as grid jobs. Anders will work out the details of the "build" runtimeenvironment.

Data management

Aleksandr presented the development status of the "intelligent SE". The new SE will use https over GSI, which means we are moving away from gridftp.

Balázs raised the question of "storage space management (local quotas)" as a required functionality of the intelligent SE. This is not part of the present development; the intelligent SE only addresses the problem of reliable data replication.

Short term SE management: It was decided that we will set up a couple of "default" storage elements as a kind of "grid scratch area" where every NorduGrid user (member of the VO) can write. Lund (~1 TB), NSC (half a TB ?), and NDGF (couple of TB) will contribute to this default grid storage. Balázs will send around the setup/config instructions for this storage. The SEs will be registered as locations in the RC as well. Important: no guarantee is provided for the data stored in these "grid scratch areas". Data intensive applications should negotiate their own storage requirements with the providers directly.

Users and applications

It seems that until the real start of the SWEGRID & NDGF the active user base of the NorduGrid facility will remain the same: the ATLAS group and a couple of individual users.

The SWEGRID users will not appear on the Grid before the end of this year.

In Denmark the centers wish to "push" their user base to the Grid: soon the Danish centers will only be available through the (nordu)grid interface, with no more local pbs jobs in Denmark. Brian predicts that by the end of 2003 the Danish chemists using HPC resources (mainly the Dalton and Gaussian community, ~100 users) will be on the Grid. By the end of 2004 Brian estimates thousands of active Grid users.

It is the NDGF which has the main responsibility for recruiting grid users and helping their grid migration.

User support

In the short term there will be no change in the support mechanism: the developers continue to provide the support, the main support forum is the

SWEGRID will support its own user base. The EGS (EU Grid Support Center at PDC), if funded, will consider providing some support for the NorduGrid users/toolkit as well. NDGF is not interested in running a support service.

Site problems: in case a user experiences site-specific problems (e.g. all her jobs constantly fail at grid.quark), the site should be contacted directly (MDS has the site contact info).

We agreed to review the support mechanism later when the user base increases.

Feature requests / toolkit enhancements

Proxy renewal: Aleksandr is improving the error messages for job failure in case of expired proxies. In the future, if a user's job fails due to an expired proxy (this will be clearly indicated in the job's failure message), the user will only need to generate a new proxy and run ngrenew. ngrenew will upload the new proxy to the failed job and the GM will finish the job successfully (post-processing phase). This request has been entered into the Bugzilla (bug #85).

Job resubmission: on its way, it depends on the local logger facility (bug #86)

A switch to select between failed/successful jobs: bug #87

The new xrsl attribute "cache", which sets the default caching behavior, is resolved within the UI (bug #73)

Direct selection of a cluster (without MDS query): bug #88

The memory xrsl attribute means a request for physical memory; the GM->PBS interface should take this into account (bug #25)

A new grid job status ("PENDING"?) needs to be introduced. bug #89

Interactive sessions: a separate globus-gsi-sshserver package will be created, although the globus.conf will be used to configure the gsisshserver. We need a new utility (wrapper?), "ngssh" or "ngsub -I", which will pick the right machine. For example, something like ngssh -architecture pc should automatically give the user a shell on the right platform. NDGF is assembling a pool of machines for this interactive service.
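
The host selection such a wrapper would have to perform can be sketched as follows. This is only an illustration in Python, not the design of ngssh: the pool contents, the host names and the pick_host function are hypothetical, and a real wrapper would presumably obtain the pool from the information system before opening the gsissh session.

```python
import random

# Hypothetical pool of interactive machines. Host names and architecture
# labels are invented for this sketch.
INTERACTIVE_POOL = [
    {"host": "login1.example.org", "architecture": "pc"},
    {"host": "login2.example.org", "architecture": "pc"},
    {"host": "login3.example.org", "architecture": "alpha"},
]


def pick_host(pool, architecture, rng=random):
    """Return the name of one pool host with the requested architecture."""
    candidates = [m["host"] for m in pool if m["architecture"] == architecture]
    if not candidates:
        raise LookupError(f"no interactive host with architecture {architecture!r}")
    # Any matching host will do; a random pick spreads users over the pool.
    return rng.choice(candidates)
```

A call such as pick_host(INTERACTIVE_POOL, "pc") returns one of the two "pc" hosts, while an architecture that is not in the pool raises LookupError so that the wrapper can report it to the user.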

Certificates

Anders presented the NorduGrid CA. It was decided that the CA infrastructure will be based on Registration Authorities. A Security Group will be set up by Anders to address operational and policy issues. The Group should deliver a solution by the end of September.

Important: in the future users need clear instructions and a "user friendly" way to request certificates. This kind of user support is part of the CA activity.

We had a lengthy discussion on the problem of storage elements & credentials. Sites will publish, in some form, information on the CAs they support. Anders will provide a utility which can locally process the content of /etc/grid-security. It is then the UI's task to match each cluster's "supported CA information" against the CA of the requested datafile. An important decision we made: uploading credentials of non-supported CAs is not permitted; a grid job should not override the authentication decision of the site.
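
The matching step left to the UI can be sketched as follows. This is a minimal Python illustration under stated assumptions: the minutes do not fix the form in which sites publish their supported CAs, so the SUPPORTED_CAS mapping, the CA names and the clusters_accepting function are all hypothetical.

```python
# Hypothetical published data: cluster name -> set of CA subject names
# that the site trusts. All names are invented for this sketch.
SUPPORTED_CAS = {
    "cluster-a.example.org": {"/O=Example/CN=CA One", "/O=Example/CN=CA Two"},
    "cluster-b.example.org": {"/O=Example/CN=CA One"},
}


def clusters_accepting(data_ca, supported_cas):
    """Return the clusters (sorted by name) that trust the CA of the datafile."""
    return sorted(name for name, cas in supported_cas.items() if data_ca in cas)
```

A job whose input file sits behind a CA that a cluster does not list is simply not matched to that cluster, which is consistent with the decision that a job must not override the site's authentication decision.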

Audit/incident response: we discussed this but came to no decision. A secure mailing list is needed to handle these problems. The Security Group should cover this item too.

Authorization

Balázs sketched the present LDAP-based system, Gian Luca reviewed the VOMS.

Aleksandr described the authorization possibilities of the gridftp server: user groups for the file plugin can be defined as a list of DNs, a VOMS group/role/capability, or an LDAP group. The gridftp-server.conf template contains the latest documentation on it.
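
The three ways of defining a user group can be pictured with the following Python sketch. It is not the gridftp-server.conf syntax (see the template for that); the dictionary layout, the key names and the is_member function are assumptions made only for illustration.

```python
def is_member(user, group):
    """Return True if the user matches any rule of the group.

    Hypothetical layout: `user` has a "dn" string plus optional "voms" and
    "ldap_groups" sets; `group` has optional "dns", "voms" and
    "ldap_groups" sets listing what the group accepts.
    """
    if user["dn"] in group.get("dns", set()):
        return True  # matched by the explicit list of DNs
    if user.get("voms", set()) & group.get("voms", set()):
        return True  # matched by a VOMS group/role/capability
    # Otherwise fall back to LDAP group membership.
    return bool(user.get("ldap_groups", set()) & group.get("ldap_groups", set()))
```

The rules are alternatives: a user belongs to the group as soon as one of them matches.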

Access control (ACL to files): the gridftp file plugin supports GACL, so users should get familiar with GACL. Sites can set a default ACL for directories.

User database: we will use VOMS as the user database; a test server (basic setup) is running on grid.fi.uib.no. The administrative interface of VOMS had problems at the time Aleksandr tested it. Balázs will migrate our LDAP user database to a VOMS server. The NorduGrid client software should come with voms-proxy-init packaged into the client.

VO management: after we have the VOMS server up and running with the present NorduGrid user data, we can start defining groups, grouping the users, and assigning group managers. Balázs will suggest a draft set of groups. When the technology matures (that is, once we have successfully migrated to VOMS), the operation of the VOMS server will be passed over to a center (NSC?).

As a first step we will use the VOMS groups/roles/capabilities only for file access (file plugin) authorization. Later we will use them in job submission and in the gridmapping as well (the roles/groups/capabilities could have their own mappings). Balázs should think over the consequences of the VOMS-based mapping (more complicated rules, not just a list of authorized users) for the infosystem and the monitor. A possible model of the future infosys query: the user presents his VOMS data together with the query, and the info providers make use of this info.

User groups and policies: to be defined later by some competent body. As it is now, we will have a default NorduGrid group, and all the sites belonging to NDGF should open up their resources for this default group up to 2%. The 2% local quota should be enforced by some local tool. The important thing is that the NorduGrid VO gets mapped on every resource which has some relation to NDGF. This default group is only for testing and trying out the grid. For example, the ATLAS production runs will be taken out of it.
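
To give a feeling for the size of such a share, here is a small Python calculation. The minutes do not say in which unit the 2% is measured, so expressing it in CPU-hours is an assumption, and the function name and the example cluster size are hypothetical.

```python
def default_group_cpu_hours(cpus, hours, percent=2):
    """CPU-hours open to the default group over a period of `hours`.

    Assumes the quota is a plain percentage of the total CPU-hours.
    """
    return cpus * hours * percent / 100
```

For a hypothetical 100-CPU cluster over a 30-day month (720 hours), default_group_cpu_hours(100, 720) gives 1440 CPU-hours, i.e. the equivalent of two CPUs running continuously.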

Accounting:

Aleksandr told us what we have now:

The local logger (or frontend utility, part of the grid manager) keeps log of every job locally on the frontend. Information on grid jobs is written twice, at the start and at the finish of the job. The job records are sent to a central logger via SOAP. Optionally they will be able to stay on the site (for now these files are removed after the records are sent to the central logger).

A proposal for a job record will be chosen by Aleksandr (already done), posted to the mailing list, and added to the documentation. It should contain info on the owner of the job (DN, VOMS group/role/capability), on the stage-in and stage-out data movement (size of transferred data), on the execution site (host architecture, runtimeenv, etc.), on the resources used by the job (CPU time, memory, disk space; of course only if these are measurable), and the xrsl.
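
The contents listed above can be pictured as a record like the one below. This Python sketch is not the job record proposal posted to the mailing list: the class name, the field names and the units are assumptions chosen for illustration. Resources that could not be measured are left as None.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobRecord:
    """Hypothetical job record; field names and units are illustrative."""
    owner_dn: str
    voms_attributes: List[str] = field(default_factory=list)
    stage_in_bytes: Optional[int] = None
    stage_out_bytes: Optional[int] = None
    host_architecture: Optional[str] = None
    runtime_environments: List[str] = field(default_factory=list)
    cpu_time_seconds: Optional[float] = None
    memory_kb: Optional[int] = None
    disk_space_kb: Optional[int] = None
    xrsl: str = ""


def measured_resources(record):
    """Return only the resource figures that were actually measured."""
    usage = {
        "cpu_time_seconds": record.cpu_time_seconds,
        "memory_kb": record.memory_kb,
        "disk_space_kb": record.disk_space_kb,
    }
    return {name: value for name, value in usage.items() if value is not None}
```

Keeping unmeasured quantities as None rather than 0 lets a central logger distinguish "not measurable on this site" from "nothing used".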

The test central logger, a MySQL database accessible through SOAP, runs on https://grid.uio.no:8000/logger. The central logger needs a web interface.

Decisions:

Authorization and accounting:

We had a lengthy discussion on how to couple the accounting information to authorization, and the different grid-economy models came up. Technically we are very far from it. The most we could say was that there would be an interface to call plugins written by politically mature people, as soon as requirements are presented by those who will write them. Aleksandr hopes those plugins will never be part of the GM code.

Operational issues

The Grid infrastructure, which was created by the old NorduGrid project, will be kept together and extended by the SWEGRID, NDGF resources. We will try to keep a single "Grid" in the Nordic countries.

Information Indexes (GIIS): an explicit list of allowed resources is used; when a new site wants to join the Grid, its information content first needs to be "validated".

Clusters: sites which are related to NDGF will map the NorduGrid User Group and allocate 2% of their resources.

Storage: default SEs (grid scratch area, see before) at Lund, NSC, NDGF

CA: run at NBI with the new Registration Authorities model to be set up by the end of September

Credential Repository (myproxy service): was turned down due to trust and security issues. NDGF is not interested in operating it on a NorduGrid level; SWEGRID might provide a similar service to the SWEGRID users. Aleksandr is interested in having one set up for test purposes.

Monitor: www.nordugrid.org continues hosting a monitor but everybody is welcome to set up their own monitor (in this case please inform us about it)

Software repository: has been cleaned up; it is a full repository in the sense that all the NorduGrid software and the required dependencies are provided. Now that we have so many packages & dependencies, we need short NorduGrid server install instructions indicating which packages are required and which are optional. Anders will provide a single-page install document.

Bugzilla: hosted in Oslo, this is the official NorduGrid software support platform, everybody is encouraged to use this.

CVS: it is in Oslo, and daily snapshots are available. Anybody who wants to contribute to the NorduGrid middleware can get write access by asking for a user account at UiO. We'd like to see all the NorduGrid related software development hosted by this service.

Releases: we try to maintain the Friday weekly tagging. Important tags are announced and clearly stated if sites need to upgrade. These release notes will be available in the FTP repository (next to the tag/release) and posted to the site-admin list as well.

Mailing lists: site-admin is intended to be the main announcement platform for informing resource providers about new releases and important changes. Every site admin is requested to subscribe. We decided to create nordugrid-announce for informing the grid users. An invitation to this mailing list will be sent out to every NorduGrid VO member; new VO members will receive this invitation when they are added to the VO. nordugrid-discuss will remain the main forum for NorduGrid related technical discussions, and the NDGF postdocs will join it.

Task catalogue: the list of "tasks" (unsolved development problems) will be kept in a "task catalogue" available from the NorduGrid website. If you have a well-defined software development task, please send it to the discuss list. "Tasks" such as "authorization system is needed" are not well-defined ones.

7-9 July 2003, NBI, taken by Balázs Kónya