
Present: M.E., A.K., B.K., O.S., A.W.
Minutes by O.S.
Day 1
* A.W., O.S.: reports from the 4th EDG Workshop in Paris
- EU review of the EDG: very positive, the most successful project
ever
- However, application WP's (8-10) and some others are not quite
satisfied, and request stability, stability and stability
- EDG roadmap: most middleware workpackages plan to develop
advanced versions of their software, but no bug fixes for the
existing ones. ITeam intends to push WP1-5 towards bug fixing
though, in order to achieve a stable production release.
- Some details:
1) Globus moves towards OGSA, first pre-alpha to be released in
April; Globus 2 to be finalized and will be co-existing for
another couple of years
2) Ftree is dropped, and EDG's WP3 is preoccupied with R-GMA,
though it is used by no one
3) A.W. gave a talk on licenses for EDG: BSD-type seems to suit
everybody so far, Slashgrid is about to adopt one as well
4) ATF (Architectural Task Force) is resurrected and re-shuffled,
aiming at [finally] drafting an EDG architecture; there's
certain interest to evaluate the NorduGrid's architecture
* A.W. will present the NorduGrid architecture at the PARA'02
conference in Espoo; contribution to be ready by April 1st and to
be distributed within EDG's ATF
* Issue of cooperation with EDG w.r.t. developing NorduGrid's own
installation: keep collaborating, sacrificing a machine per site to
EDG's exercises, but only as a second priority
* Issue of ATLAS software
- Jacob is preparing AFS-free RPM's (few days later: Christian
Arnaud & Cal Loomis did the same, but never tested :-)
- Objectivity is officially dropped by ATLAS in favor of ROOT/IO
- RedHat 7.2 is about to get certified as the official CERN
platform
- DC1 is the natural application, starts on April 15
* Organisational issues:
- Milestones:
1) March 22nd (read: April 1st): architecture paper
deadline. Paper and its follow-ups to be submitted to all
possible forthcoming conferences (Balazs will distribute the list
with deadlines), and possibly submitted as an ATLAS note. To be
evolved in 3 instances: proposal -> implementation -> application
2) April 15th: start of DC1
3) May 18th (?): NorduGrid meeting in Helsinki. Demonstration of
the NorduGrid architecture, using ATLAS-related jobs
4) June 15-18: PARA'02 conference in Espoo
- E-mail discipline:
1) mails with the tag "URGENT" in the subject line must be
answered by everybody within 48 hours
2) threads are to be marked with a keyword (e.g., Information
System, Grid Manager etc)
3) initiator of a thread has to summarize the discussion before
closing it
- Remote conferencing: keep on calling for a phone conference
whenever necessary; to look into the possibility of
videoconferencing (VRVS?)
- Next meeting is to be the dedicated integration meeting; to be
called end-April - beginning of May (May 1st is Wednesday) for 3
days in Copenhagen
* Short discussion of the Globus roadmap (after the slides of Bill
Allcock). OGSA is OK, but not of an immediate concern. Updates to
GridFTP, GRAM etc will be released meanwhile.
* Packaging:
- RPMs to be created from Globus' official releases (4 bundles?),
not CVS
- number of RPMs must be limited to 5 (server, client, common,
development, info)
- NorduGrid stuff should be installable on the top of an existing
Globus installation
- NorduGrid distribution should contain a lightweight, stripped
version of Globus, possibly with necessary patches and fixes
- User Interface (UI) (client) subset should be self-contained and
sufficient, i.e., installable on machines without pre-existing
Globus and without being a superuser. Must contain GridFTP and
LDAP (GSI-based search)
* Location of the NorduGrid software to be referred to as
NORDUGRID_LOCATION
Must not be hard-coded; standard location: /opt/nordugrid/ ,
which contains the whole necessary subtree (./bin , ./lib , ./etc
...). Everything must be relocatable.
* Configuration files: the Information System (IS) and the Grid
Manager (GM) need one each, but they can be merged into a single
nordugrid.conf file, residing at a standard location: either
/etc/nordugrid.conf
or
$NORDUGRID_LOCATION/etc/nordugrid.conf
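A minimal sketch of how a component could locate the software tree and the
merged configuration file without hard-coding paths. The function names are
hypothetical, and the search order (/etc first, then the relocatable tree) is
an assumption: the meeting did not decide which location takes precedence.

```python
import os
from pathlib import Path

# Standard location from the minutes; used only when the variable is unset.
DEFAULT_LOCATION = "/opt/nordugrid"


def nordugrid_location(env=None):
    """Return the software root, honouring NORDUGRID_LOCATION if it is set."""
    env = os.environ if env is None else env
    return Path(env.get("NORDUGRID_LOCATION") or DEFAULT_LOCATION)


def config_candidates(env=None):
    """List the two standard places for nordugrid.conf (order is assumed)."""
    return [
        Path("/etc/nordugrid.conf"),
        nordugrid_location(env) / "etc" / "nordugrid.conf",
    ]


def find_config(env=None, exists=Path.exists):
    """Return the first existing candidate, or None if neither is present."""
    for candidate in config_candidates(env):
        if exists(candidate):
            return candidate
    return None
```

Since the root is taken from the environment at call time, a tree installed
under, say, a user's home directory works by exporting NORDUGRID_LOCATION.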
Day 2
* Slashgrid: A.W.'s account. An example setup to be done at NBI, to
check the functionality of the certificate-based access
* Information System
- Issue of prefixes in attributes: prefixes should be identified
uniquely with each organisation, e.g., nordugrid. The choice of a
prefix should be possible to make during the installation/setup
- nordugrid-cluster-nodecpupower: at present expressed in bogomips,
but since it is not used for scheduling, should be more
human-readable.
- an explanatory description of attributes and possible accepted
values should be made (B.K.)
- RC info: to add DN of the catalogue and DN's of collections
- SE info: to add access protocol and mount point (a la EDG)
- queue: B.K. to write suggestions for a NorduGrid PBS queue
configuration
- nordugrid-authuser-sn: to be shortened to just a real human name,
plus some ID in case of identical ones
* RSL
- not all the Globus-defined attributes need to be supported by the
GM (O.S. to prepare the final list)
- non-supported attributes should produce a warning message
- several attributes (executable, arguments, stdin, stdout, stderr),
specified Globus-way, will have to be re-written by the UI to
suit GM (executable: dummy, arguments: actual executables etc)
- startTime: refers to the download start time, not execution
- lifeTime: from the moment the job is finished; to be added to
MDS, can be user-specific
- some additional (to the previously distributed list) attributes:
action, jobid (internal for UI), lrmsType, replicaCollection
- O.S.: to prepare the RSL template
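The rule that non-supported attributes produce a warning could be sketched as
below. The supported set is a placeholder built from the attribute names
mentioned in these minutes (the final list is still to be prepared by O.S.);
the function name and the case-insensitive comparison are assumptions.

```python
import warnings

# Placeholder only: attribute names mentioned in the minutes, lower-cased.
SUPPORTED_ATTRIBUTES = frozenset({
    "executable", "arguments", "stdin", "stdout", "stderr",
    "starttime", "lifetime",
    "action", "jobid", "lrmstype", "replicacollection",
})


def check_attributes(attributes, supported=SUPPORTED_ATTRIBUTES):
    """Split a parsed RSL attribute dict into accepted and unsupported parts.

    Every unsupported attribute triggers a warning. What happens to it
    afterwards (dropped or rejected) is left to the caller.
    """
    accepted = {}
    unsupported = []
    for name, value in attributes.items():
        if name.lower() in supported:
            accepted[name] = value
        else:
            unsupported.append(name)
            warnings.warn(f"RSL attribute '{name}' is not supported")
    return accepted, unsupported
```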
* Grid Manager
- A.K. produced a flow-chart, which has to be documented
- GM starts downloader, Globus job submission (jobmanager-ng),
uploader etc
- job status info (files), RSL etc are contained in the job control
directory; status files to be owned by root
- each job is assigned a session directory, path of which is a part
of the JobID (JobID is being put into MDS, this information can be
accessed only by the grid-mapped users)
- status of a job is scanned by the Helper from PBS logs (instead
of issuing qstat). Possible values:
ACCEPTED/PREPARING/EXECUTING/FINISHING/FINISHED
- upload of directories is not supported; neither are wildcards
- if RC location is not specified, GM should use the local one (?)
- jobmanager-ng is enabled in Oslo
- A.K.: to send around an example of user-side RSL file
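The job states listed above can be captured as an enumeration, so that the
status scanned by the Helper is validated in one place. This is only a sketch:
the names JobState and next_state are hypothetical, and the strictly linear
progression in the listed order is an assumption not stated in the minutes.

```python
from enum import Enum


class JobState(Enum):
    """Job states as listed in the minutes, numbered in the listed order."""
    ACCEPTED = 1
    PREPARING = 2
    EXECUTING = 3
    FINISHING = 4
    FINISHED = 5


def parse_state(text):
    """Convert a status string into a JobState; raises KeyError if unknown."""
    return JobState[text.strip().upper()]


def next_state(state):
    """Return the following state, or None once the job is FINISHED."""
    if state is JobState.FINISHED:
        return None
    return JobState(state.value + 1)
```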
Day 3
* User Interface
- performs job submission/cancellation/status query
- all active job ID's should be listed in the jobhistory file. Upon
job completion, its ID is removed from the list. The jobhistory
file should be reconstructable from the MDS (in case of accidental
removal)
- user commands suggestions:
ngsub : submits a job
ngkill : terminates the execution
ngclean : terminates the execution and erases all the traces of
the job, including the info in MDS
ngget : retrieves the output, optionally issues ngclean
ngstat : queries MDS and retrieves the job status (options
-f[ull] , -a[ll], -u[ser] should be
available). Job final status: either SUCCESS or
FAILURE; in the latter case ngget returns all the
associated files
ngresub : moves a job from one queue to another (forced
re-scheduling)
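The jobhistory bookkeeping described above (add an ID on submission, remove it
on completion, rebuild from the MDS after accidental removal) might look
roughly like this. The class name, the one-ID-per-line file format and the
rebuild interface are assumptions; the MDS query itself is not shown, and the
caller is expected to pass in the IDs it retrieved.

```python
from pathlib import Path


class JobHistory:
    """Keeps the list of active job IDs in a plain text file, one per line."""

    def __init__(self, path):
        self.path = Path(path)

    def active(self):
        """Return the active job IDs; a missing file means no active jobs."""
        if not self.path.exists():
            return []
        lines = self.path.read_text().splitlines()
        return [line.strip() for line in lines if line.strip()]

    def add(self, job_id):
        """Record a newly submitted job (no duplicates)."""
        ids = self.active()
        if job_id not in ids:
            ids.append(job_id)
            self._write(ids)

    def remove(self, job_id):
        """Drop a job from the list once it has completed."""
        self._write([i for i in self.active() if i != job_id])

    def rebuild(self, ids_from_mds):
        """Recreate the file from job IDs obtained from the MDS."""
        self._write(list(ids_from_mds))

    def _write(self, ids):
        self.path.write_text("".join(f"{i}\n" for i in ids))
```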
* Application runtime environment
- at the moment, ATLAS software releases. RPMs are on the way
(A.W.), to be installed at all the sites concerned
- in future, a description for each environment is needed
- possibly consider CERNLIB etc as a runtime environment?
* Storage Element (SE)
- each site: to set up an SE for test (not necessarily for ATLAS DC,
since that one would need 2 TB of storage space). An SE should
preferably be a separate machine, allowing independent user-mapping
- main requirement: a user should be able to upload files from a
UI to an SE at any time, without submitting a job
- open question: shall a new RSL parameter, requesting the job to
be moved to the data, be introduced?
- issue of mirroring (caching) data on request: not feasible with
existing tools, but can be a part of a future architecture
* Immediate actions:
- SE: each site to set up a separate machine with GRIS, mapping
everybody to a single user (B.K.: week 12, A.W.: week 13, M.E.,
A.K.: to study the possibilities)
- IS (B.K.):
1) update the schema
2) modify the providers (user - queue length, job - GM status)
3) re-write parts related to the static information: to be read
from the configuration file
4) prepare description of all attributes and options
- PBS configuration (B.K.):
1) cluster configuration suggestions
2) test script checking whether the PBS configuration makes sense
- RSL
1) example of an RSL script: A.K.
2) list of attributes and a general RSL template: O.S.
- GM (A.K.)
1) implement root ownership for the common control directory
2) provide an input for the configuration file
3) documentation (including the flowchart)
4) enable automatic selection of location in RC
- packaging (A.W.): to come up with a reasonable proposal of how to
fit a stripped Globus and the NorduGrid package into 5 RPMs
- UI (M.E.)
1) documentation (flowchart) of brokering
2) actual implementation
- Application runtime environment: A.W. to send around the ATLAS
software details
- Cooperation with EDG: sacrifice a machine per cluster to commit
EDG/Testbed1 exercises
- Remote conferencing: A.W., A.K. to study the issue of a VRVS
virtual room (reflector)
- Slashgrid: A.W. to set up an example at NBI