Present (in and out): Balázs, Aleksandr, Oxana, Marko, Andrei Z., Ilja, Mike, Nikolay, Juha, Christian, Frederik, Peter, Andrei I., Farid, Mattias (at the minutes).
* 0. Misc: agenda, next meeting, publication, etc.
Agenda finalized:
- Wednesday: Release 0.6 issues
- Thursday: Data Management Issues
Next meeting:
Copenhagen June 23-24 (pending OK from local organizers)
Publications:
- ARC: Marko will be editor.
Draft May 30.
Final version June 6.
- DC2: Based on the CHEP paper.
Conclusions: Better data management needed
Better error handling needed
Oxana and Mattias will divide tasks.
Deadline June 13.
* 1. Action list follow-up from previous meetings.
Aleksandr: More meaningful error messages from the grid-manager. Some
advances, more work in progress.
Aleksandr: Document current error codes in grid-manager manual appendix.
Everyone: Links to courses, tutorials, application and student
projects should be sent to Oxana for publication on the web
page. Permanent action item.
Mattias: Single user interface configuration file. Probably only for
the new arclib based cli.
Balázs: Server configuration file clean-up.
Anders, Oxana: Download page reorganisation. Postponed until after new
build structure.
Anders: CVS structure / build structure. Main topic on next meeting in
Copenhagen.
Anders: migrate to globus 3.2.1. Now obsolete. New action item: migrate
to globus 4.0.
Oxana: Cached files are read-only. Should be documented in manuals.
Still open.
Balázs: infosys provider for services. In progress.
All: Non-root service. In progress. Work started by Balázs.
Aleksandr, Leif: Separate authorization/authentication. Unixmap option
in gridftp (documented, but not in template yet).
Aleksandr: Zombie job handling. Still open.
Balázs, Mattias, Aleksandr: cputime/walltime. Almost finished.
Finalize and close.
Oxana: Document benchmarks in manuals. Finalize and close.
Balázs: Propagate grid-manager's internal limits to infosys.
Aleksandr: Utility to clean broken jobs. Plan to implement as feature
in gm-jobs utility.
Katarina, Aleksandr: logger interface in CVS. Part of CVS
reorganization?
Aleksandr: Logger clean-up (garbage collecting). Done.
Balázs: Logger reliability. Postponed until after logger database
reorganization
Aleksandr: Server side error codes. See above.
Mattias: Client side error messages. Still open.
Andrei I.: logger database reorganisation. Later in the summer.
Mattias: ngrerun. Almost there...
Aleksandr, Marko: Fireman built by Marko. Testing continues.
Marko: Collect DQ Requirements from Miguel. Wants SE info from MDS.
Aleksandr: host/user cert for RLS reg from SSE. Not yet. Issues with
proxy lifetimes.
Oxana: I/O specification in XRSL. Still open. JSDL not possible
without extensions. Switch to JSDL with "standard extensions".
Balázs: Merge nordugridmap.conf with nordugrid.conf
Balázs: Demo infrastructure. After 0.6 release.
Everyone: Document how to set priorities between jobs on a cluster.
Too LRMS specific - probably better handled by LRMS documentation.
Marko: VOMS server tests. The client part works without modifications
if the server is compiled with the new VOMS. voms-based GACL still
not tested.
Andrei Z.: St Petersburg RLS server. Now running on a SuSE 9 machine.
* 2. Todo list for 0.6
Deadline for the 0.6 release is midsummer.
grid-manager / gridftpd
- non-root suid grid-manager.
- fireman client integration.
userinterface
- single configuration file for the new arclib based cli commands.
objections from Oxana on this point.
jarclib:
- experimental, but include in distribution.
GUI:
- probably not ready for inclusion in distribution.
configuration:
- environment variables: Should not need more than globus and voms do
- add missing variables to the template
globus:
- Should be possible to compile w/o Replica Catalog libraries.
documentation:
- Anders: Release Notes
- Balázs: Feature List, Release roadmap
- Anders: Build Instruction, Dependencies (INSTALL file)
- Mattias/Balázs: Client/Server Install instructions
- Balázs: Configuration Instruction/Documentation
- Balázs/Aleksandr/Oxana/Mattias: Main Technical Manuals (original
author should maintain)
- Anders: INSTALL, README, LICENCE
- Everyone: ChangeLog
- Anders: error code documentation
testing:
- need more 0.5.x servers for testing
* 3. Data Management
Use cases based on ATLAS DC experience (Oxana):
input file / output file staging should be automatic:
- Grid3: All files in one place.
- LCG: If input file in job description - all jobs to data.
wrapper script that does data management - lots of assumptions on
capabilities on worker nodes.
- ARC: The best solution. Grid-manager takes care of staging in and out
data. Caching mechanism. The only bad thing is the handling of
storage elements for output files (no checks for full SEs etc, no
soft registering).
file copying:
- ngcopy is a nice tool for single file copy and registration (but
same limitations as for grid-manager above).
- example: copy 4000 files at site A or site B where
A: normal gsiftp
B: no 3rd party, no single-thread, no modify time
tricky with the present tools.
- requirements:
meta storage
dataset aware gridtools
batch data movement service (should be possible to initiate from
laptop and close down)
access control (gacl) aware tools
synchronize the gacls between replicas
SE should be data management object (i.e. allow ls, cp, rm etc.)
list SEs by VO, country, size, ACL
list files on the SE
move files from SE1 to SE2 (or from SE1 to a GRID)
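The "list SEs by VO, country, size, ACL" requirement above could look roughly like the following Python sketch. The StorageElement record, its field names and the list_ses function are hypothetical and chosen for illustration only; they are not an existing ARC interface, and ACL filtering is left out. The min_free_gb filter hints at how a tool could skip full SEs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageElement:
    # Hypothetical record; the field names are illustrative only.
    name: str
    country: str
    free_gb: int
    vos: List[str] = field(default_factory=list)


def list_ses(ses, vo=None, country=None, min_free_gb=0):
    """Return the SEs matching all given criteria, most free space first."""
    matches = [
        se for se in ses
        if (vo is None or vo in se.vos)
        and (country is None or se.country == country)
        and se.free_gb >= min_free_gb
    ]
    return sorted(matches, key=lambda se: se.free_gb, reverse=True)
```

A tool built this way would obtain the records from the information system; where that data comes from is outside the scope of the sketch.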
Selected comments from the following discussion:
- Why more than one grid?
- DQ might be a good tool if further developed.
- gacl is a road block not a hole
Presentation of SSE (Aleksandr)
- gridftp is bad, because ftp is bad
- hence use http. It supports multiple streams, chunks and has
standard secure channel https (or use globus httpg)
- uses host certificate for registration
- would like to have fine-grained delegation in proxy
Peter: Should be easy to write an SRM layer on the server side.
Presentation of gLite (Peter)
- storage elements - use what exists
- storage resource manager (SRM) - uniform interface to various mass
storage technologies
- access protocols - gsiftp, https, rfio, ...
- catalogs - fireman file/replica/authorization/metadata
gLite standalone metadata catalog
supports unix like namespace (directories)
- posix I/O (gLite I/O based on alien I/O) through dedicated server
but need file on local cluster
- file transfer service / file placement service
- data scheduler planned for release 2.
- user interface: glite-get, glite-put, glite-rm (on LFN or GUID)
glite-catalog-* commands (ls, create, rename, ..)
glite-transfer-* commands (submit, status, cancel, ..)
- APIs: glite-io (C), fireman (C, C++, Java, Perl),
file placement (C, C++, Java, Perl)
- POOL File Catalog API (glite catalog implementation)
- Catalogs store "basic permissions" and ACL
- grid-only access model vs. mixed local and grid model
Selected comments from the following discussion:
- Namespaces in the file catalog:
Can I ask "what SEs service a specific namespace"?
- Is fireman VO specific? Answer: yes.
- Which file do you get when you belong to more than one VO?
- WSDL for fireman from EGEE webpage
- symbolic links are allowed but not hard links
- symbolic links are only allowed for files, not for namespaces (directories)
- file placement service transfers files using server certificates
* 4. Swiss feature requests (Frederik)
- retries for uploads
- downloads can be bottleneck since handled by the frontend only
- prioritization between jobs at stage-in
- queuing jobs can prevent pending jobs from starting even though they are
handled by a different LRMS queue.
- get rid of the shared session directories (depends on server, does
not scale, scp)
- better support for non-PBS schedulers
- grid-manager scalability problem for > 1000 jobs
- grid-manager as non-root does not work
- job submission can fail if many jobs are submitted too quickly
- support for queues with fast response
- logging service that allows tracking CPU usage by VOs
- well documented and easy to use file catalogue
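The "retries for uploads" request in the list above can be illustrated with a minimal Python sketch. The function name upload_with_retries and its parameters are assumptions made for the example; the upload argument stands in for the real transfer, which is not shown, and nothing here reflects how the grid-manager actually handles uploads.

```python
import time


def upload_with_retries(upload, attempts=3, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call upload() until it succeeds or the attempts are used up.

    upload is any callable that raises an exception on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return upload()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: report the last failure
            sleep(delay)
            delay *= backoff  # wait longer before each new attempt
```

The growing delay is one possible design choice; it avoids hammering a storage element that is temporarily overloaded.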
* 5. Runtime Environments (Juha)
Would be nice if people could follow the recommendations.
Not easy to enforce.
RTEs must be documented to be useful.
No clear overview of what application belongs where in the namespace.
Placed according to the wishes of the maintainers.
Can the monitor distinguish between registered and not registered RTEs?
Test RTEs must not match production RTEs
Problems with parallel environments since they have many version
numbers (version of MPI implementation, version of compiler, version
of implementation)
Some RTEs are defined for things that would fit better in the
information system, like e.g. LOCALDISK, TESTSITE.
* 6. Logger service
Andrei I. will redesign the database. Asking for use cases. Do we want to do
queries on xrsl?
XRSL attributes that need to be queryable should be duplicated in the database.
Change to new version of usage record.
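The idea of duplicating selected XRSL attributes in the database can be sketched as follows, using Python's sqlite3 module. The table layout, the column names (jobname, cputime) and the helper functions are hypothetical and serve only to illustrate the point; the actual logger schema is still to be designed.

```python
import sqlite3


def open_logger_db(path=":memory:"):
    """Create the (hypothetical) jobs table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " jobid TEXT PRIMARY KEY,"
        " xrsl TEXT NOT NULL,"   # full job description, stored as-is
        " jobname TEXT,"         # duplicated from the xRSL for querying
        " cputime INTEGER)"      # duplicated from the xRSL for querying
    )
    return conn


def record_job(conn, jobid, xrsl, jobname=None, cputime=None):
    """Store the raw xRSL together with the duplicated attributes."""
    conn.execute(
        "INSERT INTO jobs VALUES (?, ?, ?, ?)", (jobid, xrsl, jobname, cputime)
    )
    conn.commit()


def jobs_over_cputime(conn, limit):
    """Query on a duplicated column; no xRSL parsing is needed."""
    rows = conn.execute(
        "SELECT jobid FROM jobs WHERE cputime > ? ORDER BY jobid", (limit,)
    )
    return [row[0] for row in rows]
```

Keeping the raw xRSL next to the duplicated columns means that more attributes can be extracted later if new use cases come up.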
* 7. Renaming stuff
ngcopy -> ngcp
ngremove -> ngrm
ngrequest -> ngtransfer
keep MDS namespace nordugrid and ng* cli names
change RPM packages to arc-...
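If the old command names are to keep working during a transition period, the renaming above could be expressed as a simple lookup. This is only a sketch: the RENAMED table repeats the names from the list above, while the translate function is a hypothetical helper and no such compatibility layer was decided at the meeting.

```python
# Old and new command names as listed in the minutes.
RENAMED = {"ngcopy": "ngcp", "ngremove": "ngrm", "ngrequest": "ngtransfer"}


def translate(argv):
    """Return argv with an old command name replaced by its new name."""
    if not argv:
        return []
    return [RENAMED.get(argv[0], argv[0])] + list(argv[1:])
```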
* 8. Future Features
- daemonizable ngsub for automatic retrieval.
- make gmlog file uploadable (should be possible to specify the gmlog
directory in outputfiles argument).
* 9. Middleware lookaround
EDG/LCG: data management compatible with arc through gsiftp, RLS and DQ
gLite: We will use fireman. gsiftp protocol in common for data transfers
fireman server to be set up in Oslo.
SRM basic functionality to be implemented in SSE. Client SRM already supported
in 0.5.x
globus 4 - what do we want/need from the new functionalities?
- CAS - similar to VOMS but not widely deployed - stick to VOMS for now
- web service containers - maybe
- myproxy - maybe
- RFT - reliable file transfer service - maybe
- new gridftp server implementation
globus-ftp-control API might be dropped by globus in the future