NorduGrid technical meeting

17-19 August 2005, Oslo

Minutes

Participants: Jukka, Patrick, Martin, Sigve, Vladimir, Irina, Andrei I., Andrei Z., Konstantin, Katarina, Niels, Arto, Juha, Kalle, Ferenc, Peter, Anders, Balázs, Csaba, Oxana, Aleksandr, Mattias (both taking the minutes)

Wednesday 17 August

1. LHC@home

Jukka Klem:

Public resource computing (http://lchathome.cern.ch/), based on BOINC.

Downloaded executables are PKI-signed to ensure security.

Application example: tracking particles in the LHC beampipe (sixtrack). Used for accelerator design.

Users are motivated through screen savers, leader boards, etc.

New application clients: Geant4 simulation, Higgs analysis.

Many options can be set in user preferences.

Checkpointing to allow rebooting is application-dependent.

2. Logger

Andrey Ivanov:

Logger performance is insufficient in many use cases. Why?

Suggested improvements:

Parallelisation (different strategies):

To do:

3. Reviewing task list

GUI - Ilja Livenson:

Taverna: graphical workflow management. Writing operations to support the ng commands looks like a possible task. Balázs says this is part of the FP6 application.

Some problems with jarclib.

To do: define Taverna components for NorduGrid services.

Other

Backend/interface to SGE: ask Juha whether it is finished, and close the task on the web page if so.

Interface to Condor: done. Large logs are parsed very slowly; this needs a fix. Mattias says running jobs sometimes cannot be killed; it was decided to investigate the next such job when it is noticed.

RTE installation on demand: how could it be done? This will be in the FP6 application. The discussion did not lead to anything useful. Somebody needs to be found to review users' requirements for software installation and advertisement.

Re-scheduling, re-submission, recovery: done; to be removed from the web page.

Cross-cluster parallelism: Does not really work anyway.

Benchmarks: urge cluster administrators to publish more benchmarks. Instructions and suggested benchmarks must be published. Ask people from computing centers for advice.

Suggestion to display the power of the grid in benchmark units on the monitor page :) Not really serious.

Mattias to think about whether submission could be parallelized even more.

Automatic compression of data (better encoding): somebody from Finland is working in a similar direction, but no further information is available.

Support for interactive tasks: somebody is to put together strict and sane requirements. It was agreed that at least a minimal level of two-way, real-time interactivity is needed: stdin/stdout/stderr through a FIFO, as in UNICORE. The solution must be discussed on the mailing list before implementing. To do: write a specification and use cases.

Automatic registration of cached files: task is done; to be removed from the web page.

Support for clusters without NFS: task is done.

MyProxy: accelerate work on MyProxy support. The discussion concluded that it is needed. To be added to xRSL as credentialserver=myproxy://hostname:port.
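The proposed xRSL attribute carries the MyProxy server as a URL. A minimal Python sketch of how a client might split such a value into host and port, assuming the value takes exactly the form myproxy://hostname:port given above. The function name parse_credentialserver is hypothetical, and falling back to port 7512 (the conventional MyProxy port) when none is given is an assumption, not something decided at the meeting.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Assumption: fall back to the conventional MyProxy port when the URL omits one.
DEFAULT_MYPROXY_PORT = 7512


def parse_credentialserver(value: str) -> Tuple[str, int]:
    """Split a hypothetical credentialserver value such as
    'myproxy://hostname:port' into (hostname, port)."""
    parts = urlsplit(value)
    if parts.scheme != "myproxy":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("missing hostname")
    # parts.port is None when no port is given; it raises ValueError if invalid.
    port = parts.port if parts.port is not None else DEFAULT_MYPROXY_PORT
    return parts.hostname, port


if __name__ == "__main__":
    print(parse_credentialserver("myproxy://myproxy.example.org:7512"))
    print(parse_credentialserver("myproxy://myproxy.example.org"))
```

Rejecting any scheme other than myproxy keeps the attribute unambiguous if other credential-server types are ever added.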

Make the session directory writable through the GridFTP interface during job execution, via an option in xRSL; one more step toward interactive jobs. Information about the session directory type must be added to the InfoSys. Suggested name and possible values:
session_directory_type = shared,private,read_only,read_write.

A.K. to update the Logger UR and announce it on the mailing list.

Thursday 18 August, Friday 19 August

Reports from NIIF

Ferenc Szalai:

Distributed storage for LHC Service Challenge:

Anders Wäänänen:

Other

Balázs asks Mattias to please read the infosystem document.

Obsolete xRSL attributes were detected (lrmstype, savestate); they should be removed from the documentation.

A.K. to add an appendix to one of the documents describing GACL syntax and allowed elements.

ARC accepts JSDL, so far in a minimal way; it is currently supported on the GM side. A.K. to write converter classes (to be used by the user interface). ARC extensions to JSDL were discussed and mostly accepted. More elements to be added to the extensions:
join
session_directory_type - SessionType
RTE version - needs relation type

Rerun counter: the current default may be too low. Give the user a clear message when a job runs out of rerun attempts.