NorduGrid technical meeting

18-19 October 2004, Linköping

Minutes

Security meeting, October 18


Leif opened the meeting and everyone introduced themselves

* Grid Security presentation by Olle
Olle Mulmo gave a presentation on Grid Security over the phone
The presentation covered:
  Requirements for grid security
  How it worked (CA, PKI, proxy)
  EGEE/gLite security
  How it looks today - good overview of what is wrong:
    Rights in proxies,
    software running as root
  Only luck that massive grid hacking has not yet happened

* NSC intrusion
Leif talked about the intrusion at NSC
Was compromised three times in a row
  Local user account was compromised and rootkit installed
  Installed ssh client with password sniffer
Other incidents were also covered
It was discussed how to protect sites from this

* M-grid / grid security policies
Urpo Kalia gave a presentation of security for M-grid
(www.csc.fi/proj/mgrid)
Presented a list of issues that must be managed for grid to be
'secure'
M-grid will have a security policy - there is a draft
  Presents how various things should be done, and various procedures
  Also has checklists (easy to use)

There was an open discussion about security in grid.
There is a need for people knowing "real world" security
People need to work together on this
There is a need for policies and HOWTOs
Local site security is difficult, jobs accessing other jobs
Lots of security issues were discussed
Security can be hard to scale up sometimes
How should policies/rules be decided, by whom and how
International issues, some jobs might be legal in some countries,
  but not everywhere
Storing of data about persons is problematic - privacy laws
How should certificates be revoked
What should be done if a resource owner discovers a 'nasty' user.

Technical meeting, October 19

Present: Balázs, Anders, Jakob, Ilja, Andi, Aleksandr, Mattias, Leif, Åke, Oxana, Niels, Juha, Mike, Arto, Henrik (at the minutes).


Balazs suggested an agenda based on the minutes from the last meeting

Agenda:

* Catchup with the last meeting
* Runtime presentation by Juha
* ARCLib status
* LCG2 Software presentation
* Data Management

Something had to be presented at the European Grid Conference about ARC,
  but it was not quite clear who could/should do it.

* Data Management
Biggest thing missing
Not clear what requirements are
  Use cases and requirements should be found
Only one present who was interested in doing some Data Management was
Aleksander
Matthias Wadenstein (wasn't present) is still interested, but from an
admin point of view
It is not clear who should manage storage elements
Aleksander suggested that some people get together and come up with an
architecture and functionality for a data management system


== Items from the last meeting ==

* Logger
Has still not been tested.
The student in Lund who should have looked at it, had not.
The task is open for everyone.

* Logger cleanup
Not complete yet, works on the client side.
Installing the logger is too complex.
  Aleksander has made installation instructions.
  MySQL is not set up automatically.

* Logging web interface
Has been cleaned up, and should be ready soon.
It requires a graph drawing library, Anders will make an RPM for it.
Currently the logger needs to read the file
  $NORDUGRID_LOCATION/etc/nglogger-conf to get login information.

The naming of the various logging 'stuff' is confusing.

* Logging API/RPC interface
Jakob had promised to post something about the logging interface, but
  had been too busy to get anything done.
Jakob promised to post schema and comments soon.
The interface will be GGF compliant.
The schema is finished, but an implementation doesn't exist yet.
SWEGRID might be interested in the logging framework.
SWEGRID logging might use SGAS, not really clear how it will
  collect information though.
Anders suggested that the two efforts should work together
GGF schema is not very good; unclear how it should work and what
  semantics the values should have.

* Log rotation
Aleksander has made an example in nordugrid.conf which illustrates
  how to use log rotation.
Anders has not made log rotation work with the system
GM should not make its own rotation, but use the one in the system.
Currently the GM just opens that file and leaves it open.
  Makes it hard to move the file.
Reopening the file is hard in a multi-threaded system.
Can be done with copy-truncate, but might cause data loss.
Leif uses it, but it is not ideal.
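
The alternative to copy-truncate discussed above is for the writer to
notice that the system rotator has moved the file and reopen it. A
minimal Python sketch of that idea follows. It is not GM code: the
class name ReopeningLog and the file name gm.log are made up for
illustration, and it is single-threaded on purpose, so it sidesteps
the multi-threading difficulty noted above.

```python
import os
import tempfile


class ReopeningLog:
    """Append-only log that reopens its file after an external rotation."""

    def __init__(self, path):
        self.path = path
        self._open()

    def _open(self):
        self.stream = open(self.path, "a")
        info = os.fstat(self.stream.fileno())
        # Remember which file we actually have open.
        self.identity = (info.st_dev, info.st_ino)

    def _rotated(self):
        # The rotator renamed or removed the file if the path no longer
        # refers to the file we hold open.
        try:
            info = os.stat(self.path)
        except FileNotFoundError:
            return True
        return (info.st_dev, info.st_ino) != self.identity

    def write(self, message):
        if self._rotated():
            self.stream.close()
            self._open()
        self.stream.write(message + "\n")
        self.stream.flush()


def demo():
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "gm.log")
    log = ReopeningLog(path)
    log.write("before rotation")
    os.rename(path, path + ".1")  # what a system log rotator would do
    log.write("after rotation")
    log.stream.close()
    with open(path + ".1") as old, open(path) as new:
        return old.read(), new.read()
```

Unlike copy-truncate, nothing is copied, so no lines are lost between
the copy and the truncate. The cost is one stat call per message.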

Henrik said GM crashed when the log file reached 2 GB.
GM doesn't do anything special about logging, it just opens the file
  and appends; perhaps it should be opened in 64 bit mode, but it
  should work without.

Threads can write to the log file simultaneously, which causes log
messages to be mixed (stream multiplexing).
A lock should be held when writing to the log file.
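
A minimal sketch of the locking idea in Python, for illustration only
(GM itself is not written this way; the LockedLog class and the worker
messages are hypothetical). Each message is written as a whole line
while the lock is held, so lines from different threads cannot
interleave.

```python
import io
import threading


class LockedLog:
    """Serialises writes so each message reaches the stream as one line."""

    def __init__(self, stream):
        self.stream = stream
        self.lock = threading.Lock()

    def write(self, message):
        # Hold the lock for the entire message, including the newline.
        with self.lock:
            self.stream.write(message)
            self.stream.write("\n")
            self.stream.flush()


def demo(workers=4, lines_per_worker=200):
    stream = io.StringIO()
    log = LockedLog(stream)

    def work(worker_id):
        for n in range(lines_per_worker):
            log.write(f"worker {worker_id} line {n}")

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return stream.getvalue().splitlines()
```

The order of lines from different threads is still arbitrary; the lock
only guarantees that each line is intact.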

* Default GM log file
The question of whether a default logging file should be created
  was raised.
Having admins set up their software is usually good, but software
  should also be easy to set up.
Default should be made, but should still be configurable.

* Broken files in session catalog
What should be done about these?
Had not been investigated/done.

* GM Scan period
Not quite done yet.
Something new in GM which I did not hear; Aleks was not sure whether
it worked fully, and documentation was not there yet.

* Resource backends
Fork backend had been done and is documented - should be good.
Condor works, but configuration is messy.
SGE is nasty, is very site specific, some updates have been made.
  Integration with information system still needs some work.
  The Finns promised to do something.
Backends should be stable and finished at 0.6

* More verbose gm-jobs
Implementation should be done, but isn't used yet.
Does not display the new jobs - should it?
# I lost it here
Will be possible to specify who can submit jobs and who can retrieve
 files from session catalog - to avoid zombie jobs.

* Benchmarking integration.
Infosystem is ready.
Matthias has checked in the code for it, but needs testing.
  Implementations can use frequency if they don't provide a benchmark.
Also needs documentation, deployment is also missing.
Balazs will solve some 10% extra problem.

* Separating user authorization from user mapping
Allows for dynamic user mapping.
Leif should provide something, but hasn't; he is not sure what is
expected from him.
DNs should only be used for authorization and logging (i.e., not
mapping)
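
One way to read the separation is sketched below in Python. Everything
in it is an assumption made for illustration: the AccountPool class,
the accept_job function, the pool account names and the choice to
lease one local account per job are not taken from ARC or from the
discussion. The point is only that the DN decides whether a job is
accepted and is recorded in the log, while the local account comes
from a pool and does not depend on the DN.

```python
class AccountPool:
    """Hands out local accounts dynamically (hypothetical pool)."""

    def __init__(self, accounts):
        self.free = list(accounts)
        self.leases = {}  # job id -> local account

    def lease(self, job_id):
        if job_id not in self.leases:
            if not self.free:
                raise RuntimeError("no free local accounts")
            self.leases[job_id] = self.free.pop(0)
        return self.leases[job_id]

    def release(self, job_id):
        self.free.append(self.leases.pop(job_id))


def is_authorized(dn, allowed_dns):
    # Authorization looks at the DN and nothing else.
    return dn in allowed_dns


def accept_job(job_id, dn, allowed_dns, pool, audit_log):
    """Return the local account for the job, or None if it is rejected."""
    if not is_authorized(dn, allowed_dns):
        audit_log.append((job_id, dn, None))
        return None
    # Mapping never inspects the DN; it only takes the next free account.
    account = pool.lease(job_id)
    audit_log.append((job_id, dn, account))
    return account
```

The audit log keeps the DN next to the leased account, so a job can
still be traced back to a person after the account has been reused.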

* Running ARC as non root
Not clear what is needed.
TCP wrappers for daemons are hard to get right.
A list of what is needed should be made. 
  Info system needs to read files from the job control dir.
  Host keys need to be owned by the user running the daemons.
Henrik will try to set up francis by running as a non-grid user,
  and make a checklist.
Breaks changing user for jobs.
SWEGRID does some mapping, but not in any consistent way.
Lots of problems come up, e.g., reading session directory

* Plug-in templates
Downloader check plug-in (Aleks) - should be working.
Test site plugin, checking that TESTSITE is specified (Leif)
  Has been sent to Anders, not checked in yet.
Not clear how they should be configured, should be set in the
 configuration file.

* Testsites
None have been set up yet.
Should they be added to the monitor? It makes it slower.
  Usually testsites aren't the slow ones.
Only allow jobs which have the TESTSITE runtime environment.
  Will do this with the previously mentioned plug-in
Should somehow specify what it is testing.
Need plug-in before testsites can go online.
  Will be in 0.5.15

* Specifying url options in xRSL
E.g., number of threads, hashes/sums, read/write.
A proper way for specifying this is needed.
Currently it works by having a separator in the URL.
Matthias suggested a third inputfile parameter called option.
  Won't solve all problems, some are specific to index servers.
  Not all options have names - need naming scheme.
  We can have multiple destinations and sources.
We need a proper proposal for how it should be done.
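
To make the separator approach concrete, here is a small Python
sketch. The layout is assumed, not taken from xRSL: options are taken
to follow the URL as ';'-separated name=value pairs, and the option
names and host name in the usage example are made up.

```python
def split_url_options(text, separator=";"):
    """Split 'URL;name=value;flag' into (url, options).

    Assumed layout for illustration only; a bare name without '=' is
    stored as a flag with the value True.
    """
    url, *raw_options = text.split(separator)
    options = {}
    for item in raw_options:
        if not item:
            continue
        name, has_value, value = item.partition("=")
        options[name] = value if has_value else True
    return url, options


url, options = split_url_options(
    "gsiftp://se.example.org/data/file1;threads=4;checksum=md5"
)
```

The sketch also shows the weakness of the approach: a URL that
legitimately contains the separator is split in the wrong place, and
an option without a name cannot be expressed at all.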
# Head started to hurt here

* Client library API specification
Henrik had promised to do this, but had forgotten it.
Will probably be in ARCLib.

* Globus 3.2
Appears to be working.
Some have it working, but it needs more testing.
There were issues with GridFTP - should have been solved now.

* Growing GRIS server.
Grows with both authenticated and anonymous queries.
Grows much more than the number of jobs.
Hard to debug, Ake has some ideas.
Globus uses some of our patches, perhaps Anders has taken it out
  Unlikely though.
It is likely that there is a connection between the number of jobs,
  and the speed of the memory leak.
Balazs will investigate it.
Still grows even if it is not queried, Ake made one on a special port,
  without the Globus stuff -> It is a slapd bug!

%Post meeting notes:
Did not grow if not queried, Ake made a slight mistake
Problem is in ldif backend, Ake has a patch for it.

# Went away for five minutes

* Globus
New Globus will be modular, ARC will depend on 25 Globus packages
Would be nice to drop some Globus dependencies, e.g., openssl/ldap
  Problem with old distros using old versions of openssl (Redhat 7.3)
  Globus OpenSSL can be removed when 7.3 is no longer supported.
OpenLDAP in Globus is bad, since it is very old.
  Not trivial to replace.

* CVS -> Subversion
No one really knew whether it is worth moving
Balazs suggested keeping CVS since we know what it is.

* CA directory cleanup
Anders has started this, moved CAs to a separate package
Some certificates can be removed from CVS.
We might have a policy directory as well.

* NorduGrid -> ARC transition
There are lots of places in the code where this should be changed.
ARC LDAP namespace is not taken, attributes will be changed.
We will have backwards compatibility, but old cli tools will not be able
  to use the new clusters -> change should be fast.

* Info system startup script
Needs to be improved
Balazs will make this soon - before 0.6

* Single client configuration file
Matthias hasn't started.
Anders has made a configuration class, which can be a basis.

* Webpage
Everyone should send links, projects, etc. to webmaster (Oxana)

# LUNCH

* Runtime Environment Registry
Juha presented the concept and its webpage.
There are some questions regarding:
  Hierarchical namespaces.
  Naming and versioning scheme.
  Maintaining lists, how to do it.
  How to enforce it.
  Should there be a database or service for it.
  How to allocate and use namespaces.
People generally want this to happen now.
Delegating namespaces
Hierarchical structure is not clear (/ORG/CERN/ATLAS >< /APPL/HEP/ATLAS)
How to make resource owners change REs?
  Spam 'em and have the old for a month
Perhaps only add the new one, under the new namespaces and keep the old

* LCG Runtime Environment
Oxana presented an LCG runtime environment document.
There was a list of requirements for REs made by the LHC (perhaps LCG).
ARC was pretty close to them, except that REs should be over services.
Unknown if LCG will implement the requirements (it is just a list).

They had installed LCG2 software on Ingvar in Linköping.
There were quite a few problems.
Still in testing, not really stable or anything yet.
Installation was messy.
There is a lack of clear/concise documentation.
ARC is _much_ simpler to install (and other things) than LCG.

# Coffee

Henrik gave a summary about the status of ARCLib
There was a discussion about using cURL together with Globus
  This would probably require compiling cURL against globus
Anders presented the build system.
There was a lot of discussion about #ifdefs and limitation to distros
  and software versions.
Some tests were presented.
Aleksander wanted timestamps and rotating in notify
  Can be done by altering outstream.
He also wanted parallel LDAP queries, ARCLib will have it.
Anders presented config which can read from different backends
Aleksander wanted section support and perhaps support for order.

* CVS Structure:
People discussed this and things were moved around.
Result was quite good.

Data Management:
# Was away for IBM Bluegene presentation.
# Heard this though:
We need proxies with access rights.