ATLAS Rome Workshop Production on NorduGrid


 Check list for participating sites

  1. Ensure that your site has the proper Grid middleware installed and registers with the proper indexing service
  2. Install the necessary ATLAS software and authorise production managers
  3. Optional: set up a Storage Element

 Detailed instructions

Coordinators of the ATLAS Rome Production in the NorduGrid space are Farid Ould-Saada and Mattias Ellert. For all sites participating in the ATLAS Rome Production via the NorduGrid/ARC infrastructure, the following steps have to be performed:

 Grid-wise preparation

  1. Install latest CERN CA keys, either from their official page or from the NorduGrid's download area
  2. Consider upgrading to NorduGrid ARC 0.4.4, the most stable release to date. It is also strongly recommended to upgrade your Globus installation to version 2.4.3-16ng (released on December 6, 2004), as it includes important bug fixes.
  3. Register your site with the ATLAS GIIS. To do this, add the following block to globus.conf:
    [mds/gris/registration/GrisToAtlas]
    regname="Atlas"
    reghn=atlasgiis.nbi.dk
    regperiod=30
    servicename=nordugrid-cluster-name
    When done, send an e-mail containing your site's host name to . A quick way to verify that the registration works is sketched after this list.
  4. Join the dedicated mailing list by sending a message with the body text "subscribe atlas-ng-arc" to . NB! Make sure your "From" and "Return-Path" fields match the e-mail address you use, otherwise you will not be able to post to the list (CERN's new policy).
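
To verify the GIIS registration from step 3, you can query the ATLAS GIIS directly and look for your cluster among the registered resources. This is only a sketch: the LDAP port (2135) and the base DN below are assumptions based on the usual ARC/MDS defaults, so adjust them if your setup differs.

    ldapsearch -x -h atlasgiis.nbi.dk -p 2135 -b 'mds-vo-name=Atlas,o=grid' \
        '(objectClass=nordugrid-cluster)' nordugrid-cluster-name

Your cluster's host name should appear among the nordugrid-cluster-name values once the registration has propagated.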

 Instructions for ATLAS release installation

  1. Software installation. Install ATLAS software releases 9.0.3 and 9.0.4. The official CERN release of the Pacman kit is available for RedHat7.3, but it is reported to work on other systems as well (provided the necessary libraries and compilers are in place). ATLAS provides instructions on how to get and install a release of the Atlas Software on RedHat7.3 via the "Pacman kit"; please consult also the digested instructions by Thomas Kittelmann and Jørgen Beck Hansen, which are also useful for non-RedHat7.3 systems. For RHEL and FC1, RPMs are available, as well as tarballs for Debian. They are located, for 9.0.3 and 9.0.4 respectively, at
    ftp://ftp.nordugrid.org/applications/hep/atlas/9.0.3/
    ftp://ftp.nordugrid.org/applications/hep/atlas/9.0.4
    Source RPMs (SRPMs), including the latest patches for Fedora, are available from the same location. If you have a different system, you can do one of the following: try the RH7.3 one (might work), rebuild the release from the SRPMs, or request to get your system supported by writing to . Information on package names, versions and the approximate installation order for the SRPMs is in the file dep_list.txt. For all the packages, there is an installation script with the usual name installatlasrpms-<dist>.sh (for Debian, use "sarge" for <dist>). To install the packages for your Linux distribution, fetch the corresponding script, define the necessary environment variables and execute it:
    wget ftp://ftp.nordugrid.org/applications/hep/atlas/<release>/installatlasrpms-<dist>.sh
    export ATLAS_ROOT=<where_you_want_to_have_atlas>
    export G4INSTALL=<where_you_want_to_have_geant4>
    export ROOTSYS=<where_you_want_to_have_root>
    export CERN=<where_you_want_to_have_cernlib>
    chmod u+x installatlasrpms-<dist>.sh
    ./installatlasrpms-<dist>.sh
  2. Check outbound connectivity. ATLAS jobs have to contact external databases, so outbound connectivity from the worker nodes must be enabled. In firewalled environments several ports have to be opened, e.g. port 3306 for MySQL; port 10521 is also reported to be needed, and further ports may turn out to be required as the production proceeds. A simple connectivity check is sketched after this list.
  3. Local validation. If you installed the "Pacman kit", you can validate the ATLAS software installation locally by using the standard ATLAS KitValidation tool. Please note that this will not check the entire functionality, but will only test whether the installation was successful. For details, follow instructions in the Kit description.

    If you installed a NorduGrid distribution of the ATLAS s/w, a script TEST-ATLAS-<release> was produced automatically in the location where you executed installatlasrpms-<dist>.sh. It is needed to set up the ATLAS runtime environment correctly. The installation can then be validated locally by running the corresponding validation script kitval9.sh, available at:

    http://grid.uio.no/atlas/validation/kitval9.sh
    To validate the installation locally, do:
    cd <some_place_with_some_disk_space>
    wget http://grid.uio.no/atlas/validation/kitval9.sh
    chmod u+x kitval9.sh
    ./kitval9.sh <path_to_the_script>TEST-ATLAS-<release>
    or
    cd <some_place_with_some_disk_space>
    wget http://grid.uio.no/atlas/validation/kitval9.sh
    chmod u+x kitval9.sh
    source <path_to_the_script>TEST-ATLAS-<release>
    ./kitval9.sh -r <release>
    The validation runs KitValidation with some default arguments and leaves the outputs in place; their location is printed when the script exits. Each run of KitValidation creates a new sub-directory with a partially random name.
  4. Publish the release tag. Following a successful local validation, copy the script TEST-ATLAS-<release> into the APPS/HEP subdirectory of your runtimeenvironment directory:
    cp <path_to_the_script>TEST-ATLAS-<release> <path_to_the_rte_dir>APPS/HEP/TEST-ATLAS-<release>
    If you installed the release from the "Pacman kit", you will have to create such a script by hand. An example is described in Step 6 of the Instructions (rename setup-9.0.3.sh to TEST-ATLAS-9.0.3).
  5. Validate the release Grid-wise. Fetch the validation job definition and submit the validation job to your cluster:
    wget http://grid.uio.no/atlas/validation/kitval<release>_rpm_TEST.xrsl
    ngsub -f kitval<release>_rpm_TEST.xrsl -c <your_cluster>
    Once the job has finished, retrieve the results with ngget and check that there are no errors in the logfile KitValidation.log. If the validation is passed, rename the runtime environment script from TEST-ATLAS-<release> to ATLAS-<release>; your cluster is then ready.
  6. Note on cluster settings. Most jobs require large amounts of memory, so sites advertising less than 800 MB of RAM are unlikely to get jobs. Site admins are encouraged to check/update the node-memory specifications in Condor (if any) and in nordugrid.conf; a minimal nordugrid.conf sketch is given after this list.
  7. Authorize the NorduGrid production managers (Mattias Ellert, Alex Read, Katarina Pajchel, Samir Ferrag and Rasmus Mackeprang):
    /O=Grid/O=NorduGrid/OU=tsl.uu.se/CN=Mattias Ellert
    /O=Grid/O=NorduGrid/OU=fys.uio.no/CN=Alex Read
    /O=Grid/O=NorduGrid/OU=fys.uio.no/CN=Katarina Pajchel
    /O=Grid/O=NorduGrid/OU=uio.no/CN=Samir Ferrag
    They are also members of SWEGRID's ATLAS VO: https://www.pdc.kth.se/grid/swegrid-vo/vo.atlas-testusers-vo
    Make sure you have the public keys of the NorduGrid CA installed. A grid-mapfile sketch for authorizing these DN's is given after this list.
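
As a quick test of the outbound connectivity required in step 2, the relevant ports can be probed directly from a worker node. The host name below is only a placeholder for whichever database server your jobs actually need to reach:

    # run this on a worker node; <db_host> is a placeholder
    nc -z -w 5 <db_host> 3306  && echo "MySQL port 3306 reachable"
    nc -z -w 5 <db_host> 10521 && echo "port 10521 reachable"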
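
For step 6, a minimal sketch of advertising the node memory in nordugrid.conf is given below. The queue name and the value are placeholders, and the attribute name nodememory is an assumption; verify it against the configuration documentation for the ARC version you are running:

    [queue/<your_queue>]
    # advertised memory per node in MB (verify the attribute name for your ARC version)
    nodememory="2048"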
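
For step 7, the most direct way to authorize the production managers is to map their DN's to a local account in /etc/grid-security/grid-mapfile (nordugridmap can also be used to maintain such entries). This is only a sketch; the local account name atlasprod is an example, not a required name:

    "/O=Grid/O=NorduGrid/OU=tsl.uu.se/CN=Mattias Ellert" atlasprod
    "/O=Grid/O=NorduGrid/OU=fys.uio.no/CN=Alex Read" atlasprod
    "/O=Grid/O=NorduGrid/OU=fys.uio.no/CN=Katarina Pajchel" atlasprod
    "/O=Grid/O=NorduGrid/OU=uio.no/CN=Samir Ferrag" atlasprod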

 Setting up a Storage Element

The important thing when committing a Storage Element (SE) for ATLAS usage is to authorise, on the SE level, the ATLAS Virtual Organisation members, which implies accepting their respective Certificate Authorities (CA). Below are instructions on how to achieve this, and other related information.

Instructions for authorizing ATLAS physicists to SE's

  1. Add the following line (the ATLAS VO server contact string) to /etc/grid-security/nordugridmap.conf:
    group "ldap://grid-vo.nikhef.nl/ou=lcg1,o=atlas,dc=eu-datagrid,dc=org"
    The next time the nordugridmap utility is run, the grid-mapfile /etc/grid-security/grid-mapfile is filled with the DN's of the members of the ATLAS VO. By default nordugridmap uses the /etc/grid-security/nordugridmap.conf file, which can be overridden on the command line with the -c switch. The name and location of the generated mapfile (default is /etc/grid-security/grid-mapfile) can be modified in the configuration file, which might be useful for generating different ATLAS grid-mapfiles (see below, and the sketch after this list).
  2. Make sure that write-access is provided for the members of SWEGRID's dedicated ATLAS VO: https://www.pdc.kth.se/grid/swegrid-vo/vo.atlas-testusers-vo
  3. Install all the necessary CA public certificates. Those include all the CA's accredited by the European Policy Management Authority for Grid Authentication in e-Science (also called EUGridPMA), http://eugridpma.org.
    The certificates can be downloaded from http://eugridpma.org/distribution/current/accredited/
    All these CA's are recognized by LCG as well, see http://lcg-registrar.cern.ch/pki_certificates.html
  4. Configure the fileplugin SE to contain a read-only location through which data can be downloaded. Note: for the stable release-series 0.4.x, it is not possible to configure the SE so that some people have read- and other people write-access unless one uses the low-level configuration file gridftpd.conf. In fact, the people that have read-access through the read-only location defined below will also have write-access through the ordinary write-location that is used by the NorduGrid DC2 production managers. It is nevertheless recommended to make a read-only location to prevent accidents. This restriction is removed in the development series 0.5.
    Two examples are given below: configuring the SE using nordugrid.conf, and configuring the SE using gridftpd.conf.
    • To configure a read-only location in the SE using nordugrid.conf, add the following block to nordugrid.conf:
      [gridftpd/dc2_read]
      plugin=fileplugin.so
      path=/dc2_read
      mount="<your physical filedir with dc2 files>"
      dir="/ nouser read cd dirlist"
      This gives read-access to people through the path gsiftp://<clustername>/dc2_read (a quick way to verify this is sketched after this list).
    • Using the low-level gridftpd.conf configuration file (usually placed in /opt/nordugrid/etc) to define a read-only path in the SE is also easy, and it gives a real opportunity to distinguish between people having read-access and people having write-access. There is a small problem though: gridftpd.conf is overwritten with the information from nordugrid.conf if one uses the standard method of starting the gridftp server, "service gridftpd start".
      Instead, one should start the gridftpd using the command:
      /opt/nordugrid/sbin/gridftpd -c /opt/nordugrid/etc/gridftpd.conf
      This may require adding /opt/voms/lib to LD_LIBRARY_PATH first.
      With this in mind, the following is a standard gridftpd.conf configuration file:
      pidfile /var/run/gridftpd.pid
      logfile /var/log/gridftpd.log
      port 2811
      pluginpath /opt/nordugrid/lib
      encryption yes
      allowunknown no

      group atlas
        file /etc/grid-security/atlas-mapfile
      end

      group atlas_read
        file /etc/grid-security/atlasreaders-mapfile
      end

      groupcfg atlas
      plugin /dc2 fileplugin.so
        mount /
        dir / nouser read cd dirlist delete create *:* 664:664 mkdir *:* 775:775
      end

      groupcfg atlas_read
      plugin /dc2_read fileplugin.so
        mount /
        dir / nouser read cd dirlist
      end

      In this case, the people in /etc/grid-security/atlasreaders-mapfile will have read-access to the files and the people in /etc/grid-security/atlas-mapfile will have write-access. The file /etc/grid-security/atlas-mapfile should be filled with (at least) the DN's of the ATLAS DC production managers while the file /etc/grid-security/atlasreaders-mapfile should be filled with the DN's of the ATLAS VO people.
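
Related to step 1, the mapfiles can be kept in sync with the ATLAS VO by running nordugridmap regularly, e.g. from cron. This is only a sketch: the location of the nordugridmap utility and the name of the dedicated configuration file are assumptions and should be adapted to your installation; the dedicated configuration file would contain the ATLAS VO group line from step 1 and point the generated mapfile at /etc/grid-security/atlasreaders-mapfile.

    # example /etc/cron.d entry: refresh the reader mapfile every 6 hours
    0 */6 * * * root /opt/nordugrid/sbin/nordugridmap -c /etc/grid-security/nordugridmap-atlasreaders.conf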
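
Once the SE is configured, it is worth verifying from the outside that the read-only location works as intended. A minimal check with the standard ARC user tools is sketched below; <your_se_hostname> and <some_file> are placeholders:

    # list the contents of the read-only location
    ngls gsiftp://<your_se_hostname>/dc2_read
    # fetch one of the files to check read-access
    ngcp gsiftp://<your_se_hostname>/dc2_read/<some_file> file:///tmp/<some_file>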

SE service requirements

Storage Elements are expected to serve data on request over an extended period of several months, typically around one year. There are known cases when users requested data stored two years earlier.

If you plan to permanently shut down an SE, please notify the coordinators and take the necessary steps to rescue the stored data: replicate them to another SE, erase the old records from the indexing database and, where appropriate, create backups.

It is generally a good practice to have regular backups of the stored data, whenever possible.

The list of SE's that can currently accept production data can always be obtained with the query:

globus-rls-cli query lrc lfn __storage_service__ rls://gridsrv3.nbi.dk

If an SE is not in this list, it will not accept new data, but it can still serve data already stored on it. If your SE goes down and/or is taken down for maintenance for a short while, please let the coordinator know beforehand, so that this list can be adjusted. This is also to make sure that enough space is available at all times.