ATLAS Production on NorduGrid


 Check list for participating sites

  1. Ensure that your site has the proper Grid middleware installed and registers to the proper indexing service
  2. Install the necessary ATLAS software and authorise production managers
  3. Optional: set up a Storage Element

 Detailed instructions

If you have problems or questions regarding ATLAS Production in the NorduGrid space, write to the atlas-ng-arc mailing list (see below), or contact Farid Ould-Saada, Alex Read, or Oxana Smirnova personally. For all sites participating in ATLAS Production via the NorduGrid/ARC infrastructure, the following steps have to be performed:

 Grid-wise preparation

  1. Install the latest CERN CA keys, either from their official page or from the NorduGrid download area.
  2. Consider deploying NorduGrid ARC 0.4.5, the latest stable release so far. Don't use the development tags 0.5.x unless you know what you are doing. It is strongly recommended to upgrade your Globus installation to version 2.4.3-16ng (released on December 6, 2004), as it includes very important bug fixes.
  3. Register your site to the ATLAS GIIS. To do this, add the following block to globus.conf if you use nordugrid v.0.4.5:

       [mds/gris/registration/GrisToAtlas]
       regname="Atlas"
       reghn=atlasgiis.nbi.dk
       regperiod=30
       servicename=nordugrid-cluster-name

     or this to arc.conf if you use nordugrid v.0.5.3x:

       [infosys/cluster/registration/GrisToAtlas]
       targethostname="atlasgiis.nbi.dk"
       targetport="2135"
       targetsuffix="mds-vo-name=Atlas,o=grid"
       regperiod="23"

     Then restart the information system services:
       if you use nordugrid v.0.4.5: /etc/init.d/globus-mds restart
       if you use nordugrid v.0.5.x: /etc/init.d/grid-infosys restart

     When done, send an e-mail containing your front-end host name and a request to be authorised at the ATLAS GIIS. A quick way to check that the registration works is shown after this list.
  4. Join the dedicated mailing list "atlas-ng-arc" by using CERN's mailing list interface. NB! Make sure your "From" and "Return-Path" fields are the same as the e-mail address you use, otherwise you will not be able to post to the list.
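
A minimal check of the GIIS registration, assuming the OpenLDAP command-line tools are installed (host, port and suffix are the ones configured above):

  ldapsearch -x -h atlasgiis.nbi.dk -p 2135 -b 'mds-vo-name=Atlas,o=grid' \
      '(objectclass=nordugrid-cluster)' nordugrid-cluster-name

Your front-end should appear among the returned nordugrid-cluster-name entries once the authorisation request has been processed.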

 Instructions for ATLAS release installation

  1. Software installation. Always install the latest ATLAS software release available, currently the 11.0.x series.
    There are (at least) two ways to install ATLAS software:
    • The official ATLAS distribution kit is available from the Pacman repository as binary tarballs for Scientific Linux CERN v3 (SLC3). It is reported to work on other systems as well (provided 32-bit mode and the necessary libraries and compilers are in place). ATLAS provides instructions on how to get and install such a release.
      One can also use the interactive installation script prepared for NorduGrid, which automatically produces the runtime environment setup script (see below). The script is available for releases 11.0.0 and up from http://grid.uio.no/atlas
    • Some people prefer RPM distributions; these are prepared by a group of Nordic ATLAS researchers, with ATLAS approval. Below are the RPM installation instructions for ATLAS s/w release 11.0.0 on RHEL3; for the availability of other releases and OS versions, check the following repositories:
      http://www.grid.tsl.uu.se/RTEs/ATLAS/
      http://grid.uio.no/atlas
      There are installation scripts, called either ATLAS-x.y.z-install-<opsys>.sh or installatlasrpms-<opsys>.sh. In case of doubts or problems, please contact the atlas-ng-arc mailing list. The repositories also contain source RPMs for a possible rebuild on a different platform; www.grid.tsl.uu.se/RTEs/ATLAS has source RPMs for external packages, too.
      The installation procedure is rather simple:

        export SITEROOT=<path_to_top_atlas_location>
        cd $SITEROOT
        wget http://grid.uio.no/atlas/11.0.0/installatlasrpms-RHEL3.sh
        chmod u+x installatlasrpms-RHEL3.sh
        ./installatlasrpms-RHEL3.sh -a -c
  2. Check outbound connectivity. ATLAS jobs have to contact external databases, so outbound connectivity from the worker nodes must be enabled. In firewalled environments several ports have to be opened, e.g. port 3306 for MySQL; port 10521 is reported to be needed as well, and further ports may turn out to be necessary over time. A simple connectivity check is sketched after this list.
  3. Local validation. If you installed the "Pacman kit", you can validate the ATLAS software installation locally using the standard ATLAS KitValidation tool. Please note that this will not check the entire functionality, but only whether the installation was successful. Instructions to be provided.

    If you installed a NorduGrid distribution of ATLAS s/w, a script TEST-ATLAS-<release_nr> was produced automatically in the location where you executed the installation script. It is needed to set up the ATLAS runtime environment correctly. The installation can then be validated locally by running the corresponding validation script kitval<release_nr>.sh, available from:

    http://grid.uio.no/atlas/validation

    To validate the installation locally, do:
      cd <some_place_with_some_disk_space>
      wget http://grid.uio.no/atlas/validation/kitval<release_nr>.sh
      chmod u+x kitval<release_nr>.sh
      ./kitval<release_nr>.sh <path_to_the_script>/TEST-ATLAS-<release_nr>

    or:
      cd <some_place_with_some_disk_space>
      wget http://grid.uio.no/atlas/validation/kitval<release_nr>.sh
      chmod u+x kitval<release_nr>.sh
      source <path_to_the_script>/TEST-ATLAS-<release_nr>
      ./kitval<release_nr>.sh -r <release_nr>
    The validation runs KitValidation with some default arguments and leaves the outputs in place; the location is reported when the script exits. Each run of KitValidation creates a new sub-directory with a partially random name. Check that there are no errors or "FAILED" outcomes; if such occur, seek help from the experts.
  4. Publish the release tag. Following a successful local validation, copy the script TEST-ATLAS-<release_nr> into the APPS/HEP subdirectory of your runtime environment directory:
      cp <path_to_the_script>/TEST-ATLAS-<release_nr> <rte_dir>/APPS/HEP/TEST-ATLAS-<release_nr>
    If you installed the release from the "Pacman kit", you will have to create such a script by hand. An example of such a script for release 11.0.0 is here.
  5. Validate the release Grid-wise. Fetch the validation job definition and submit the validation job to your cluster, e.g.:
      wget http://grid.uio.no/atlas/validation/kitval11.0.0_TEST.xrsl
      ngsub -f kitval11.0.0_TEST.xrsl -c <your_cluster>
    NB! If you use client version 0.5.30 or higher, remove the -f option from the ngsub command above.

    Once the job has finished, please retrieve the results with ngget and check that there are no errors in the logfile KitValidation.log. If the validation passes, rename the runtime environment script TEST-ATLAS-<release_nr> to ATLAS-<release_nr>; your cluster is then ready.
  6. Note on cluster settings. Most jobs require large amounts of memory, so sites advertising less than 800 MB of RAM are unlikely to get jobs. Site admins are encouraged to check/update the site node-memory specifications in Condor (if any) and in nordugrid.conf.
  7. Authorise the NorduGrid production managers (Mattias Ellert, Alex Read, Katarina Pajchel, Samir Ferrag):
      /O=Grid/O=NorduGrid/OU=tsl.uu.se/CN=Mattias Ellert
      /O=Grid/O=NorduGrid/OU=fys.uio.no/CN=Alex Read
      /O=Grid/O=NorduGrid/OU=fys.uio.no/CN=Katarina Pajchel
      /O=Grid/O=NorduGrid/OU=uio.no/CN=Samir Ferrag
    They are also members of SWEGRID's ATLAS VO: https://www.pdc.kth.se/grid/swegrid-vo/vo.atlas-testusers-vo
    Make sure you have the public keys of the NorduGrid CA installed. A connectivity check for step 2 and a grid-mapfile sketch for these entries are given after this list.
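
A simple outbound-connectivity check for step 2, assuming nc (netcat) is available on the worker nodes; the database host name is a placeholder:

  # Check from a worker node that the external database ports are reachable.
  for port in 3306 10521; do
      nc -z -w 5 <some_external_db_host> $port \
          && echo "port $port: reachable" \
          || echo "port $port: blocked"
  done

For step 7, one common way to authorise individual DNs is to map them to a local account in /etc/grid-security/grid-mapfile (normally maintained by the nordugridmap utility, see the SE section below). A hand-written sketch, where the local account name atlasprod is only an example:

  "/O=Grid/O=NorduGrid/OU=tsl.uu.se/CN=Mattias Ellert" atlasprod
  "/O=Grid/O=NorduGrid/OU=fys.uio.no/CN=Alex Read" atlasprod
  "/O=Grid/O=NorduGrid/OU=fys.uio.no/CN=Katarina Pajchel" atlasprod
  "/O=Grid/O=NorduGrid/OU=uio.no/CN=Samir Ferrag" atlasprod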

 Setting up a Storage Element

Registering a Storage Element to the ATLAS GIIS

  1. Add the following block to globus.conf if you use nordugrid v.0.4.5:

       [mds/gris/registration/SEtoAtlas]
       regname="Atlas"
       reghn=atlasgiis.nbi.dk
       regperiod=30
       rootdn="nordugrid-se-name=<mySE>:my.host.name,Mds-Vo-name=local,o=grid"

     Here <mySE> should be substituted with the same string as in the header of the storage element block: [se/<mySE>].
     If you use nordugrid v.0.5.3x, add instead this to arc.conf:

       [infosys/se/<mySE>/registration/toATLAS]
       targethostname="atlasgiis.nbi.dk"
       targetport="2135"
       targetsuffix="mds-vo-name=Atlas,o=grid"
       regperiod="44"

     Here <mySE> should be substituted with the same string as in the header of the information system storage element block: [infosys/se/<mySE>].
  2. Restart the information system services:
       if you use nordugrid v.0.4.5: /etc/init.d/globus-mds restart
       if you use nordugrid v.0.5.x: /etc/init.d/grid-infosys restart
  3. Send an e-mail containing your SE host name and a request to be authorised at the ATLAS GIIS. A query to verify the registration is shown after this list.
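
As for the cluster registration, and assuming the OpenLDAP command-line tools, the SE's appearance in the ATLAS GIIS can be verified with a query like the one below (the object class and attribute names follow the nordugrid-se schema used in the rootdn above):

  ldapsearch -x -h atlasgiis.nbi.dk -p 2135 -b 'mds-vo-name=Atlas,o=grid' \
      '(objectclass=nordugrid-se)' nordugrid-se-name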

Instructions for authorising ATLAS physicists to SE's

The important thing when committing a Storage Element (SE) for ATLAS usage is to authorise, on the SE level, the ATLAS Virtual Organisation members, which implies accepting their respective Certificate Authorities (CA). Below are instructions on how to achieve this, and other related information.

  1. Add the following line to /etc/grid-security/nordugridmap.conf (this line is the ATLAS VO server contact string):

       group "ldap://grid-vo.nikhef.nl/ou=lcg1,o=atlas,dc=eu-datagrid,dc=org"

     The next time the nordugridmap utility is run, the grid-mapfile /etc/grid-security/grid-mapfile is filled with the DN's of the members of the ATLAS VO. By default nordugridmap uses the /etc/grid-security/nordugridmap.conf file, which can be overridden on the command line with the -c switch. The name and location of the generated mapfile (default /etc/grid-security/grid-mapfile) can be modified in the configuration file, which is useful for generating different ATLAS grid-mapfiles (see below).
  2. Make sure that write access is provided for the members of SWEGRID's dedicated ATLAS VO (Nordic production managers): https://www.pdc.kth.se/grid/swegrid-vo/vo.atlas-testusers-vo
  3. Install all the necessary CA public certificates. These include all CAs accredited by the European Policy Management Authority for Grid Authentication in e-Science (EUGridPMA), http://eugridpma.org.
     The certificates can be downloaded from http://eugridpma.org/distribution/current/accredited/. All these CA's are recognised by LCG as well; see http://lcg-registrar.cern.ch/pki_certificates.html
  4. Configure the fileplugin SE to contain a read-only location through which data can be downloaded. Note: in the stable release series 0.4.x it is not possible to configure the SE so that some people have read and others write access, unless one uses the low-level configuration file gridftpd.conf. In fact, the people who have read access through the read-only location defined below will also have write access through the ordinary write location used by the NorduGrid ATLAS production managers. It is nevertheless recommended to create a read-only location to prevent accidents. This restriction is removed in the development series 0.5.
     Two examples are given below: configuring the SE via nordugrid.conf (nordugrid v.0.4.5) or arc.conf (nordugrid v.0.5.x), and configuring it via the low-level gridftpd.conf.
     • To configure a read-only location in the SE, add the block [gridftpd/atlasprod_read] to nordugrid.conf (nordugrid v.0.4.5) or arc.conf (nordugrid v.0.5.x) with the following content:

         plugin=fileplugin.so
         path=/atlasprod_read
         mount="<your physical filedir with ATLAS files>"
         dir="/ nouser read cd dirlist"

       This gives read access through the path gsiftp://<clustername:port>/atlasprod_read
     • Using the low-level gridftpd.conf configuration file (usually placed in /opt/nordugrid/etc) to define a read-only path in the SE is also easy, and it makes it possible to truly distinguish between people with read and people with write access. There is a small problem, though: gridftpd.conf is overwritten with the information from nordugrid.conf (or arc.conf) if one uses the standard method of starting the gridftp server, "service gridftpd start".
       Instead, one should start the gridftpd with the command:

         /opt/nordugrid/sbin/gridftpd -c /opt/nordugrid/etc/gridftpd.conf

       This may require adding /opt/voms/lib to LD_LIBRARY_PATH first.
       With this in mind, the following is a standard gridftpd.conf configuration file:
        pidfile /var/run/gridftpd.pid
        logfile /var/log/gridftpd.log
        port 2811
        pluginpath /opt/nordugrid/lib
        encryption yes
        allowunknown no
        
        group atlas
          file /etc/grid-security/atlas-mapfile
        end
        
        group atlas_read
          file /etc/grid-security/atlasreaders-mapfile
        end
        
        groupcfg atlas
        plugin /atlasprod fileplugin.so
          mount /
          dir / nouser read cd dirlist delete create *:* 664:664 mkdir *:* 775:775
        end

        groupcfg atlas_read
        plugin /atlasprod_read fileplugin.so
          mount /
          dir / nouser read cd dirlist
        end
       In this case, the people in /etc/grid-security/atlasreaders-mapfile will have read access to the files and the people in /etc/grid-security/atlas-mapfile will have write access. The file /etc/grid-security/atlas-mapfile should be filled with (at least) the DN's of the ATLAS DC production managers, while /etc/grid-security/atlasreaders-mapfile should be filled with the DN's of the ATLAS VO members. Sketches for generating the readers' mapfile and for testing the read-only location follow below.
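
To fill /etc/grid-security/atlasreaders-mapfile, one can re-use the nordugridmap utility with an alternate configuration file via its -c switch, as described in step 1. The configuration file name below is an example only, and that file must contain the ATLAS VO group line from step 1 with the mapfile location pointed at atlasreaders-mapfile; the path assumes the standard /opt/nordugrid installation used above:

  /opt/nordugrid/sbin/nordugridmap -c /etc/grid-security/nordugridmap-atlasreaders.conf

To test the read-only location from the outside, a plain gsiftp download with any ATLAS VO member's proxy should succeed, while writing through that path should be refused; host and file names are placeholders:

  grid-proxy-init
  globus-url-copy gsiftp://<clustername:port>/atlasprod_read/<some_file> file:///tmp/<some_file>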

SE service requirements

Storage Elements are expected to serve data on request over an extended period of several months, typically around one year. There are known cases when users requested data stored two years earlier.

If you plan to permanently shut down a SE, please notify the coordinators and take the necessary steps to rescue the stored data: replicate them to another SE, erase the old records from the indexing database, and possibly create backups.

It is generally a good practice to have regular backups of the stored data, whenever possible.
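
A minimal sketch of such a backup, assuming plain rsync over SSH to a backup host (the cron schedule, paths and host name are examples only):

  # Nightly sync of the SE file area to a backup host at 03:00.
  0 3 * * *  rsync -a <your physical filedir with ATLAS files>/ backuphost:/backup/atlas/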

The list of SE's that can accept production data at the moment can always be obtained with the query:

  globus-rls-cli query lrc lfn __storage_service__ rls://atlasrls.nordugrid.org:39281

If a SE is not in this list, it will not accept new data, but it can still serve data already stored. If your SE goes down and/or is down for maintenance for a short while, please let the coordinator know beforehand, so that this list can be adjusted. This also helps to make sure that enough space is available at all times.

Old production data (DC2) are stored in rls://atlasrls.nordugrid.org:39282.