Release Announcement for ARC version 0.8.2.1
June 3, 2010
ARC 0.8.2.1 is an emergency release that updates the major ARC
0.8.2 release in order to fix the critical bug
#1944.
The Advanced Resource Connector (ARC), as of version 0.8.2.1, is an
open-source software solution that enables production-quality
computational grids for high-throughput computing,
spanning a wide range of sizes and purposes. The middleware
integrates computing resources (typically, computing clusters
managed by a batch system) and, to a lesser extent, storage
services, making them available via an information system and
a common secure grid layer. The middleware builds upon
standard open source solutions such as OpenSSL, OpenLDAP and
libxml2, as well as some Globus Toolkit 5 pre-WS libraries. It
relies on well-tested technologies in creating unique
ARC-specific services and tools. ARC developers strive to
achieve simplicity, non-invasiveness, high performance,
stability and reliability. With release 0.8.2.1 come enhanced
scalability and improved performance of computing and
information services. ARC middleware is officially supported on
major Linux flavors, and is known to operate smoothly on other
Linux systems, with a variety of batch job management
systems. Starting from release 0.8.2, several of the new
components are also available on Microsoft Windows and Mac OS
X platforms.
Changes in this emergency release (since ARC v0.8.2)
Bug fixes:
- Fix for the critical bug: In some configurations, files from DPM may not be
handled properly (bug #1944)
- Validation of checksums supplied with output files against remote storage (bug #1924)
- Utility directory for srms.conf should be created if it doesn't exist (bug #1948)
- Attribute nordugrid-job-executionnodes missing in PBS (bug #1949)
The release notes for ARC v0.8.2 were extended retroactively to describe the new experimental components more clearly.
Known issues as of ARC v0.8.2.1 release
- The 0.8.2 release introduced BDII 5 as part of the
LDAP-based local information system. This version of BDII 5 generates an LDAP tree
that is not backward compatible with older clients and client libraries (0.8.1.1
and earlier). The clients in 0.8.2 were modified to handle the
new, non-backward-compatible LDAP structure correctly (bug 1947).
Proposed workaround: If BDII 4 is already installed,
do not upgrade to BDII 5 during the update. A default
installation will not pull in BDII 5 if BDII 4 is already deployed. If you are deploying a new system and
need backward compatibility, it is recommended to
install BDII 4 manually first.
- Some users have run into problems with the Python path not being set
correctly on 64-bit systems (bug 1946).
Proposed workaround: the PYTHONPATH variable needs to include both
/opt/nordugrid/lib/python2.4/site-packages/ and
/opt/nordugrid/lib64/python2.4/site-packages/
(adjust the python2.4 component to match your Python version).
- The new GUI, arcjobtool, does not work on Ubuntu.
- The new GUI, arcjobtool, does not work on CentOS 5 due to problems with wxPython. A bug report exists in EPEL, and a new version of the package is expected shortly.
- The new GUI, arcjobtool, requires nordugrid-arc-nox-plugins-globus to be installed if you want to submit to ARC-CE resources.
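
For the PYTHONPATH workaround listed above (bug 1946), the following is a minimal Python sketch, not part of the ARC distribution, that builds a PYTHONPATH value containing both site-packages directories. The /opt/nordugrid prefix is taken from the workaround text. The helper name arc_pythonpath and the idea of deriving the pythonX.Y directory name from the running interpreter are assumptions made for illustration only.

```python
import os
import sys


def arc_pythonpath(prefix="/opt/nordugrid", version=None, existing=None):
    """Return a PYTHONPATH string that includes both the lib and lib64
    site-packages directories under `prefix` (hypothetical helper)."""
    if version is None:
        # Assumption: the directory is named after the running interpreter,
        # e.g. "python2.4" as in the workaround text.
        version = "python%d.%d" % (sys.version_info[0], sys.version_info[1])
    if existing is None:
        existing = os.environ.get("PYTHONPATH", "")
    wanted = [
        "%s/%s/%s/site-packages/" % (prefix, libdir, version)
        for libdir in ("lib", "lib64")
    ]
    # Keep whatever is already on the path, dropping empty entries.
    parts = [p for p in existing.split(os.pathsep) if p]
    # Append only the entries that are not already present
    # (with or without the trailing slash).
    for path in wanted:
        if path not in parts and path.rstrip("/") not in parts:
            parts.append(path)
    return os.pathsep.join(parts)


if __name__ == "__main__":
    # Print the value so that it can be exported from a shell.
    print(arc_pythonpath())
```

The printed value could then be exported as PYTHONPATH in the shell environment used to run the ARC clients. Passing version="python2.4" explicitly reproduces the two paths quoted in the workaround.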
Changes introduced in the ARC v0.8.2 release
Extensions/enhancements:
- Possibility to use BDII5 as part of the LDAP information system (not backwards compatible, see known issues)
- Updated packaging allows easier tailoring of services
- Ability to specify the lifetime of Grid Manager cache files
- SELinux profile created for index-service
- A new experimental GUI client, arcjobtool, has been added (see known issues).
New experimental components were introduced in the 0.8.2
release. The experimental components are not installed unless
explicitly requested.
- arcjobtool - a new client GUI (see known issues).
- Chelonia - lightweight distributed storage with automatic file replication
- ISIS - peer-to-peer information index service
- Echo - testing service for the ARC Web Service container, the HED
- Charon - policy decision point service which evaluates policy requests
- Hopi - lightweight HTTP server, to be used e.g. with Chelonia
With the 0.8.2 release, a major SVN cleanup
took place in the arc0 tree, resulting in the following components being
removed from both SVN and the distribution:
- Nordugrid ARC Compat (compatibility with ARC v0.4.x)
- Nordugrid ARC Logger
- Fireman support
- The old HTTPS server and the Smart SE
- rcutils
- AFS support of Grid Manager
- obsolete documentation
- NinFG (was never part of the distribution, removed from SVN)
Bug fixes already in ARC v0.8.2 release:
- With SLURM, ngkill'ed jobs get stuck in KILLING state forever (bug 1747)
- Deal with bad cache configuration better, allow cachedir=/
- Bringing conf file instructions up to date
- Allow LFC DataPoints only if LFC is supported
- Updated package dependencies
- Information system does not work with OpenSSL < 0.9.8 due to missing issuer_hash command.
- Increased POSIX compliance in service scripts
- Added timestamps to grid-infosys logfiles
- submit-sge-job does not handle parallel environments correctly when count=1 (bug 1771)
- Handle empty pid-files: solves an issue with trying to stop infosys too soon, and no longer runs kill without a pid
- RTE stage 0 not working with Condor backend
- Redirect stdout of RTE stage 0 to job.*.errors file (instead of /dev/null). Helps with debugging RTEs
- Set default sessiondir when none is given
- Updated size limits for cache and number of elements possible to keep in a BDII to new EGEE recommended values
- Updated infoindex error message to show real file access
- Now handles mpi and scratch-dir properly
- Updated cluster.pl to handle cache information correctly
- Added check for missing cacert directory (bug in OpenLDAP on some RHEL4 installations)
- Updated nordugrid schema to support utf8 in many places
- Renamed cron-script (bug 1737)
- Job finished successfully but no SGE accounting record found (bug 1790)
- Fix for the linkpath configuration option not working (bug 1780)
- joboption_count is written to .diag as ExecutionUnits by submit-*-job (bug 1695)
- Job start and end-times are written as LRMSStartTime and LRMSEndTime to .diag (bug 1752)
- Proper handling of maxwalltime "UNLIMITED/infinite" in SLURM backend
- Fix missing default values for memory requirements when they are not specified in XRSL (bug 1799)
- Fix problems simultaneously creating SRM directories
- Fix problems when submitting jobs on LRMSes where shared filesystem is not used (bug 1806)
- Default memory limit, when not set in XRSL, is now 1 GB (previously it was undefined)
- Fix listing of FTP files by adding MLST and LIST to tried commands (bug 1666)
- Infosystem does not handle local queuing SGE arrayjobs correctly (bug 1732)
- Fix incorrect nordugrid-queue-totalcpus and nordugrid-queue-running in SGE 6.2
- grid-infosys can now be configured with chkconfig
- grid-manager does not properly close files if VOMS transfer-shares are enabled and it receives jobs without voms-extensions (bug 1849)
- grid-infosys sometimes generates a bad password for the LDAP server.
- nordugridmap: skip blank lines in HTTP sources and do proper counting
- urlogger: use fqdn from socket API for machine name if hostname is not available from config
- Automatic detection and caching of SRM port and protocol version (bug 1431)
- Retry httpg connection timeouts (bug 1805)
- Do not delete cache files with hard links (bug 1748)
- Allow replication within one site (bug 1031)
- Call fsync() before closing files
- Validation of local file size against remote source
- Choose control and session directories randomly (bug 1852)
- Improved stability of index-service with heavy load (bug 1900)
- Memory leaks fixed (bug 1902)
- Increased security for PBS, LL, LSF, SGE and SLURM regarding how diag-files are handled
- Default memory limit is now taken from nodememory in arc.conf (bug 1828)
- Do not try to change permissions on files you don't own on the worker node in PBS
- LoadLeveler backend counted requested walltime in seconds instead of minutes
- LSF backend was parsing output from commands in a bad way
- PBS backend had walltime = cputime if unset. When running multicore jobs, this can result in walltime being set very high and the job being rejected by resources.
- /var/log/grid-infosys has been renamed to /var/log/grid-infosys.log to conform to other ARC log-names
- Increased portability with fork backend
- Set more permissive umask in fork backend (bug 1865)
- Better support in client for multiprocessor jobs: the default cputime is now set to walltime * number_of_nodes if it is unset.
- Fixed reporting of cputime in SLURM: it is now walltime * number_of_nodes
- Fixed a problem with LDAP updates of the Information System Index server using BDII4
- Fixed job exit code not being reported in infosys if it was 0
- Fixed flapping index server
To get ARC 0.8.2.1:
The source and binary packages are available from: http://download.nordugrid.org
The standalone client tarballs are available from: http://download.nordugrid.org
Instructions for setting up your machine to use the NorduGrid repository
are available in the NorduGrid Wiki.
More information:
Please consult the release
notes of ARC 0.8 for a detailed product description.
The dedicated release Wiki page contains detailed information about the release content,
build and installation.
Consult the documentation section of the NorduGrid website.