PBS Configuration Instructions for the NorduGrid Sites

Introduction

PBS is a powerful Local Resource Management System (batch system) with dozens of configurable options. Server, queue and node attributes control the cluster's behaviour.

To interface PBS correctly with the NorduGrid architecture (mainly the information provider scripts), the local system administrator is asked to meet a few configuration requirements.

Required configuration

  1. The computing nodes MUST be declared as cluster nodes (job-exclusive); at the moment, time-shared nodes are not supported by the NorduGrid setup. If you intend to run more than one job on a single processor, you can use the virtual processor feature of PBS.
  2. For each queue, you MUST set at least one of the max_user_run or max_running attributes, and its value SHOULD BE IN AGREEMENT with the number of available resources (i.e. don't set max_running = 10 if you have only six (virtual) processors in your system). If you set both max_running and max_user_run, then max_user_run has to be less than or equal to max_running.
  3. For the time being, do NOT set server limits like max_running; please use queue-based limits instead.
  4. Avoid using the max_load and ideal_load directives. The node's mom config file
    (<PBS home on the node>/mom_priv/config) should not contain any max_load or ideal_load directives. PBS closes down a node (no jobs are allocated to it) when the load on the node reaches the max_load value. The max_load value is meant for controlling time-shared nodes. For job-exclusive nodes there is no need to set these directives; moreover, incorrectly set values can close down your node.
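
One way to catch stray load directives (item 4 above) is to scan each node's mom config file. The following is a minimal sketch, assuming the directives are written in the usual "$max_load" / "$ideal_load" form at the start of a line. The function name check_mom_config is illustrative and not part of PBS.

```shell
#!/bin/sh
# Hypothetical helper (not part of PBS): reports load directives in a mom
# config file. Returns 0 if the file is clean, 1 if max_load or ideal_load
# is set.
check_mom_config() {
    config_file=$1
    # Match optional leading whitespace, a literal '$', then a directive name.
    if grep -E '^[[:space:]]*\$(max_load|ideal_load)([[:space:]]|$)' "$config_file"; then
        echo "WARNING: load directives found in $config_file" >&2
        return 1
    fi
    return 0
}
```

Call it on each node with the path of that node's mom_priv/config file as the only argument. Any offending lines are printed so they can be removed by hand.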

Optional configuration, hints

  1. If possible, please use queue-based attributes instead of server-level ones (for the time being, do not use server-level attributes at all).
  2. You may set "acl_user_enable = True" together with "acl_users = user1,user2" to enable user access control for the queue.
  3. It is advisable to set the max_queuable attribute in order to avoid a painfully long dead queue.
  4. You can use node properties from the <PBS home on the server>/server_priv/nodes file together with the resources_default.neednodes attribute to assign a queue to a certain type of node. The example nodes file shown below, together with the qmgr setting "set queue pc resources_default.neednodes = Athlon", results in a PBS configuration where the queue pc is assigned to the nodes with Athlon processors. By default, jobs from the pc queue will execute on Athlon nodes.
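
Taken together, hints 2 to 4 amount to a handful of qmgr directives. The fragment below is only a sketch for the queue pc mentioned in hint 4. The user names and the max_queuable value are placeholders reused from the examples in this document, not recommendations.

```
# Sketch only: the queue pc must already exist (create queue pc).
set queue pc acl_user_enable = True
set queue pc acl_users = user1,user2
set queue pc max_queuable = 100
set queue pc resources_default.neednodes = Athlon
```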

Checking your configuration

  1. The node definitions can be checked with:

    <PBS installation path>/bin/pbsnodes -a

    All the nodes MUST have ntype=cluster.
  2. The required queue attributes can be checked with:

    <PBS installation path>/bin/qstat -f -Q queuename

    There MUST be a max_user_run or a max_running attribute listed with a REASONABLE value.
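
Both checks can be scripted by filtering the output of the two commands. The sketch below assumes the usual "attribute = value" layout of the pbsnodes and qstat output. The function names check_ntype and check_queue_limits are illustrative and not part of PBS.

```shell
#!/bin/sh
# Hypothetical helper: reads `pbsnodes -a` output on stdin and prints every
# ntype value that is not "cluster". Succeeds only if at least one ntype
# line was seen and all of them say cluster.
check_ntype() {
    awk -F'=' '
        /^[[:space:]]*ntype[[:space:]]*=/ {
            seen++
            value = $2
            gsub(/[[:space:]]/, "", value)
            if (value != "cluster") { bad++; print "non-cluster ntype: " value }
        }
        END { exit !(seen > 0 && bad == 0) }
    '
}

# Hypothetical helper: reads `qstat -f -Q queuename` output on stdin and
# succeeds if a max_user_run or max_running attribute is listed. It does not
# judge whether the value is reasonable.
check_queue_limits() {
    grep -Eq '^[[:space:]]*(max_user_run|max_running)[[:space:]]*='
}
```

Typical use is to pipe the commands shown above into the functions, for example the pbsnodes -a output into check_ntype and the qstat -f -Q output for one queue into check_queue_limits, and then test the exit status.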

Example configuration files


#<PBS home on the server>/server_priv/nodes file:
-------------------------------------------------------------
node1      np=1    Athlon single
node2      np=2    PIII   dual
node3      np=2    PIII   dual
node4      np=1    Athlon dual
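
Requirement 2 asks that the queue limits agree with the number of available (virtual) processors. A rough way to obtain that number is to add up the np values in the nodes file. The following sketch does this with awk. The function name total_np is illustrative, and it assumes that a node line without an np= field counts as one processor.

```shell
#!/bin/sh
# Hypothetical helper: sums the np= values in a PBS nodes file.
# Assumption: a node line without an np= field counts as one processor.
total_np() {
    awk '
        /^[[:space:]]*#/ || NF == 0 { next }   # skip comments and blank lines
        {
            np = 1
            for (i = 2; i <= NF; i++)
                if ($i ~ /^np=[0-9]+$/) { sub(/^np=/, "", $i); np = $i }
            total += np
        }
        END { print total + 0 }
    ' "$1"
}
```

For the four node lines above the function prints 6, which agrees with the max_running = 6 limit of the short queue in the pbs.conf example.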




#pbs.conf file
--------------------------------------------------------------
#Example PBS configuration with a short (default) and a long queue
#
#cut & save as pbs.conf, then use qmgr < pbs.conf
#

#
# Set server attributes.
#

set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 60
set server default_queue = short
set server query_other_jobs = true
set server node_pack = false
set server scheduling = true
set server resources_default.neednodes = 1

#
# Create and define queue short
#

create queue short
set queue short queue_type = Execution
set queue short enabled = True
set queue short started = True
set queue short Priority = 100
set queue short resources_max.cput = 02:00:00
set queue short resources_default.cput = 01:00:00
set queue short max_user_run = 3
set queue short max_running = 6
set queue short max_queuable = 100

#
# Create and define queue long
#

create queue long
set queue