PBS Configuration Instructions for the NorduGrid Sites
Introduction
PBS is a very powerful Local Resource Management System (batch system)
with dozens of configurable options. Server, queue and node attributes
can be used to configure the cluster's behaviour.
In order to correctly interface PBS to the NorduGrid architecture
(mainly the information provider scripts), the local system
administrator is asked to implement a few configuration requirements.
Required configuration
- The computing nodes MUST be declared as cluster nodes
(job-exclusive). At the moment, time-shared nodes
are not supported by the NorduGrid setup. If you intend to run
more than one job on a single processor, you can use the
virtual processor feature of PBS.
- For each queue, you MUST set one of the
max_user_run or max_running attributes, and its
value SHOULD agree with the number of available
resources (e.g. don't set max_running = 10 if you have only
six (virtual) processors in your system). If you set both
max_running and max_user_run, then
max_user_run has to be less than or equal to max_running.
- For the time being, do NOT set server limits like
max_running; please use queue-based limits instead.
- Avoid using the max_load and ideal_load directives. The node's mom
config file (<PBS home on the node>/mom_priv/config) should not
contain any max_load or ideal_load directives. PBS closes down a node
(no jobs are allocated to it) when the load on the node reaches
the max_load value. The max_load value is
meant for controlling time-shared nodes. For
job-exclusive nodes there is no need to set these directives;
moreover, incorrectly set values can close down your node.
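As a minimal sketch of the queue limits described above, assume a
hypothetical execution queue named grid on a cluster with six (virtual)
processors. The queue name and the numbers are illustrative assumptions,
not values required by NorduGrid. The qmgr input could then look like this:

```text
# Hypothetical queue "grid" on a cluster with six (virtual) processors.
# The queue name and both values are assumptions for illustration only.
set queue grid max_running = 6
set queue grid max_user_run = 3
```

Here max_running matches the number of processors and max_user_run stays
below it, so a single user cannot occupy the whole queue.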
Optional configuration, hints
- If possible, please use queue-based attributes instead of
server-level ones (for the time being, do not use server-level
attributes at all).
- You may set the "acl_user_enable = True" attribute together with
"acl_users = user1,user2" to
enable user access control for the queue.
- It is advisable to set the max_queuable attribute in
order to avoid a painfully long dead queue.
- You can use node properties from the
<PBS home on the server>/server_priv/nodes file together with the
resources_default.neednodes attribute to assign a queue to a
certain type of node. The example nodes file shown below, together with the qmgr setting
set queue pc resources_default.neednodes = Athlon
results in a PBS configuration where the queue
pc is assigned to the nodes with Athlon processors. By
default, jobs from the pc queue will execute on
Athlon nodes.
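The optional settings above can be combined for a single queue. The
following qmgr input is only a sketch: the queue name pc and the Athlon
property come from the text above, while the user names and the
max_queuable value are placeholders you would replace with your own.

```text
# Sketch for the queue "pc"; user1, user2 and the limit of 100 are placeholders.
set queue pc acl_user_enable = True
set queue pc acl_users = user1,user2
set queue pc max_queuable = 100
# Send jobs from this queue to nodes carrying the "Athlon" property by default.
set queue pc resources_default.neednodes = Athlon
```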
Checking your configuration
- The node definitions can be checked with:
<PBS installation path>/bin/pbsnodes -a
All the nodes MUST have ntype=cluster.
- The required queue attributes can be checked as:
<PBS installation path>/bin/qstat -f -Q queuename
There MUST be a max_user_run or a
max_running attribute listed with a REASONABLE
value.
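The queue attributes can also be inspected from within qmgr itself. As a
sketch, assuming your PBS version provides the standard list and print
directives and that a queue named short exists (as in the example
configuration in this document), the following qmgr input shows the
current settings:

```text
# Show the current attributes of the queue "short" (queue name is an example).
list queue short
# Print the same queue as qmgr directives that could be fed back into qmgr.
print queue short
```

The output should include the max_user_run or max_running attribute if
it has been set.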
Example configuration files
#<PBS home at the server>/server_priv/nodes file:
-------------------------------------------------------------
node1 np=1 Athlon single
node2 np=2 PIII dual
node3 np=2 PIII dual
node4 np=1 Athlon dual
#pbs.conf file
--------------------------------------------------------------
#Example PBS configuration with a short (default) and a long queue
#
#cut & save as pbs.conf then use qmgr < pbs.conf
#
#
# Set server attributes.
#
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 60
set server default_queue = short
set server query_other_jobs = true
set server node_pack=false
set server scheduling=true
set server resources_default.neednodes = 1
#
# Create and define queue short
#
create queue short
set queue short queue_type = Execution
set queue short enabled = True
set queue short started = True
set queue short Priority = 100
set queue short resources_max.cput = 02:00:00
set queue short resources_default.cput = 01:00:00
set queue short max_user_run = 3
set queue short max_running = 6
set queue short max_queuable = 100
#
# Create and define queue long
#
create queue long
set queue