What is Moab?
Introduction
Moab is an advanced job scheduler for use on clusters and
supercomputers. It is a highly optimized and configurable tool
capable of supporting a large array of scheduling and fairness
policies, dynamic priorities, and extensive reservations.
Acknowledged by many as one of the most advanced schedulers available,
Moab is currently in use at hundreds of leading government,
academic, and commercial sites throughout the world. Moab improves the
manageability and efficiency of machines ranging from clusters of a
few processors to multi-teraflop supercomputers.
Moab at IU
On the Quarry system at Indiana University, Moab serves as
the job scheduler for the TORQUE resource manager. TORQUE is based on
OpenPBS; if you are familiar with PBS Pro, you'll find much of the
syntax the same.
Once a job has been submitted to one of the TORQUE queues, it
may become eligible for dispatch by Moab. The following
commands provide useful information on the status of a queued or running
job:
showq | Display active, idle, or all jobs
showstart jobid | Display estimated dispatch time for jobid
checkjob jobid | Display attributes for jobid
For more information about these commands and other Moab utilities,
see the Moab Workload Manager User's Manual.
Fairshare scheduling
Fairshare scheduling allows historical resource usage to affect job
priority decisions. Administrators can set target utilization goals
for each user, group, class, or service group. When these utilization
goals are exceeded by one usage class, jobs from other usage classes
will take precedence over jobs from the offending class.
Currently, the fairshare policy on Quarry records usage over the last
seven days with a decay rate of 0.80: each day's usage counts 0.8 times
as much as that of the day after it. Each usage class (usually a
username) has a target of 20% of total usage. Usage above that target
lowers the scheduling priority of that user's jobs.
Use the diagnose -f command to display the fairshare
scheduling usage table. In the following example, users
baikgrp (45.91%) and dsheen (39.26%) have exceeded their 20% "fair
share" target and will be given lower priority over the next week:
[root@Quarry]# diagnose -f
FairShare Information
Depth: 7 intervals Interval Length: 1:00:00:00 Decay Rate: 0.80
FS Policy: DEDICATEDPS
System FS Settings: Target Usage: 0.00 Flags: 0
FSInterval % Target 0 1 2 3 4 5 6
FSWeight ------- ------- 1.0000 0.8000 0.6400 0.5120 0.4096 0.3277 0.2621
TotalUsage 100.00 ------- 1872.2 1605.8 631.7 1868.0 3222.6 1857.5 1439.1
USER
-------------
haiyang* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
baikgrp* 45.91 20.00 81.11 45.57 79.98 49.70 4.88 20.49 10.79
balin* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
akewalra* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
kevidale* 0.25 20.00 ------- ------- ------- 0.23 0.74 0.78 -------
dlauer* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
kmane* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
bramley* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
qzou* 0.18 20.00 ------- ------- ------- 0.05 0.34 0.53 1.01
mathess* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
iyengar* 0.54 20.00 ------- ------- ------- ------- 0.63 2.58 3.34
pewang* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
rrepasky* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
agopu* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
heap* 0.02 20.00 0.09 ------- ------- ------- ------- ------- -------
vsingan* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
huili* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
dsheen* 39.26 20.00 14.97 43.03 ------- 33.74 86.48 62.68 -------
turnerg* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
ejolson* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
ssrivast* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
smiddha* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
mburland* 4.90 20.00 0.17 5.37 4.83 11.14 3.11 5.19 16.89
febertra* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
lsandvos* 0.01 20.00 ------- 0.06 ------- ------- ------- ------- -------
mswat* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
acolubri* 5.72 20.00 3.67 5.97 15.20 5.14 3.75 7.75 10.01
mbaik* 3.22 20.00 ------- ------- ------- ------- 0.07 ------- 57.96
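
The percentage in the second column can be reproduced from the other columns. The following Python sketch is not taken from the Moab documentation; the formula is inferred from the numbers in the sample output, on the assumption that interval 0 is the most recent day and that each "-------" cell counts as zero usage. The function and variable names are illustrative only.

```python
# Values copied from the sample "diagnose -f" output above.
DECAY = 0.80
TOTAL_USAGE = [1872.2, 1605.8, 631.7, 1868.0, 3222.6, 1857.5, 1439.1]

# Per-interval usage percentages; None stands for a "-------" cell.
BAIKGRP = [81.11, 45.57, 79.98, 49.70, 4.88, 20.49, 10.79]
DSHEEN = [14.97, 43.03, None, 33.74, 86.48, 62.68, None]


def interval_weights(decay, depth):
    """Weight of interval i is decay**i (assumes interval 0 is the newest)."""
    return [decay ** i for i in range(depth)]


def effective_usage(percents, totals, decay):
    """Decay-weighted share of total usage, in percent (inferred formula)."""
    weights = interval_weights(decay, len(totals))
    weighted_total = sum(w * t for w, t in zip(weights, totals))
    weighted_user = sum(
        w * t * (p or 0.0) / 100.0
        for w, t, p in zip(weights, totals, percents)
    )
    return 100.0 * weighted_user / weighted_total


if __name__ == "__main__":
    # The weights should match the FSWeight row of the table.
    print([round(w, 4) for w in interval_weights(DECAY, 7)])
    for name, row in (("baikgrp", BAIKGRP), ("dsheen", DSHEEN)):
        print(f"{name}: {effective_usage(row, TOTAL_USAGE, DECAY):.2f}%")
```

Under these assumptions the results agree with the table to within rounding (about 45.91% for baikgrp and 39.26% for dsheen), which suggests that recent days dominate the figure: a heavy day drops to roughly a quarter of its original weight by the end of the seven-day window.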
When to expect your job to start
Moab uses the fairshare tables to determine which job will be assigned
to the next open processors. The showq command shows the
state of submitted jobs. The following is sample output:
[root@Quarry]# showq
active jobs--------------------
JOBID USERNAME STATE PROCS REMAINING STARTTIME
17199 heap Running 1 2:53:12 Wed Sep 17 11:20:45
17200 heap Running 1 2:53:52 Wed Sep 17 11:21:25
17201 heap Running 1 2:54:32 Wed Sep 17 11:22:05
17202 heap Running 1 2:55:13 Wed Sep 17 11:22:46
17203 heap Running 1 2:55:53 Wed Sep 17 11:23:26
17204 heap Running 1 2:56:33 Wed Sep 17 11:24:06
17205 heap Running 1 2:57:13 Wed Sep 17 11:24:46
.
.
.
6 active jobs
eligible jobs----------------------
JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME
16672 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:05
16673 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:06
16674 ejolson Idle 1 16:16:00:00 Tue Sep 16 23:27:06
16675 ejolson Idle 1 16:16:00:00 Tue Sep 16 23:27:06
16676 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:06
16677 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:06
6 eligible jobs
blocked jobs----------------
JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME
0 blocked jobs
Total Jobs: 116 Active Jobs: 104 Eligible Jobs: 6 Blocked Jobs: 0
The jobs at the top of the eligible jobs list will run next if
resources are available. Reservations can prevent jobs from running
when they block off resources that waiting jobs need. Use the
showres command to examine the list of reservations.
To find the estimated start time of a particular job, try:
showstart $JOBID
[root@Quarry]# showstart 16672
job 16672 requires 1 proc for 8:08:00:00
Earliest start in 5:03:54:32 on Mon Sep 22 17:00:00
Earliest completion in 13:11:54:32 on Wed Oct 1 01:00:00
Best Partition: DEFAULT
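
The durations in this output appear to use a days:hours:minutes:seconds layout, with leading fields dropped when they are not needed (for example, 2:53:12 in the REMAINING column of showq). The following Python sketch, which assumes that layout and uses a hypothetical helper name, converts such strings to timedelta values and checks that the earliest completion equals the earliest start plus the job's wallclock limit.

```python
from datetime import timedelta


def parse_moab_duration(text):
    """Parse '[[DD:]HH:]MM:SS' (layout assumed from the sample output)."""
    parts = [int(p) for p in text.split(":")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"unexpected duration format: {text!r}")
    # Left-pad so the fields are always (days, hours, minutes, seconds).
    days, hours, minutes, seconds = [0] * (4 - len(parts)) + parts
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


# Values copied from the sample "showstart 16672" output above.
wclimit = parse_moab_duration("8:08:00:00")
start_in = parse_moab_duration("5:03:54:32")
done_in = parse_moab_duration("13:11:54:32")

if __name__ == "__main__":
    print(start_in + wclimit == done_in)  # True for the sample values
    print(start_in + wclimit)             # 13 days, 11:54:32
```

The estimate is only as good as the wallclock limits of the jobs ahead in the queue, so a job may start earlier if running jobs finish before their limits.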
This is document avmu in domain all.
Last modified on August 27, 2007.