Trev Harmon


February 21, 2014 by Trev Harmon (REPOST)
February 14, 2014 at Adaptive Computing (ORIGINAL)

Understanding Moab Scheduling: Part I

This entry is part 1 of 3 in the series Understanding Moab Scheduling

Understanding Moab Scheduling
  • Understanding Moab Scheduling: Part I
  • Understanding Moab Scheduling: Part II
  • Understanding Moab Scheduling: Part III

With MoabCon, Adaptive Computing’s yearly user conference, just around the corner, I thought I’d revisit the subject of a well-received talk I gave two years ago at the conference. This will be done in three parts covering the Moab scheduling cycle, the proper use of mdiag -S and finally a simple example of scheduling in action.

[Figure: Moab Scheduling Cycle]

Hopefully, this will give you a better understanding of how Moab does scheduling and how the different policies affect its behavior.
 

The Scheduling Cycle

Moab’s main task is to schedule workload. This is its number one priority at all times. In order to do this, it goes through a series of steps called the scheduling cycle. While there are technically over ten steps in this process, we are going to simplify things by categorizing all of them into five main steps. Each of these will be discussed separately below.

The Polling Interval

Before we get into the steps, however, there is one other important topic to quickly cover. If you take a look in your moab.cfg file, you will likely find a line that looks like the following:

RMPollInterval          30

 
or…

RMPollInterval          5,30

 
This is the RMPollInterval, which many believe is used to specify how long a full cycle is supposed to take. This assumption is mostly correct, but let’s get into what it really means.

The RMPollInterval can be specified as either one or two numbers, each a number of seconds. If only one number is specified, it represents the maximum amount of time Moab will wait before attempting to force a new iteration to start. This is important, as many people think this single number sets a fixed time for the iteration. That is technically incorrect, though in a properly balanced system it is the behavior one would expect to see. It is incorrect because several different things can cause an iteration to start early. For example, if an administrator issues an mschedctl -r command, a new cycle will be forced to start regardless of the RMPollInterval setting.

In the case where there are two numbers specified, the second number is the same as in the case where only one number is specified. The first number, however, represents the minimum time allowed for a scheduling iteration. On some systems, the events discussed above that can cause a new iteration to start early can become a problem if they happen too often. In these cases, the system gets bound up constantly scheduling because it is always starting a new cycle, which results in it becoming unresponsive to user requests. By adding a minimum time to RMPollInterval, the administrator is telling Moab that a cycle must take at least that long, even if an event attempts to start a new one early.
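To make the one-number and two-number forms concrete, here is a minimal Python sketch of the rules described above. It is not Moab's code: the function names and the decision logic are assumptions written only to mirror the description (a maximum wait, an optional minimum, and events that may request an early start).

```python
def parse_rm_poll_interval(value):
    """Parse an RMPollInterval value such as "30" or "5,30".

    Returns (minimum_seconds, maximum_seconds). With a single number there
    is no minimum, so it is reported as 0.
    """
    parts = [int(p) for p in value.split(",")]
    if len(parts) == 1:
        return 0, parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"expected one or two numbers, got {value!r}")


def may_start_new_iteration(elapsed, minimum, maximum, event_pending):
    """Decide whether a new iteration may begin (illustrative logic only).

    elapsed       -- seconds since the current iteration started
    event_pending -- True if something has asked for an early start
    """
    if elapsed >= maximum:
        return True   # the maximum wait has passed
    if event_pending and elapsed >= minimum:
        return True   # early start, but never before the minimum
    return False
```

With "30", a pending event can start a new iteration at any moment. With "5,30", the same event is held back until at least five seconds have elapsed.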

Now that we’ve talked about the time of the cycle, let’s look at the individual stages and what happens during them.

Step 1. Update Information from Resource Managers

The very first thing Moab does during a scheduling cycle (also known as a scheduling iteration) is to attempt to get a coherent understanding of the world around it. The more accurate its information, the better the scheduling decisions it can make. In order to do this, Moab contacts each of its resource managers. This information is so important that the calls to the resource managers are actually blocking. Moab will do nothing else until it contacts the resource managers or a predefined timeout is reached. It’s that important.
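The idea of "block until every resource manager answers or a timeout is reached" can be sketched in Python as follows. This is only an illustration: the `managers` mapping, the callable interface, and the choice to query the managers concurrently under one shared deadline are all assumptions, not a description of how Moab talks to its resource managers.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def poll_resource_managers(managers, timeout_seconds):
    """Query every resource manager, blocking until each answers or times out.

    managers -- dict mapping a name to a zero-argument callable that returns
                that manager's view of the cluster (hypothetical interface)
    Returns a dict of name -> result, with None for managers that timed out.
    """
    results = {}
    deadline = time.monotonic() + timeout_seconds
    pool = ThreadPoolExecutor(max_workers=max(1, len(managers)))
    futures = {name: pool.submit(query) for name, query in managers.items()}
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            # The caller blocks here; nothing else happens in the meantime.
            results[name] = future.result(timeout=remaining)
        except FutureTimeout:
            results[name] = None  # give up on this manager for this cycle
    pool.shutdown(wait=False)  # do not wait for stragglers
    return results
```

The caller gets nothing back until every manager has either answered or run out of time, which is the blocking behavior described above.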

Step 2. Manage Workload

In this next step Moab decides which jobs to start, in what order and where they are placed. Essentially, this is the core part of scheduling.

Job Ordering

Jobs in Moab are ordered according to their priority. System administrators are able to determine which factors are taken into account when this priority is calculated. Additionally, different weights can be applied to these factors, which allows administrators to target specific factors as being more important than others.

These priority factors use a two-tier system, where all factors (sub-components) are grouped into categories (components). Weights can be applied to both tiers. The following is a table of the different components and sub-components available. See the documentation for specifics.

Components                 Sub-Components
Job Credentials            User, Group, Account, QoS, Class
Fairshare Usage            FSUser, FSGroup, FSAccount, FSQoS, FSClass, FSGUser, FSGGroup, FSGAccount, FSJPU, FSPPU, FSPSPU, WCAccuracy
Requested Job Resources    Node, Proc, Mem, Swap, Disk, PS, PE, Walltime
Current Service Levels     QueueTime, XFactor, Bypass, StartCount, Deadline, SPViolation, UserPrio
Target Service Levels      TargetQueueTime, TargetXFactor
Consumed Resources         Consumed, Remaining, Percent, ExecutionTime
Job Attributes             AttrAttr, AttrState, AttrGres

 
Each time through the scheduling cycle, Moab will use the values of the selected components, sub-components and weights to calculate a numeric priority value for each one of the jobs. The following is the mathematical function used:

Σ (component-weight) * (sub-component-weight) * (sub-component-value)
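As a worked illustration of this sum, the following Python sketch computes a priority from a small, made-up set of weights and values. The data layout and all of the numbers are assumptions chosen for the example; only the component and sub-component names come from the table above.

```python
def job_priority(weights, values):
    """Sum component-weight * sub-component-weight * sub-component-value.

    weights -- {component: (component_weight, {sub_component: sub_weight})}
    values  -- {sub_component: value for this job}; missing values count as 0
    """
    total = 0
    for component_weight, sub_weights in weights.values():
        for name, sub_weight in sub_weights.items():
            total += component_weight * sub_weight * values.get(name, 0)
    return total


# Hypothetical weights: service levels matter ten times more than credentials.
weights = {
    "Current Service Levels": (10, {"QueueTime": 2, "XFactor": 5}),
    "Job Credentials": (1, {"User": 100}),
}
# Hypothetical values for a single job.
values = {"QueueTime": 30, "XFactor": 1.5, "User": 3}

priority = job_priority(weights, values)  # 600 + 75 + 300 = 975.0
```

Because the component weight multiplies every sub-component beneath it, raising a single component weight shifts the influence of that whole category at once.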

Moab then orders the jobs based on their calculated priority score from highest to lowest. This is the order it will then attempt to start the jobs. It starts at the top of the list and moves down starting each job. Once it gets to a job that can’t currently be started, a reservation is created for that job. Moab then switches to backfill mode and continues working its way through the list.
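The walk down the prioritized list can be sketched as below. This is a deliberately simplified model, not Moab's algorithm: it tracks a single resource (free processors), takes the reservation's start time as an input rather than computing it, and assumes a backfilled job must finish before that reservation begins. The field names are hypothetical.

```python
def schedule_pass(jobs, free_procs, reservation_start):
    """One simplified pass over the prioritized job list.

    jobs              -- list of dicts with "name", "priority", "procs", "walltime"
    free_procs        -- processors idle right now
    reservation_start -- seconds from now at which the first blocked job is
                         assumed to be able to start (an input in this sketch)
    Returns (started, reserved, backfilled): two lists of job names and the
    name of the job holding the reservation (or None).
    """
    started, backfilled = [], []
    reserved = None
    for job in sorted(jobs, key=lambda j: j["priority"], reverse=True):
        fits = job["procs"] <= free_procs
        if reserved is None:
            if fits:
                started.append(job["name"])
                free_procs -= job["procs"]
            else:
                reserved = job["name"]  # first blocked job gets the reservation
        elif fits and job["walltime"] <= reservation_start:
            # Assumed backfill rule: run only if done before the reservation.
            backfilled.append(job["name"])
            free_procs -= job["procs"]
    return started, reserved, backfilled
```

For example, with six free processors, a high-priority 4-processor job starts, an 8-processor job behind it receives the reservation, and a short 2-processor job further down the list is backfilled into the gap.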

For jobs Moab decides to start, it then needs to decide where to place them.

Job Placement

Job placement is determined through a simple process of two filters and a sort. Basically, it does the following:

  1. Start with all nodes in the cluster.
  2. Filter 1 – Geometry Check: All nodes that cannot physically run the job are removed from consideration (e.g., too few cores or memory).
  3. Filter 2 – Policy Check: All nodes that cannot run the job because of policy are removed from consideration (e.g., reservations or other running jobs).
  4. Sort: Using the Node Allocation Policy, the nodes are sorted and the top results are chosen.
  5. Job sent to selected nodes.

This process is then repeated for each job that can be started.
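The two filters and the sort can be sketched for a single job as follows. The node and job fields are hypothetical, and the `allocation_key` function merely stands in for whatever Node Allocation Policy is configured; it is not a Moab setting.

```python
def place_job(job, nodes, allocation_key):
    """Pick nodes for one job using two filters and a sort.

    job            -- dict with "cores", "memory_gb", "node_count"
    nodes          -- list of dicts with "name", "cores", "memory_gb",
                      "reserved", "busy"
    allocation_key -- sort key standing in for the Node Allocation Policy
    Returns the chosen node names, or None if too few nodes qualify.
    """
    # Filter 1 - geometry check: drop nodes that physically cannot run the job.
    candidates = [
        n for n in nodes
        if n["cores"] >= job["cores"] and n["memory_gb"] >= job["memory_gb"]
    ]
    # Filter 2 - policy check: drop nodes blocked by reservations or other jobs.
    candidates = [n for n in candidates if not n["reserved"] and not n["busy"]]
    # Sort: order the survivors by the allocation policy and take the top ones.
    candidates.sort(key=allocation_key)
    chosen = candidates[: job["node_count"]]
    if len(chosen) < job["node_count"]:
        return None
    return [n["name"] for n in chosen]
```

Passing `lambda n: n["cores"]` as the key, for instance, prefers the smallest node that still satisfies the job, which leaves larger nodes free for larger jobs.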

Step 3. Refresh Reservations

Following workload management, Moab looks at each reservation in the system and updates it as necessary.

Step 4. Update Statistics

Moab keeps some statistics internally, and these are updated at this point in the scheduling cycle.

Step 5. Handle User Requests

In the final step of the scheduling iteration, Moab waits for and handles user requests. These include blocking command-line programs and interactions with other external systems.

In the case where Moab does not have enough time to accomplish everything within the specified poll interval, it is this final step that will be sacrificed. In other words, Moab will ensure the first four steps are always completed, while this last one is optional. This has the effect of user commands timing out as a problem starts to develop. This is often noticed by users of the system, which gives the administrators a chance to look at and correct the problem before it starts to affect Moab’s number one priority: the scheduling of workload.
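The way the last step absorbs any shortfall can be sketched in a few lines of Python. The function and its parameters are hypothetical; the sketch only mirrors the rule stated above, namely that steps one through four always run and step five gets whatever time remains in the poll interval.

```python
import time


def run_iteration(required_steps, handle_user_requests, poll_interval,
                  clock=time.monotonic):
    """Run steps 1-4 unconditionally, then give step 5 the time that is left.

    required_steps       -- callables for steps 1-4, run in order
    handle_user_requests -- callable taking the number of seconds available
    poll_interval        -- seconds allotted to the whole iteration
    Returns the number of seconds that were available for user requests.
    """
    start = clock()
    for step in required_steps:
        step()  # steps 1-4 always run to completion
    remaining = poll_interval - (clock() - start)
    if remaining <= 0:
        return 0  # no time left: user requests go unserved this cycle
    handle_user_requests(remaining)
    return remaining
```

If the first four steps use 12 seconds of a 30-second interval, user requests get the other 18. If they use 45 seconds, user requests get nothing, which is when commands begin to time out.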

Conclusion

Hopefully this has been a useful overview of Moab’s scheduling iteration.

In a future blog post, we will continue to explore some of the ways one can use the scheduling cycle to understand what is happening with the scheduler and ways to diagnose problems.
