Shift Scheduling

Another challenge from Decision Management Community. Here is the task, I have introduced rule numbers (Rxx), and unique initials for each doctor.

  • In a hospital there has to be a doctor present at all times. (R01)
  • Planning is made for the next seven days, starting on Monday. (R02)
  • Each day consists of three shifts: (R03)
    1. Early shift
    2. Late shift
    3. Night shift
  • Every shift has to be assigned to a doctor. (R04)
  • In total, there are five doctors. (R05)
  • Every doctor has a list of available days and some special requirements, see table below.
  • The following rules apply:
    1. A doctor can only work one shift per day. (R06)
    2. A doctor should always be available for his shift (see table). (R07)
    3. If a doctor has a night shift, they either get the next day off, or the night shift again. (R08)
    4. A doctor either works both days of the weekend, or none of the days (R09).
Name Initials Available
Fleming AF (R10) Friday, Saturday, Sunday.
Freud SF (R11) Every day early or late, never night.
Heimlich HH (R12) Every day, but never the night shift on weekends.
Eustachi BE (R13) Every day, every shift.
Golgi CG (R14) Every day, every shift. (R15) Max 2 night shifts.

A plan should be made in which every requirement is fulfilled.

Summary

It was fun to work on this one. The problem boils down to constraint programming over finite domains. It is a bit counter-intuitive, but there are over ten million valid solutions. However, considering that — given 21 shifts and five people — the maximum search space (without any rules) is 5^21 ~ 4.768E14, not very surprising.
I am running this example on an average home machine with 16GB memory. The whole DB size on disk is ~ 2GB, so it should run in-memory even on a machine with 8GB memory.

Answer

The code, for PostgreSQL, contains sections with titles matching this post; it should be easy to follow. I would suggest to open it now and read along.

Domains

Eight domains, I have added shift and day numbers to allow for simple ordering; doctors’ initials to save on screen space in reports.

-- Days
DY    = day_name {'MON', ... 'SUN'}
DYNO  = day_no   {1, ... 7}

-- Daily Shifts
SFT = shift_name {'Early', 'Late', 'Night'}
SNO = shift_no   {1, 2, 3}

-- Weekly shifts
WSFT = week_shift_name {MON-1', 'MON-2', ... 'SUN-3'}
WSNO = week_shift_no   {1, ... 21}

-- Doctors
DOC_NAME = person_name  {'Fleming', ... Golgi}
DOC      = doc_initials {'AF', 'SF', 'HH', 'BE', 'CG'}

Predicates and Relations

-- [p 1]
-- Day of week DY number DYNO exists.
--
day_wk {DY, DYNO}
   KEY {DY}
   KEY {DYNO}

-- data sample
  (DY, DYNO)
------------
  ('SUN', 7)
, ('MON', 1)
, ('TUE', 2)
, ('WED', 3)
, ('THU', 4)
, ('FRI', 5)
, ('SAT', 6)
-- [p 2]
-- Shift SFT with number SNO exists.
--
shift {SFT, SNO}
  KEY {SFT}
  KEY {SNO}


-- data sample
  (SFT, SNO)
--------------
  ('Early', 1)
, ('Late',  2)
, ('Night', 3)
-- [p 3]
-- Week shift WSFT, with week shift number WSNO,
-- of day DY shift SFT exists.
--
week_shift {WSFT, WSNO, DY, SFT}
       KEY {WSFT}
       KEY {WSNO}
       KEY {DY, SFT}

    FK1 {DY}   REFERENCES day_wk
    FK2 {SFT}  REFERENCES shift


-- data sample
  (WSFT, WSNO, DY, SFT)
-------------------------------
  ('MON-1',  1, 'MON', 'Early')
, ('MON-2',  2, 'MON', 'Late' )
, ...
, ('SUN-3', 21, 'SUN', 'Night')
-- [p 4]
-- Doctor DOC_NAME with initials DOC
-- works for the hospital.
--
doctor {DOC_NAME, DOC}
   KEY {DOC}
   KEY {DOC_NAME}


-- data sample
  (DOC_NAME, DOC)
------------------
  ('Fleming',  'AF')
, ('Freud',    'SF')
, ('Heimlich', 'HH')
, ('Eustachi', 'BE')
, ('Golgi',    'CG')

Doctor availability, covers rules (R10) ... (R14).

-- [p 5]
-- Doctor DOC is available for week shift WSFT.
--
doctor_available {DOC, WSFT}
             KEY {DOC, WSFT}

FK1 {DOC}  REFERENCES doctor
FK2 {WSFT} REFERENCES week_shift


-- data sample
  (DOC, WSFT)
---------------
  ('AF', FRI-1)
, ('AF', FRI-2)
, ('AF', FRI-3)
, ...
, ('CG', SUN-3)

The schedule, one possible solution per row. Each column contains initials of a doctor scheduled for that shift.

-- [p 6]
-- According to the schedule (plan) number PLN:
-- doctor "MON-1" is scheduled for Monday  early shift,
-- doctor "MON-2" is scheduled for Monday  late shift,
-- doctor "MON-3" is scheduled for Monday  night shift,
-- doctor "TUE-1" is scheduled for Tuesday early shift,
-- ...
-- doctor "SUN-2" is scheduled for Sunday  late shift.
-- doctor "SUN-3" is scheduled for Sunday  night shift.
--
schedule {  PLN
         , "MON-1", "MON-2", "MON-3"
         , "TUE-1", "TUE-2", "TUE-3"
         , "WED-1", "WED-2", "WED-3"
         , "THU-1", "THU-2", "THU-3"
         , "FRI-1", "FRI-2", "FRI-3"
         , "SAT-1", "SAT-2", "SAT-3"
         , "SUN-1", "SUN-2", "SUN-3"
         }

     KEY {PLN}
     KEY {"MON-1", ... "SUN-3"}

Implemented as a materialized view. Take a look at the code, find "[p 6]". The view code creates a search space by joining doctor_available — already covering (R10) ... (R14) — for each shift of the week. At the same time enforces rules: (R06), (R08), (R09), (R15). The rule (R07) is implied in using doctor_available in joins.

How to Choose a Solution

There are 10,135,296 possible solutions. It is interesting to note that there is no rule that says that each doctor should work, so some solutions do not use every doctor. It is easy to see that all rules could be satisfied by only three doctors working: Eustachi, Heimlich, and Golgi. How to choose a solution? Which criteria to use? In general, a function that takes two solutions a and b and returns true if solution a is better than b is required.

better_than(a,b) -> Boolean

Instead of the function, we can create summary statistics for each solution and then explore the stats to choose. For example:

  • Min number of shifts assigned to a doctor.
  • Max number of shifts assigned to a doctor.
  • Number of doctors required for that schedule.
  • For each doctor, number of shifts assigned to that doctor.
-- [p 7]
-- According to the schedule (plan) number PLN:
-- min number of shifts assigned to a doctor is MNS;
-- max number of shifts assigned to a doctor is MXS;
-- number of doctors required for that plan  is DRQ;
-- doctor Fleming  is assigned AF shifts;
-- doctor Freud    is assigned SF shifts;
-- doctor Heimlich is assigned HH shifts;
-- doctor Eustachi is assigned BE shifts;
-- doctor Golgi    is assigned CG shifts.
--
--
stats {PLN, MNS, MXS, DRQ, AF, SF, HH, BE, CG}
  KEY {PLN}

   FK {PLN} REFERENCES schedule {PLN}

Testing

To allow for quick testing of solutions, a batch of tests is packaged in a view named test_rules.

-- [p 8]
-- There are VIOLATIONS of rule RULE.
--
test_rules {RULE, VIOLATIONS}
       KEY {RULE}

Before continuing, run these two statements manually, they are commented-out at the bottom of the code. This populates both materialized views with data. It may take ~15 minutes to finish, the stats view is slow to generate. Once schedule (plans) and stats are generated, it is easy to test and explore. Good time for a coffee break.

REFRESH MATERIALIZED VIEW schedule;
REFRESH MATERIALIZED VIEW stats;

Once done, run the tests; should finish in about 20 seconds.

SELECT *
FROM test_rules
ORDER BY "VIOLATIONS", "RULE";

+------------+----------------------------------+
| VIOLATIONS | RULE                             |
+------------+----------------------------------+
| 0          | (R04) Every shift has to be ...  |
| 0          | (R06) A doctor can only work one |
| 0          | (R08) Night shift rule.          |
| 0          | (R09) Weekend rule.              |
| 0          | (R10) Fleming only Friday; ...   |
| 0          | (R11) Freud every day early or ..|
| 0          | (R12) Heimlich every day; never .|
| 0          | (R15) Golgi max 2 night shifts.  |
+------------+----------------------------------+

Exploring Solutions

How many rows (solutions) do we have? Check both, schedule and stats views.

WITH
q_1 AS (
  SELECT count(pln) as "SCHEDULE"
  FROM schedule
),
q_2 AS (
  SELECT count(pln) as "STATS"
  FROM stats
)
SELECT *
FROM q_1
JOIN q_2 ON true ;

+----------+----------+
| SCHEDULE | STATS    |
+----------+----------+
| 10135296 | 10135296 |
+----------+----------+

How many plans are available, per number of required doctors for that plan?

SELECT DRQ       AS doctors_required
    , count(pln) AS "plans"
FROM stats
GROUP BY DRQ ;

+------------------+---------+
| doctors_required | plans   |
+------------------+---------+
| 3                |     384 |
| 4                |  627712 |
| 5                | 9507200 |
+------------------+---------+

Show five plans where all five doctors are required, each one works minimum three, maximum five shifts, and Dr. Freud works four shifts. See the result.

SELECT b.*
FROM stats    AS a
JOIN schedule AS b USING (pln)
WHERE drq = 5 -- doctors required
  AND mns = 3 -- min number of shifts per doc
  AND mxs = 5 -- max number of shifts per doc
  AND  sf = 4 -- Freud works 4 shifts
LIMIT 5 ;

How many plans are available where all five doctors are required, and each doctor works minimum four shifts?

SELECT count(pln) as cnt
FROM stats
WHERE drq = 5 -- doctors required
  AND mns = 4 -- min number of shifts per doc
;

+-----+
| cnt |
+-----+
|  0  |
+-----+

Well, this is interesting. It means that doctor hours have to be balanced over more than one week. The fun continues.

Balancing Workload

Can we find a solution such that over a period of N weeks each doctor works the same number of shifts? Actually, no. There are 21 shifts per week, five doctors, so in average, without any special rules, a doctor should work 21/5 = 4.2 shifts per week. However, Dr. Fleming can work only 3 shifts per week. Even if Fleming works all three shifts, others have to work in average (21-3)/4 = 4.5 shifts per week, so he can never catch up with other doctors.
So, the solution is to balance workload of other doctors, based on Fleming’s availability.

If Fleming works AF shifts per week,
other doctors have to work AVG shifts in average,
which takes (minimum) of WKS weeks to balance.

 AF  AVG   WKS
-----------------
 1   5      1
 2   4.75   4
 3   4.5    2

If Fleming works only one shift a week, any plan where other doctors work 5 shifts per week will do. This plan can also be static, no need to change it; all plans here satisfy (R08) rule.

SELECT count(pln) as cnt
FROM stats
WHERE af  = 1
  AND sf  = 5
  AND hh  = 5
  AND be  = 5
  AND cg  = 5 ;

+-------+
|  cnt  |
+-------+
| 53120 |
+-------+

Plenty of plans for this case.

If he works three shifts, then any solution which leads to a total of nine shifts, over two weeks, for other doctors will be fine — as long as it satisfies the night shift rule R(08). I would argue that this two-week plan has to be static too.

The simplest way to balance workload and get the desired average is to alternate between four and five shifts per week for other doctors: (4+5)/2 = 4.5. Let’s consider a pair of plans (A,B) that would satisfy this. How many can be found?

WITH
w1 AS (
  SELECT count(pln) as cnt_A
  FROM stats
  WHERE af  = 3
    AND sf  = 4
    AND hh  = 4
    AND be  = 5
    AND cg  = 5
),
w2 AS (
  SELECT count(pln) as cnt_B
  FROM stats
  WHERE af  = 3
    AND sf  = 5
    AND hh  = 5
    AND be  = 4
    AND cg  = 4
)
SELECT cnt_A, cnt_B
FROM w1
JOIN w2 on true ;

+--------+--------+
| cnt_A  | cnt_B  |
+--------+--------+
| 145152 | 127872 |
+--------+--------+

Not symmetrical, but the total search space is a product of these two numbers, so ~18.56E9 combinations. Any of these that satisfies (R08) would qualify.
That’s too many bytes for my home computer, will not fit into memory. Is there a way to reason some of this search space away? Given that all existing plans satisfy (R08) it may be possible to search for plan pairs that start and end the same. If the first two shifts of Monday, and the last Sunday shift are the same, the pair (A,B) satisfies (R08).

A(MON-1, MON-2, SUN-3) = B(MON-1, MON-2, SUN-3)

Does it work? Yes, there are 1,122,295,808 pairs like this, hence simplifying the join condition, Take a look at the part two of the code and the result example.

And what about the case where Fleming works two shifts, and it takes four weeks to balance the workload? I think I need a bigger computer.

Map Coloring

Another fun challenge from Decision Management Community. A canonical example of the constraint programming (over finite domains) paradigm leads to a simple and elegant SQL solution.

The task is to color a map of six European countries: Belgium, Denmark, France, Germany, Luxembourg, and the Netherlands in such a way that no neighboring countries use the same color. You need to use no more than 4 colors: blue, red, green, or yellow.

Starting with two sets: country and color; who borders whom, and the rule.

country {BE, DK, FR, DE, LU, NL}
color   {B, R, G, Y}

BE borders {FR, LU, DE, NL}
DK borders {DE}
FR borders {BE, LU, DE}
DE borders {FR, LU, BE, NL, DK}
LU borders {FR, DE, BE}
NL borders {BE, DE}

Neighboring countries can not use the same color.

Reasoning

There are six countries, each one can have one of four colors, so in total there is a maximum of 4^6 = 4096 combinations to examine. The strategy is simple:

  1. Create a set of all possible combinations.
  2. Apply constraints.
  3. Present all results that satisfy given constraints.

The SQL code is straightforward and easy to follow. Try it, there are 144 possible solutions.

Variation

In a variation of the problem rules are changed a bit. Only three colors are allowed and some neighboring countries are allowed to have the same color, but there is a penalty cost assigned. The task is to find solutions with minimum penalty cost.

country {BE, DK, FR, DE, LU, NL}
color   {R, G, B}

These neighboring countries can use the same color,
with assigned cost:

LU-FR  $257
LU-DE  $904
LU-BE  $568

Reasoning v2

This time there are 3^6 = 729 combinations to examine. The strategy:

  1. Create a set of all possible combinations.
  2. Apply mandatory constraints.
  3. Calculate the cost for all tuples satisfying mandatory constraints.
  4. Find the minimum cost.
  5. Present all results that have the minimum cost.

Again, SQL code is easy to follow, this time there are 12 possible solutions.

Monkey Business

A challenge by Decision Management Community.

A zoo has four monkeys, during the monkey lunchtime each one ate a different fruit in their favourite resting place. Sam, who doesn’t like bananas, likes sitting on the grass. The monkey who sat on the rock ate the apple. The monkey who ate the pear didn’t sit on the tree branch. Anna sat by the stream but she didn’t eat the pear. Harriet didn’t sit on the tree branch. Mike doesn’t like oranges.

  1. What kind of fruit each monkey ate?
  2. Where their favourite resting place was?

So, how to solve this in SQL? It is easy to identify domains, predicates and rules; after that it is straightforward. Continue reading “Monkey Business”

Murder Mystery

Another fun puzzle, based on the problem No. 55 in [PFJ86] and published as a challenge by Decision Management Community.

Someone in Dreadsbury Mansion killed aunt Agatha. Agatha, the butler, and Charles live in Dreadsbury Mansion, and are the only ones to live there. A killer always hates, and is no richer than his victim. Charles hates no one that Agatha hates. Agatha hates everybody except the butler. The butler hates everyone not richer than aunt Agatha. The butler hates everyone whom Agatha hates. No one hates everyone. Who killed Agatha?

The idea — as in previous posts — is to use concept of predicates, constraints, relations, and Continue reading “Murder Mystery”