Grok Normalization

Which normal form (NF) is my table in? How do I normalize? How to decompose to 3NF? BCNF? How to understand the 5th normal form? Mess, confusion, chaos, conflicting advice, endless arguing on StackOverflow. Misleading advice, even in books.

In Short

Normalization is a process for removing and preventing logical errors in a relational database.

Most of the time this involves implementing proper constraints and reducing — eliminating when possible — redundancy by decomposition into projections.

Fun Example

This simple example is designed to cut through the fog, confusion, and noise generated by the large number of conflicting "normalization" articles.

A group of ten J-named people (Jane, Jack, Jill, Joe, James, Janet, Jackie, Jacob, Joan, and Jim) decides to make a SQL DB about themselves. One of the tables in the DB is named smoker, stating who smokes.

Case 1

smoker
+-------------+
| NME    (PK) |
+-------------+
| Jack        |
| James, Jill |
| Joan        |
+-------------+
  1. Comment on the current NF.
  2. Is it possible to fix and how?
  3. Comment on the NF after the fix.
  4. How to prevent the error for any future data changes?

Case 2

Same scenario, but this time the table is empty.

smoker {NME}
    PK {NME}
  • Consider the same four questions as in the previous case.

Case 3

Again the table is empty. This time it has a foreign key (FK) to a table named person, which lists all ten people in the group.

smoker {NME}
    PK {NME}
    FK {NME} REFERENCES person {NME}
  • The same four questions.

Notes

Is this all there is to it? No. However, understanding this will make you immune to misleading examples. Once you get this, it is easy to read about this-and-that NF examples, problems, and fixes.

If I had to recommend only one book, it would be a tossup between Hal08 and Dat05. Both require study. However, as a developer you do not need to obsess about NFs. Stick to a simple design method and your tables will be in a high NF as a natural consequence of the design. Use predicates and constraints, and everything else will fall into place.

Grokking something is a very personal thing, so it may be best if I do not provide answers. Well, try not to look.


All attributes (columns) NOT NULL

PK = Primary Key
FK = Foreign Key

SQL X-to-Y

What is the correct way to model one-to-one, one-to-many, many-to-many? Where do foreign keys go? What criteria to use? How to decide? I see these questions repeated on StackOverflow time and again.

The particular implementation should be based on logic. In most cases, verbalizing the constraint in natural language reveals the correct solution.

The following example uses two entities: thing T and category C. Defining the relationship with the four most common multiplicities results in 4x4 = 16 possible constraints.

Term                    Multiplicity
------------------------------------
is     in exactly one       [1]
is     in at most one       [0,1]
is     in at least one      [1,*]
may be in more than one     [0,*]

Easy

The following 11 cases can be declaratively defined in SQL using standard constraints (PK, AK, FK). There is no need for triggers, deferred constraints, stored procedures, or special application code.


T |-[1]------------[1]-| C

1. Each thing is in exactly one category;
for each category: exactly one thing is in that category.

thing {T, C}
   PK {T}
   AK {C}

T |-[0,1]----------[1]-| C

2. Each thing is in exactly one category;
for each category: at most one thing is in that category.

thing {T, C}
   PK {T}
   AK {C}

   FK {C} REFERENCES category {C}


category {C}
      PK {C}

T |-[1]----------[0,1]-| C

3. Each thing is in at most one category;
for each category: exactly one thing is in that category.

thing {T}
   PK {T}


category {C, T}
      PK {C}
      AK {T}

      FK {T} REFERENCES thing {T}

T |-[0,*]----------[1]-| C

4. Each thing is in exactly one category;
for each category: more than one thing may be in that category.

thing {T, C}
   PK {T}

   FK {C} REFERENCES category {C}


category {C}
      PK {C}
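As a concrete sketch, case 4 (the classic one-to-many) can be tried out in SQLite via Python's sqlite3. Table and column values here are illustrative, not from any particular schema; the point is that a NOT NULL FK alone enforces "exactly one category per thing".

```python
import sqlite3

# Case 4 sketched in SQLite: each thing carries a NOT NULL FK, so it
# is in exactly one category; nothing limits how many things share
# a category. Sample values ('fruit', 'apple', ...) are invented.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE category (c TEXT PRIMARY KEY)")
con.execute("""
    CREATE TABLE thing (
        t TEXT PRIMARY KEY,
        c TEXT NOT NULL REFERENCES category (c)
    )""")
con.execute("INSERT INTO category VALUES ('fruit')")
con.execute("INSERT INTO thing VALUES ('apple', 'fruit')")
con.execute("INSERT INTO thing VALUES ('pear',  'fruit')")  # many things, one category

# A thing cannot reference a missing category:
try:
    con.execute("INSERT INTO thing VALUES ('rock', 'mineral')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Note the `PRAGMA foreign_keys = ON`; SQLite does not enforce FKs without it.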

T |-[1]----------[0,*]-| C

5. Each thing may be in more than one category;
for each category: exactly one thing is in that category.

thing {T}
   PK {T}


category {C, T}
      PK {C}

      FK {T} REFERENCES thing {T}

T |-[0,*]--------[0,*]-| C

6. Each thing may be in more than one category;
for each category: more than one thing may be in that category.

thing {T}
   PK {T}


category {C}
      PK {C}


thing_category {T, C}
            PK {T, C}

           FK1 {T} REFERENCES thing    {T}
           FK2 {C} REFERENCES category {C}
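Case 6 can be sketched the same way. The composite PK on the junction table keeps each (T, C) pair unique, while both sides stay optional and many-valued (sample values invented):

```python
import sqlite3

# Case 6 (many-to-many) in SQLite: junction table with composite PK
# and an FK to each side.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE thing    (t TEXT PRIMARY KEY)")
con.execute("CREATE TABLE category (c TEXT PRIMARY KEY)")
con.execute("""
    CREATE TABLE thing_category (
        t TEXT NOT NULL REFERENCES thing (t),
        c TEXT NOT NULL REFERENCES category (c),
        PRIMARY KEY (t, c)
    )""")
con.executemany("INSERT INTO thing VALUES (?)", [("apple",), ("pear",)])
con.executemany("INSERT INTO category VALUES (?)", [("fruit",), ("green",)])
# apple is in two categories; fruit contains two things
pairs = [("apple", "fruit"), ("apple", "green"), ("pear", "fruit")]
con.executemany("INSERT INTO thing_category VALUES (?, ?)", pairs)
```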

T |-[0,*]--------[0,1]-| C

7. Each thing is in at most one category;
for each category: more than one thing may be in that category.

thing {T}
   PK {T}


category {C}
      PK {C}


thing_category {T, C}
            PK {T}

           FK1 {T} REFERENCES thing    {T}
           FK2 {C} REFERENCES category {C}

T |-[0,1]--------[0,*]-| C

8. Each thing may be in more than one category;
for each category: at most one thing is in that category.

thing {T}
   PK {T}


category {C}
      PK {C}


thing_category {T, C}
            PK {C}

           FK1 {T} REFERENCES thing    {T}
           FK2 {C} REFERENCES category {C}

T |-[0,1]--------[0,1]-| C

9. Each thing is in at most one category;
for each category: at most one thing is in that category.

thing {T}
   PK {T}


category {C}
      PK {C}


thing_category {T, C}
            PK {T}
            AK {C}

           FK1 {T} REFERENCES thing    {T}
           FK2 {C} REFERENCES category {C}

T |-[1,*]--------[0,*]-| C

10. Each thing may be in more than one category;
for each category: at least one thing is in that category.

thing {T}
   PK {T}


category {C, T}
      PK {C}

      FK {T} REFERENCES thing {T}


thing_cat {T, C}
       PK {T, C}

      FK1 {T} REFERENCES thing    {T}
      FK2 {C} REFERENCES category {C}


VIEW thing_category {T, C}
AS
category  {T, C}
UNION
thing_cat {T, C}

T |-[0,*]--------[1,*]-| C

11. Each thing is in at least one category;
for each category: more than one thing may be in that category.

thing {T, C}
   PK {T}

   FK {C} REFERENCES category {C}


category {C}
      PK {C}


thing_cat {T, C}
       PK {T, C}

      FK1 {T} REFERENCES thing    {T}
      FK2 {C} REFERENCES category {C}


VIEW thing_category {T, C}
AS
thing {T, C}
UNION
thing_cat {T, C}

Not Easy

Cases 12 through 16 require the use of triggers, stored procedures, deferred constraints, or application code to enforce the constraint.

12. T |-[1,*]----------[1]-| C
Each thing is in exactly one category;
for each category: at least one thing is in that category.

13. T |-[1]----------[1,*]-| C
Each thing is in at least one category;
for each category: exactly one thing is in that category.

14. T |-[1,*]--------[0,1]-| C
Each thing is in at most one category;
for each category: at least one thing is in that category.

15. T |-[0,1]--------[1,*]-| C
Each thing is in at least one category;
for each category: at most one thing is in that category.

16. T |-[1,*]--------[1,*]-| C
Each thing is in at least one category;
for each category: at least one thing is in that category.

The Problem

The main problem is the lack of assertions (DB-wide constraints) in the main SQL implementations. The SQL standard actually defines them (CREATE ASSERTION), but no luck yet. Hence, not every business constraint can be elegantly defined in SQL; often some creativity, compromise, and awkwardness are required.
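One such compromise, sketched below for case 16's "each category contains at least one thing": since SQLite has no CREATE ASSERTION either, application code can run the check just before commit. This is only one of the options mentioned above (triggers and deferred constraints are alternatives on other engines), and the helper name and sample values are mine.

```python
import sqlite3

# Enforcing a DB-wide constraint in application code: a transaction
# is committed only if no category is left empty.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE category (c TEXT PRIMARY KEY)")
con.execute("CREATE TABLE thing_category (t TEXT, c TEXT, PRIMARY KEY (t, c))")

def commit_if_valid(con):
    # reject the whole transaction if any category has no members
    empty = con.execute("""
        SELECT c FROM category
        WHERE c NOT IN (SELECT c FROM thing_category)
    """).fetchall()
    if empty:
        con.rollback()
        raise ValueError(f"empty categories: {empty}")
    con.commit()

con.execute("INSERT INTO category VALUES ('fruit')")
con.execute("INSERT INTO thing_category VALUES ('apple', 'fruit')")
commit_if_valid(con)          # fine: fruit has a member

con.execute("INSERT INTO category VALUES ('mineral')")
try:
    commit_if_valid(con)      # rejected and rolled back: mineral is empty
except ValueError as e:
    print(e)
```

The obvious weakness, compared to a real assertion, is that every code path must remember to call the check.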


Note:

All attributes (columns) NOT NULL

PK = Primary Key
AK = Alternate Key (Unique)
FK = Foreign Key

Shift Scheduling

Another challenge from Decision Management Community. Here is the task; I have introduced rule numbers (Rxx) and unique initials for each doctor.

  • In a hospital there has to be a doctor present at all times. (R01)
  • Planning is made for the next seven days, starting on Monday. (R02)
  • Each day consists of three shifts: (R03)
    1. Early shift
    2. Late shift
    3. Night shift
  • Every shift has to be assigned to a doctor. (R04)
  • In total, there are five doctors. (R05)
  • Every doctor has a list of available days and some special requirements, see table below.
  • The following rules apply:
    1. A doctor can only work one shift per day. (R06)
    2. A doctor should always be available for his shift (see table). (R07)
    3. If a doctor has a night shift, they either get the next day off, or the night shift again. (R08)
    4. A doctor either works both days of the weekend, or none of the days (R09).
Name      Initials  Available
---------------------------------------------------------------------
Fleming   AF        (R10) Friday, Saturday, Sunday.
Freud     SF        (R11) Every day early or late, never night.
Heimlich  HH        (R12) Every day, but never the night shift on weekends.
Eustachi  BE        (R13) Every day, every shift.
Golgi     CG        (R14) Every day, every shift. (R15) Max 2 night shifts.

A plan should be made in which every requirement is fulfilled.

Summary

It was fun to work on this one. The problem boils down to constraint programming over finite domains. It is a bit counter-intuitive, but there are over ten million valid solutions. However, considering that, given 21 shifts and five people, the maximum search space (without any rules) is 5^21 ~ 4.768E14, this is not very surprising.
I am running this example on an average home machine with 16GB memory. The whole DB size on disk is ~ 2GB, so it should run in-memory even on a machine with 8GB memory.

Answer

The code, for PostgreSQL, contains sections with titles matching this post; it should be easy to follow. I suggest opening it now and reading along.

Domains

Eight domains. I have added shift and day numbers to allow for simple ordering, and doctors' initials to save screen space in reports.

-- Days
DY    = day_name {'MON', ... 'SUN'}
DYNO  = day_no   {1, ... 7}

-- Daily Shifts
SFT = shift_name {'Early', 'Late', 'Night'}
SNO = shift_no   {1, 2, 3}

-- Weekly shifts
WSFT = week_shift_name {'MON-1', 'MON-2', ... 'SUN-3'}
WSNO = week_shift_no   {1, ... 21}

-- Doctors
DOC_NAME = person_name  {'Fleming', ... 'Golgi'}
DOC      = doc_initials {'AF', 'SF', 'HH', 'BE', 'CG'}
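The week-shift domain is just the cross product of the day and daily-shift domains. As a sketch (names follow the post; the WSNO numbering formula is mine, chosen to reproduce 1 ... 21):

```python
# Generate the 21 week-shift rows of [p 3] from the day and shift
# domains: WSNO runs 1..21 in day-then-shift order.
days   = ['MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN']
shifts = {1: 'Early', 2: 'Late', 3: 'Night'}

week_shifts = [
    (f"{dy}-{sno}", (dyno - 1) * 3 + sno, dy, sft)
    for dyno, dy in enumerate(days, start=1)
    for sno, sft in shifts.items()
]

print(week_shifts[0])    # ('MON-1', 1, 'MON', 'Early')
print(week_shifts[-1])   # ('SUN-3', 21, 'SUN', 'Night')
```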

Predicates and Relations

-- [p 1]
-- Day of week DY number DYNO exists.
--
day_wk {DY, DYNO}
   KEY {DY}
   KEY {DYNO}

-- data sample
  (DY, DYNO)
------------
  ('SUN', 7)
, ('MON', 1)
, ('TUE', 2)
, ('WED', 3)
, ('THU', 4)
, ('FRI', 5)
, ('SAT', 6)
-- [p 2]
-- Shift SFT with number SNO exists.
--
shift {SFT, SNO}
  KEY {SFT}
  KEY {SNO}


-- data sample
  (SFT, SNO)
--------------
  ('Early', 1)
, ('Late',  2)
, ('Night', 3)
-- [p 3]
-- Week shift WSFT, with week shift number WSNO,
-- of day DY shift SFT exists.
--
week_shift {WSFT, WSNO, DY, SFT}
       KEY {WSFT}
       KEY {WSNO}
       KEY {DY, SFT}

    FK1 {DY}   REFERENCES day_wk
    FK2 {SFT}  REFERENCES shift


-- data sample
  (WSFT, WSNO, DY, SFT)
-------------------------------
  ('MON-1',  1, 'MON', 'Early')
, ('MON-2',  2, 'MON', 'Late' )
, ...
, ('SUN-3', 21, 'SUN', 'Night')
-- [p 4]
-- Doctor DOC_NAME with initials DOC
-- works for the hospital.
--
doctor {DOC_NAME, DOC}
   KEY {DOC}
   KEY {DOC_NAME}


-- data sample
  (DOC_NAME, DOC)
------------------
  ('Fleming',  'AF')
, ('Freud',    'SF')
, ('Heimlich', 'HH')
, ('Eustachi', 'BE')
, ('Golgi',    'CG')

Doctor availability, covers rules (R10) ... (R14).

-- [p 5]
-- Doctor DOC is available for week shift WSFT.
--
doctor_available {DOC, WSFT}
             KEY {DOC, WSFT}

             FK1 {DOC}  REFERENCES doctor
             FK2 {WSFT} REFERENCES week_shift


-- data sample
  (DOC, WSFT)
---------------
  ('AF', 'FRI-1')
, ('AF', 'FRI-2')
, ('AF', 'FRI-3')
, ...
, ('CG', 'SUN-3')

The schedule, one possible solution per row. Each column contains the initials of the doctor scheduled for that shift.

-- [p 6]
-- According to the schedule (plan) number PLN:
-- doctor "MON-1" is scheduled for Monday  early shift,
-- doctor "MON-2" is scheduled for Monday  late shift,
-- doctor "MON-3" is scheduled for Monday  night shift,
-- doctor "TUE-1" is scheduled for Tuesday early shift,
-- ...
-- doctor "SUN-2" is scheduled for Sunday  late shift,
-- doctor "SUN-3" is scheduled for Sunday  night shift.
--
schedule {  PLN
         , "MON-1", "MON-2", "MON-3"
         , "TUE-1", "TUE-2", "TUE-3"
         , "WED-1", "WED-2", "WED-3"
         , "THU-1", "THU-2", "THU-3"
         , "FRI-1", "FRI-2", "FRI-3"
         , "SAT-1", "SAT-2", "SAT-3"
         , "SUN-1", "SUN-2", "SUN-3"
         }

     KEY {PLN}
     KEY {"MON-1", ... "SUN-3"}

Implemented as a materialized view. Take a look at the code and find "[p 6]". The view creates a search space by joining doctor_available, which already covers (R10) ... (R14), for each shift of the week. At the same time it enforces rules (R06), (R08), (R09), and (R15). Rule (R07) is implied by using doctor_available in the joins.
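The idea behind the view can be shown in miniature. The toy below (two days, two shifts per day, three invented doctors with invented availability) builds the cross product of available doctors per shift, then filters it by rule (R06); the real view does the same at full scale, in SQL, with more rules.

```python
from itertools import product

# Toy search-space construction: one candidate doctor per slot from
# the availability lists, then keep only plans where no doctor works
# two shifts on the same day (rule R06).
available = {
    'MON-1': ['A', 'B'], 'MON-2': ['B', 'C'],
    'TUE-1': ['A', 'C'], 'TUE-2': ['B', 'C'],
}
slots = list(available)

def one_shift_per_day(plan):
    for day in ('MON', 'TUE'):
        docs = [d for s, d in zip(slots, plan) if s.startswith(day)]
        if len(docs) != len(set(docs)):   # same doctor twice in a day
            return False
    return True

plans = [p for p in product(*(available[s] for s in slots))
         if one_shift_per_day(p)]
print(len(plans))
```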

How to Choose a Solution

There are 10,135,296 possible solutions. It is interesting to note that there is no rule saying that each doctor should work, so some solutions do not use every doctor. It is easy to see that all rules could be satisfied by only three doctors working: Eustachi, Heimlich, and Golgi. How to choose a solution? What criteria to use? In general, a function is required that takes two solutions a and b and returns true if solution a is better than b.

better_than(a,b) -> Boolean

Instead of the function, we can create summary statistics for each solution and then explore the stats to choose. For example:

  • Min number of shifts assigned to a doctor.
  • Max number of shifts assigned to a doctor.
  • Number of doctors required for that schedule.
  • For each doctor, number of shifts assigned to that doctor.
-- [p 7]
-- According to the schedule (plan) number PLN:
-- min number of shifts assigned to a doctor is MNS;
-- max number of shifts assigned to a doctor is MXS;
-- number of doctors required for that plan  is DRQ;
-- doctor Fleming  is assigned AF shifts;
-- doctor Freud    is assigned SF shifts;
-- doctor Heimlich is assigned HH shifts;
-- doctor Eustachi is assigned BE shifts;
-- doctor Golgi    is assigned CG shifts.
--
--
stats {PLN, MNS, MXS, DRQ, AF, SF, HH, BE, CG}
  KEY {PLN}

   FK {PLN} REFERENCES schedule {PLN}

Testing

To allow for quick testing of solutions, a batch of tests is packaged in a view named test_rules.

-- [p 8]
-- There are VIOLATIONS of rule RULE.
--
test_rules {RULE, VIOLATIONS}
       KEY {RULE}

Before continuing, run these two statements manually; they are commented out at the bottom of the code. This populates both materialized views with data. It may take ~15 minutes to finish; the stats view is slow to generate. Once the schedule (plans) and stats are generated, it is easy to test and explore. A good time for a coffee break.

REFRESH MATERIALIZED VIEW schedule;
REFRESH MATERIALIZED VIEW stats;

Once done, run the tests; they should finish in about 20 seconds.

SELECT *
FROM test_rules
ORDER BY "VIOLATIONS", "RULE";

+------------+----------------------------------+
| VIOLATIONS | RULE                             |
+------------+----------------------------------+
| 0          | (R04) Every shift has to be ...  |
| 0          | (R06) A doctor can only work one |
| 0          | (R08) Night shift rule.          |
| 0          | (R09) Weekend rule.              |
| 0          | (R10) Fleming only Friday; ...   |
| 0          | (R11) Freud every day early or ..|
| 0          | (R12) Heimlich every day; never .|
| 0          | (R15) Golgi max 2 night shifts.  |
+------------+----------------------------------+

Exploring Solutions

How many rows (solutions) do we have? Check both the schedule and stats views.

WITH
q_1 AS (
  SELECT count(pln) as "SCHEDULE"
  FROM schedule
),
q_2 AS (
  SELECT count(pln) as "STATS"
  FROM stats
)
SELECT *
FROM q_1
JOIN q_2 ON true ;

+----------+----------+
| SCHEDULE | STATS    |
+----------+----------+
| 10135296 | 10135296 |
+----------+----------+

How many plans are available, per number of required doctors for that plan?

SELECT drq        AS doctors_required
     , count(pln) AS "plans"
FROM stats
GROUP BY drq ;

+------------------+---------+
| doctors_required | plans   |
+------------------+---------+
| 3                |     384 |
| 4                |  627712 |
| 5                | 9507200 |
+------------------+---------+

Show five plans where all five doctors are required, each doctor works a minimum of three and a maximum of five shifts, and Dr. Freud works four shifts. See the result.

SELECT b.*
FROM stats    AS a
JOIN schedule AS b USING (pln)
WHERE drq = 5 -- doctors required
  AND mns = 3 -- min number of shifts per doc
  AND mxs = 5 -- max number of shifts per doc
  AND  sf = 4 -- Freud works 4 shifts
LIMIT 5 ;

How many plans are available where all five doctors are required, and each doctor works minimum four shifts?

SELECT count(pln) as cnt
FROM stats
WHERE drq = 5 -- doctors required
  AND mns = 4 -- min number of shifts per doc
;

+-----+
| cnt |
+-----+
|  0  |
+-----+

Well, this is interesting. It means that doctor hours have to be balanced over more than one week. The fun continues.

Balancing Workload

Can we find a solution such that over a period of N weeks each doctor works the same number of shifts? Actually, no. There are 21 shifts per week and five doctors, so on average, without any special rules, a doctor should work 21/5 = 4.2 shifts per week. However, Dr. Fleming can work only 3 shifts per week. Even if Fleming works all three shifts, the others have to work on average (21-3)/4 = 4.5 shifts per week, so he can never catch up with the other doctors.
So the solution is to balance the workload of the other doctors, based on Fleming's availability.

If Fleming works AF shifts per week,
the other doctors have to work AVG shifts on average,
which takes a minimum of WKS weeks to balance.

 AF  AVG   WKS
-----------------
 1   5      1
 2   4.75   4
 3   4.5    2
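The WKS column is simply the denominator of the average written in lowest terms. A quick check, assuming nothing beyond the arithmetic above:

```python
from fractions import Fraction

# If Fleming works af shifts, the other four doctors average
# (21 - af) / 4 shifts per week; the minimal number of weeks needed
# to balance is the denominator of that fraction in lowest terms.
for af in (1, 2, 3):
    avg = Fraction(21 - af, 4)
    print(af, float(avg), avg.denominator)
```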

If Fleming works only one shift a week, any plan where the other doctors work 5 shifts per week will do. This plan can also be static, with no need to change it; all plans here satisfy rule (R08).

SELECT count(pln) as cnt
FROM stats
WHERE af  = 1
  AND sf  = 5
  AND hh  = 5
  AND be  = 5
  AND cg  = 5 ;

+-------+
|  cnt  |
+-------+
| 53120 |
+-------+

Plenty of plans for this case.

If he works three shifts, then any solution which leads to a total of nine shifts, over two weeks, for the other doctors will be fine, as long as it satisfies the night shift rule (R08). I would argue that this two-week plan has to be static too.

The simplest way to balance workload and get the desired average is to alternate between four and five shifts per week for other doctors: (4+5)/2 = 4.5. Let’s consider a pair of plans (A,B) that would satisfy this. How many can be found?

WITH
w1 AS (
  SELECT count(pln) as cnt_A
  FROM stats
  WHERE af  = 3
    AND sf  = 4
    AND hh  = 4
    AND be  = 5
    AND cg  = 5
),
w2 AS (
  SELECT count(pln) as cnt_B
  FROM stats
  WHERE af  = 3
    AND sf  = 5
    AND hh  = 5
    AND be  = 4
    AND cg  = 4
)
SELECT cnt_A, cnt_B
FROM w1
JOIN w2 on true ;

+--------+--------+
| cnt_A  | cnt_B  |
+--------+--------+
| 145152 | 127872 |
+--------+--------+

Not symmetrical, but the total search space is the product of these two numbers, so ~18.56E9 combinations. Any of these that satisfies (R08) would qualify.
That is too many bytes for my home computer; it will not fit into memory. Is there a way to reason away some of this search space? Given that all existing plans satisfy (R08), it may be possible to search for plan pairs that start and end the same. If the first two shifts of Monday and the last Sunday shift are the same, the pair (A,B) satisfies (R08).

A(MON-1, MON-2, SUN-3) = B(MON-1, MON-2, SUN-3)

Does it work? Yes, there are 1,122,295,808 pairs like this, which simplifies the join condition. Take a look at part two of the code and the result example.

And what about the case where Fleming works two shifts, and it takes four weeks to balance the workload? I think I need a bigger computer.

Forgotten Simplicity

Teaching normalization past 1NF in introductory database (DB) courses should be considered harmful. Obsessing about normal forms (past 1NF) should be left for advanced database courses.

Every basic course on relational databases includes one or more chapters on normal forms: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF, 6NF, and sometimes a few more. These are somehow considered essential. I find that this leads to confusion, and most people simply do not know what to do with it other than learn just enough to pass an exam. Take a look at your own (personal or business) SQL DBs; very likely they are a complete mess.

This post is geared towards developers who want to keep their databases free of logical errors, easily expandable, and easy to maintain, while staying away from terms like: functional dependency, join dependency, multivalued dependency, Heath's theorem, the chase algorithm, rigorous mathematical definitions of normal forms, and so on.

There is a simpler way, a better way, a beautiful way, once understood by most, forgotten now. It somehow got swept under the sands of time.

Q: The way?

Use natural language and logic. Describe your business problem using predicates and constraints. Match relations to predicates (they go together), everything else simply follows.

Q: What is a relation?

For a developer, it is a data type. A higher-order data type. It has a precise definition; look it up. Just as integer exists as a data type, a value, and a variable, so does relation: data type, value, and variable.

Q: Relation value, variable? What does it represent?

A set of facts, under a common predicate.

Q: Facts, predicate?

A fact is a proposition considered to be true. A proposition is the meaning of a declarative sentence. A predicate is a generalized proposition. So, a predicate with a matching relation represents a set of facts: sentences considered to be true.

Q: Store sets of facts?

Relational variables (SQL tables in 1NF) store sets of facts. Use relational operators to calculate on these facts.

Q: Relational operators?

Every data type comes with operators, otherwise it would not be very useful. The same way integers come with addition, subtraction, multiplication, …; lists with head, tail, concatenation, …; relations come with join, union, project, restrict (and their SQL equivalents), etc.

Q: What are these operators doing?

Operations on sets of facts: reasoning.
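To make "operations on sets of facts" concrete, relations can be mimicked with Python sets of tuples; a join and a projection are then one comprehension each. The ages below are invented for the sake of the example.

```python
# Two toy relations as sets of tuples (facts).
smoker = {('Jack',), ('Joan',)}
person = {('Jack', 34), ('Joan', 29), ('Jill', 41)}

# Natural join: smokers paired with their age.
smoker_age = {(n, a) for (n,) in smoker for (pn, a) in person if n == pn}

# Projection: just the ages of smokers.
ages = {a for (n, a) in smoker_age}
```

Each operator takes sets of facts in and produces a new set of facts out, which is exactly the reasoning step described above.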

Q: What about joins, are they bad, should we avoid them?

Should we avoid multiplication of integers, concatenation of strings and lists; unions and intersects of sets, etc.?

Q: And keys: primary, alternate, foreign?

Every set of facts needs one or more logical constraints to make sense. These should be spelled out in natural language, together with the predicates. Table keys (PK, AK) are simply implementations of these common-sense logical constraints. Similarly for an FK: it is an implementation of an inclusion constraint between two sets of facts.
It is important to understand keys and other constraints.

Q: What is this normalization thing about?

It is all about removing and preventing logical errors.
First, SQL does not work with relations out of the box, but with tables. A table can represent a relational variable, but it takes rigor; the burden is on the programmer. When a table properly represents a relational variable, it is said to be in 1NF, so this is important; look it up.
Second, and this is the big one: preventing contradiction. Contradiction in logic leads to the so-called "deductive explosion". Steps have to be taken to prevent contradiction. Most of the time this involves implementing proper constraints and eliminating (reducing) redundancy by decomposition into projections (vertically splitting tables).

Q: How far to go with decomposition, when is it enough?

Stop when one or more of the following holds true:

  • There is no redundancy.
  • There is redundancy, but it cannot be removed by decomposition.
  • It is not possible to decompose the table without losing information.
  • It is not possible to decompose the table at all.

Just make sure that when reasoning about redundancy, you reason about the predicate and constraints, not about a few rows of sample data.
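Decomposition into projections can be demonstrated with sets of tuples. The invented relation below repeats each supplier's city once per part; splitting it into two projections and joining them back reproduces the original, so the split loses no information while the city is now stated only once per supplier.

```python
# A redundant relation: the supplier's city is repeated per part.
supplies = {
    ('S1', 'London', 'bolt'),
    ('S1', 'London', 'nut'),
    ('S2', 'Paris',  'bolt'),
}

supplier_city = {(s, c) for (s, c, p) in supplies}   # projection 1
supplier_part = {(s, p) for (s, c, p) in supplies}   # projection 2

# Join the projections back on the supplier.
rejoined = {(s, c, p)
            for (s, c) in supplier_city
            for (s2, p) in supplier_part if s == s2}
print(rejoined == supplies)  # True: this decomposition is lossless
```

Whether such a split is lossless in general depends on the predicate and its keys, not on the sample rows, which is exactly the warning above.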

Q: How does redundancy lead to contradiction?

When redundant data goes out of sync, it becomes possible to state something that is true and not true at the same time.

Q: What to do about it?

Remove redundancy from relations (tables in 1NF), mostly by decomposing into projections. Use predicates and constraints, write them down, start with basic building blocks, use logic.

Q: What basic building blocks?

Simple predicates, which lead to sets of elementary facts. A simple predicate is a predicate which cannot be split without losing information; the same holds for elementary facts.

Q: How to use this in design?

Start with simple predicates and combine them based on constraints (keys). Most of the DB normalization story is about decomposing tables; the real question is how to compose simple predicates.

Q: What about normal forms?

Reasoning about relations yields 1NF; removing redundancy leads to 5NF; simple predicates lead to 6NF. In other words, you can get your tables into 5NF even if you do not know its precise definition. By the way, the useful normal forms can be easily verbalized:

Form  Meaning
------------------------------------------------------------------
1NF   Table properly represents a relational variable.
BCNF  Every fact depends on the key, the whole key, and nothing
      but the key; for every key (there can be more than one).
5NF   Decomposing the table would not remove any redundancies.
6NF   Matches a simple predicate.

Take a look at this again: only the meaning is important; the form name is irrelevant.

Q: What about 2NF, 3NF, 4NF, and a few others; should developers study them?

Not very useful. If you need to pass an exam, then do as the curriculum demands. You may decide to study them under "advanced DB topics", or to learn some DB history. Maybe you are a DB aficionado; I am.
It is important to understand 1NF; the rest is optional as long as you understand the logic of it (see the table above).

Q: So not much rigorous math required?

No, just a basic understanding of predicates and constraints in logic, and how they map to the DB concepts of relations, relational variables, and constraints.

Q: Any other benefits?

  • Using logic and natural language is a huge advantage when one needs to communicate with analysts, product managers, and other business people.
  • It is a good method to elicit business requirements, makes it easy to reason with non-techies; and once done you actually end up with a usable DB model. Bonus documentation.
  • Easy to expand. As business evolves, new predicates (tables) are added, as opposed to constant refactoring.
  • The schema is stable over a long period of time; new tables may be added, but old ones are rarely altered. There is no need to change analytics queries (code) that already work, even if the application code changes.
  • And finally, performance, aka speed. The prevailing mantra "joins are slow" is true only for toy examples with low data volumes, when CPU cycle count is the only resource that matters. Large DBs run faster with narrow, "normalized" tables, a natural consequence of this design approach.

Q: Any examples?

See the Eight Queens problem solved in SQL, using this design method.