Grok Normalization

Which normal form (NF) is my table in? How do I normalize? How to decompose to 3NF? BCNF? How to understand the 5th normal form? Mess, confusion, chaos, conflicting advice, endless arguing on StackOverflow. Misleading advice, even in books.

In Short

Normalization is a process for removing and preventing logical errors in a relational database.

Most of the time this involves implementing proper constraints and reducing — eliminating when possible — redundancy by decomposition into projections.

Fun Example

This simple example is designed to cut through the fog, confusion, and noise generated by large number of conflicting "normalization" articles.

A group of ten J-named people: Jane, Jack, Jill, Joe, James, Janet, Jackie, Jacob, Joan, and Jim, decide to make a SQL DB about themselves. One of tables in the DB is named smoker, stating who smokes.

Case 1

smoker
+-------------+
| NME    (PK) |
+-------------+
| Jack        |
| James, Jill |
| Joan        |
+-------------+
  1. Comment on the current NF.
  2. Is it possible to fix and how?
  3. Comment on the NF after the fix.
  4. How to prevent the error for any future data changes?

Case 2

Same scenario, but this time the table is empty.

smoker {NME}
    PK {NME}
  • Consider the same four questions as in the previous case.

Case 3

Again the table is empty. This time is has a foreign key (FK) to a table named person, which lists all ten people in the group.

smoker {NME}
    PK {NME}
    FK {NME} REFERENCES person {NME}
  • The same four questions.

Notes

Is this all there is to it? No. However, understanding this will make you immune to misleading examples. Once you get this, it is easy to read about this-and-that NF examples, problems, and fixes.

If I had to recommend only one book, it would be a tossup between Hal08 and Dat12. Both require study. However, as a developer you do not need to obsess about NFs. Stick to a simple design method and your tables will be in a high NF as a natural consequence of the design. Use predicates and constraints, everything else will fall in place.

Grokking something is a very personal thing, so it may be best not to provide answers. Well, try not to look.


All attributes (columns) NOT NULL

PK = Primary Key
FK = Foreign Key