-- File: grok_normalization_a.txt -- Version: 1.0 -- Last Changed: 2020-11-29 -- by: Damir; https://www.damirsystems.com -- Project: Grok Normalization Case 1 ------- The table is not in 1NF. Each row-column intersection must contain exactly one element from the domain, in this case a name of one person. The second row contains two names. This is the original meaning of the "repeating group". The fix is: smoker +-----------+ | NME (PK) | +-----------+ | Jack | | James | | Joan | | Jill | +-----------+ The table is now in 6NF. To prevent this type of error the domain should be constrained. One possibility is to use a table to define all people in the group and make a foreign key (FK) to it. Why is this important? Note that the table went from not-in-1NF to 6NF by a single-step fix. Numeric NF names (1,2,3..) are rather unfortunate. Combined with the statement "normalization is a process", this creates illusion that somehow there exists a natural progress of normalization steps 1,2,3, BCNF... In addition, it is often stated that for a table to be in NF X, it has to be first in NF (X-1). Sometimes this statement holds, but as a general rule it fuels the confusion. The case is that, due to the NF hierarchy, the table that is in NF X is also in (X-1) (for x > 1). In this example, the table is in 6NF, and therefore in 5, 4,..,1. Case 2 ------- The table is in 6NF. Nothing can be improved as far as table NF is concerned. However, it is still a good idea to constrain the domain as reasoned in the first case. Why is this important? A table should be in a certain NF for any number of rows, empty table included. Actually, empty table is the normal state during design process. NFs apply to a relation-variables (tables), not to relation-values. Value is a value, it does not have (insert, update, delete) anomalies; if you change it's a different value. Variables (tables) may have anomalies. To determine a table NF, one needs functional and join dependencies. These are defined between sets of attributes (column names), so no data is needed for this. It is common to add a few rows of data to a table when reasoning about NFs, but this is just a sanity check. Avoid reasoning about NFs based on a few table rows. Use logic and business rules to figure out functional and join dependencies. Case 3 ------- The table is in 6NF. Nothing to fix here, However note that the problem of constraining the domain has shifted to the "person" table. Why is this important? It is hard to constrain domains in SQL. The root problem is in an inadequate type system. What is really needed here would be a data type for a person's name. What we get in SQL is a varchar (string). In this example, each row-column intersection must contain exactly one person-name, not one stringy-thingy. Using numeric IDs does not really help much, because we would still need to constrain to a valid person_id, as opposed to some other kind of a number, like country_id. Be aware of the data type problem in SQL. When reasoning about column types, think domain. For example, think: person-name, county-name, user-id; as opposed to a string or an integer.