Contradiction in Database [2]

The previous post focused on a contradiction in general; in this one I present a specific example using PostgreSQL.

Example

Starting with a predicate:
[p_1] Country named (CNAME) is assigned country code (CC2) and number (CID).

Country {CID, CC2, CNAME}
    KEY {CID}

Country (CID, CC2,  CNAME   )
      { (1,  'CA', 'Canada' )
      , (2,  'FR', 'France' )
      , (3,  'DE', 'Germany')
      , (4,  'DE', 'Denmark') -- typo
      }

Take a look at the sketch and the DLL. This is a common find: the ID is used as a primary key and there are no other constraints.

Consider the tuple with CID = 4; it has a wrong country code for Denmark, which results in a contradiction. From tuples with CID = 3 and CID = 4 it follows:

  • Country code DE is assigned to country named Germany.
  • Country code DE is assigned to country named Denmark.

In this case the contradiction is a result of the typo and a (missing) constraint. The constraint is essentially a part of an ISO standard, a generally accepted fact.

[c1.1] Each country code is assigned to exactly one country.

The idea is to see how can this be represented as a query. Again, let’s name the database Γ, and show that it contains a contradiction.

A = Code DE is assigned to Germany.
B = Code DE is assigned to Denmark.


        B ⊢ ¬A   Γ ⊢ B
        ----------------- (e1)
Γ ⊢ A   Γ ⊢ ¬A
------------------------- (i1)
Γ ⊢ (A ∧ ¬A)

The premise B ⊢ ¬A is consequence of the constraint c1.1.
As per the rule i1, it is possible to conclude that both A and (¬A) are true.

Lilliput

As explained in the previous post: “From contradiction anything follows”. Let’s see if I can prove — from my database — something from Gulliver’s travels, e.g. that country Lilliput exists.

A ∈ Γ, N ∉ Γ
A = Code DE is assigned to Germany.
N = Code DE is assigned to Lilliput.


              Γ ⊢ A            
------ (e1)   ---------- (i2)
Γ ⊢ ¬A        Γ ⊢ (A ∨ N)       
------------------------ (e4)
Γ ⊢ N

The following query is self-explanatory; query labels match rule names. The output of the query lists all countries in the database.

-- [c1.1] Each country code 
-- is assigned to exactly one country.
--
with
i2 as (
    select CC2
    from Country
    where (CC2 = 'DE' and CNAME = 'Germany')
       or (CC2 = 'DE' and CNAME = 'Lilliput')
),
e1 as (
    select CC2
    from Country
    where (CC2 = 'DE' and CNAME != 'Germany')
),
e4 as (
    select CC2, 'Lilliput'::text as CNAME
    from i2
    join e1 using (CC2)
)
select CNAME
from Country
union
select CNAME
from e4 
order by CNAME
;

The result may be interpreted using the predicate:
[p_2] Country named (CNAME) exists.

CNAME
--------
Canada
Denmark
France
Germany
Lilliput

You can actually hear a VP of marketing: “Lilliput, almost forgot about it. Do they use Euros? We have to do a campaign over there ASAP.”

Foolish

“But this is just a result of a foolish query. Why would anyone write a query like that?”, you say. Let’s agree on that — for the time being — and see what happens if I fix the data.

-- fix the typo
update Country set CC2 = 'DK'
where CNAME = 'Denmark' ;

-- add constraints
alter table Country
  add constraint ak1_cn UNIQUE (CC2)
, add constraint ak2_cn UNIQUE (CNAME) ;

Let’s run that foolish query again:

CNAME
--------
Canada
Denmark
France
Germany

Not so Foolish

Interestingly, fixing the data returned the correct result. Time to revisit the concept of a sound deductive (logical) system.

A deductive system is sound if any sentence that is provable in that system is true.

In other words, if my database and a reasoning mechanism form a sound logical system, then I should not be able to conclude falsehood.

Not so Simple

It may be tempting to think that contradiction may be avoided by carefully crafting queries, but it is not that simple.

SQL is a declarative, fourth-generation programming language. The idea is to specify what to do, not how to do it.

A query is passed to the query processor. The processor takes a declarative query, validates the query, optimizes it, converts it to an executable data-flow plan, and executes the plan.

Parse --> Rewrite --> Optimize --> Execute

The query parser validates the query, resolves names, converts it to a format used by the rewrite and the optimizer, and checks user authorization.

The query rewrite simplifies queries without changing the semantics (meaning). It expands views, simplifies arithmetic expressions, logically rewrites predicates in the where clause, semantically optimizes joins, and flattens sub-queries.

The query optimizer produces an executable data-flow plan. It has to deal with selectivity estimates, data distribution histograms, search algorithms, parallelism, and auto-tuning.

The query executor takes a query plan data-flow and executes it.

Bottom line: it is a long way between the declarative query and the way it is actually executed; and that is a good thing.

Not Only SQL

Although this example was about SQL, note that the reasoning applies to any higher-level-of-abstraction language.

Imagine a long and complicated mathematical expression. First you check out if it makes sense (parse); then try to simplify it using some math rules (rewrite); figure out how to calculate it (optimize), e.g.: first calculate numerator, then denominator, then divide; and finally plug in the numbers (execute).

Consider Apache Spark: it started with RDDs only, then added data-frames and SQL; take a look at the optimizer and note the logical optimization section.

Reasoning Machine

There is a “reasoning machine” between your data and a query result based on that data. In general, that machine may be extended to include you and all the code you write; also, other developers, analysts and all the code they write.

If a database contains a contradiction and the reasoning machinery gets hold of it, what comes out is anybody’s guess.

A metaphor, but a good one.