MySQL Type Horrors

Although it is possible to demonstrate the problem in another DB system, I have deliberately chosen MySQL for this article because it literally lends itself to this kind of errors.

Example

A small, but powerful example. Just by looking at this you may guess that this post may have something to do with case sensitivity, but as you will see the problems go much deeper. Here is the MySQL code for the example.

  T { rw integer
    , A  varchar(32)
    , B  blob
    }
KEY {rw}


 T (rw,   A  ,   B  )
 { ( 1, 'ABC', 'ABC')
 , ( 2, 'abc', 'abc')
 }  

For the example I have created a fresh instance of MySQL (v 5.6.27) on Amazon RDS, using default settings. Note that the collation is case and accent insensitive.

select @@collation_database as cdb;

cdb
------------------
latin1_swedish_ci

Trouble

Let’s see what equals to what in table T.

select rw from T where A = B;

rw
--
1
2

Therefore I conclude that:

A1 = B1 
A2 = B2

For simplicity, I am using “Excel notation” to refer to values at row-column intersections in the table.

To check for equality in column A, I simply count number of distinct values in the column.

select count(distinct A) as cnt_a from T;

cnt_a
-----
  1

There is only one distinct value in column A, therefore:

A1 = A2

Based on these two conclusions:

B1 = A1 = A2 = B2

How about column B?

select count(distinct B) as cnt_b from T;

cnt_b
-----
  2

Given that there are two distinct values in column B it is obvious that:

B1 <> B2

If I put this together, the following holds:

B1 = A1 = A2 = B2
B1 <> B2

Interestingly, both statements are true in table T. And we have the contradiction again. I can formalize this a bit. Given a proposition P stating that (B1 = B2), the following holds:

X,Y ∈ T
X = (B1 = A1 = A2 = B2)
Y = (B1 <> B2)

P = (B1 = B2)

⊢ X → P  T ⊢ X       ⊢ Y → ¬P  T ⊢ Y
-------------- (e1)  ---------------- (e2)
T ⊢ P                T ⊢ ¬P
------------------------------------- (i1)
T ⊢ (P ∧ ¬P)

A contradiction leads to nonsense — as previously demonstrated — so, a query that includes table T is likely to return a wrong result.

It Gets Worse

Take a good look at these three queries:

Query Result
rw
select rw from T
where A = B;
1
2
select rw from T
where upper(A) = upper(B);
1
select rw from T
where lower(A) = lower(B);
2

This is a head-scratcher. What happens is weird: upper() and lower() do not work on blob, binary, and varbinary types. Instead of returning an error, they silently return the input value of the function. Duh. To be fair, it is in the manual; but really?

Equality

Time to revisit the definition and properties of equality.

Seems that it was Leibniz who rigorously defined equality; these four properties actually became theorems:

Property For any Must be True
Reflexive a a = a
Symmetry a,b if a = b then b = a
Transitive a,b,c if a = b and b = c then a = c
Substitution a,b,f(x) if a = b then f(a) = f(b)

Given the example, how are we doing some 300 years later?

Property Note
Reflexive ok
Symmetry ok
Transitive broken, first example
Substitution broken, second example

Summary

Using the small example — one table with three columns and two rows — I have managed to break two out of four properties of equality. A contradiction ensued. Not good, but this is the state of the industry.

Avoid using blobs, binary, and varbinary types for strings. Even if developers and DBAs are certain that they can handle nuances of an implicit type conversion, it is not likely that analysts will be able to do so. As time goes by, databases grow, and people come and go: errors from this kind of problems are inevitable.