Types and Typos in SQL [1]

Almost anyone who has spent time working with SQL made — or had to fix — this kind of bug:

-- two tables (sketch)
--
users {id, user_name}

trans {id, amt, cur, tx_time, user_id}
-- query with a typo
--
SELECT user_name
     , tx_time
     , amt
     , cur
 FROM users as u
 JOIN trans as t ON u.id = t.id -- <- typo !!
WHERE user_name = 'jack' ;

Try the query, here is the DDL with some data; note the wrong result.

The naming convention uses a generic id propagated from the application's OO model and the default data type is usually integer. The typo joins transaction table on id, instead of user_id; very easy mistake to make and not so easy to catch.

When it does happen that the bug propagates to a production system, the usual response is to first blame a developer, then argue that the pair programming (or peer review) process is not working, then blame QA for not catching it, and eventually discuss naming conventions.

Although it is true that naming attributes user_id, tran_id instead of the generic id would help -- because the predicate would read u.user_id = t.user_id -- the problem is deeper. As it is often with SQL, the problem is not in users nor developers, but in the SQL itself; namely u.id = t.id is both: a type error and a logical error.

In order to demonstrate the concept I have to use something with a "proper" type system.

Haskell to the Rescue

Haskell's type system is based on Hindley-Milner type system, and if you have never heard of it, don't worry. All you have to know about it right now are just two things: 1) it is a beautiful thing, and 2) you want one.

Let's start with the definition of two new types, one for users and one for transactions. I am typing into Haskell's interactive environment, so "λ" is just a terminal prompt.

New Type

λ  newtype User = MkUser Int deriving (Eq, Show)

λ  newtype Tran = MkTran Int deriving (Eq, Show)

The type system can infer types, hence I can ask for types of constructors: MkUser and MkTran.

λ  :type MkUser 
MkUser :: Int -> User

λ  :type MkTran
MkTran :: Int -> Tran

MkUser is a function which takes an Int and returns a User; MkTran is a function which takes an Int and returns a Tran (transaction).

The deriving (Eq, Show) -- in new type definitions -- simply means to derive rules for equality and string-display from the underlying integer type. And right here -- at the very moment of a new type definition -- a question arises:

What does it mean for two instances of a type to be equal?

The answer seems simple, but in general it requires more thought.

Equality

Create two variables: u for a user, and t for a transaction; note that for both of them the value of the underlying integer type is 1.

λ  let u = MkUser 1

λ  let t = MkTran 1

Let's test for equality, first between terms of the same type.

λ  1 == 1
True

λ  u == u
True

λ  t == t
True

λ  MkUser 1 == MkUser 1
True

λ  MkUser 1 == MkUser 2
False

λ  MkTran 4 == MkTran 4
True

λ  MkTran 5 == MkTran 4
False

However, when different types -- user and transaction -- are compared, an error is raised.

λ  u == t

Couldn't match expected type `User' with actual type `Tran'
...

This is a good error to get. A user and a transaction are two different things hence can not be compared, regardless of the fact that both have the underlying integer value of 1.

Contrast this thinking to the previous SQL example, which was happy to join a user on a transaction just because 1=1. An error would be much better.

Functions & Operators

What about the type of the equals operator (==) ?

λ :type (==)
(==) :: Eq a => a -> a -> Bool

Equals is a function which takes two arguments of a same type a and returns Boolean, given that definition of equality for that type exists. In Haskell speak, the type a must be a member of the Eq class, but this is not about Haskell: it is the principle that matters.

How about addition? What will happen if I try to sum two users, or a user and a transaction, since both have an integer as the underlying data type?

Let's see the type of the addition operator.

λ :type (+)
(+) :: Num a => a -> a -> a

The addition is a function which takes two arguments of a numeric type a and returns a result of the type a.

By now it should be easy to understand what happens here:

λ u + u

No instance for (Num User) arising from a use of `+'
...

λ t + t

No instance for (Num Tran) arising from a use of `+'
...

λ t + u

Couldn't match expected type `Tran' with actual type `User'
...

Note that the error in the third example is different from errors in the first two examples. The last one complains about different types, while the first two essentially state that types User and Tran are not numeric.

Summary

If all this type-reasoning looks too complicated, consider the query:

SELECT ((u.id + t.id)^3) - 22 as x
 FROM  users as u
 JOIN  trans as t ON t.id = (u.id + 2)
WHERE  u.id = 1 ;

 x
----
 42

Take user number one, add transaction number three, raise the result to the power of three, subtract 22; and the result is: the meaning of life.

Hmm, so there may be a logical mistake somewhere in that query. Wouldn't it be nice to have a type system that prevents mistakes like this? Although SQL dialects do not offer much in terms of type safety, PostgreSQL's enum data type does.

Next time: PostgreSQL's enum and type safety.