PostgreSQL, Send More Money

It is often forgotten that the original idea which led to the development of SQL is the relational model and that it stems from (first-order) logic. The idea is based on concepts of predicates, facts, constraints, sets, set operations, logical operators, relations, relational operators, etc.

Hence, most problems that can be reasoned about in these terms can be easily implemented in SQL. The implementation from the reasoning to the code is often straightforward.

Consider a paradigm of constraint programming over finite domains: the common example for this is the puzzle SEND + MORE = MONEY where each letter represents a different digit.

   SEND
+  MORE
= MONEY

given S <> 0, M <> 0

It is fairly easy to reason about this problem in terms of sets, set operations, constraints, predicates, and logical operations; hence the translation from the reasoning to the SQL code should be straightforward.

Reasoning

The domain for each digit is {0 … 9}; so let’s start with a set named Digit and eight aliases for that same set. I will treat each digit in the result as an element of the corresponding alias set, essentially they are all elements of the Digit set too.

Digit = {0 ... 9}

St = Et = Nt = Dt = Mt = Ot = Rt = Yt = Digit

S ∈ St
E ∈ Et
N ∈ Nt
D ∈ Dt
M ∈ Mt
O ∈ Ot
R ∈ Rt
Y ∈ Yt

The search space is the Cartesian product of these sets — a finite set of tuples — and the result set is a subset of this.

Srch = St X Et X Nt X Dt X Mt X Ot X Rt X Yt

Srch = {  (0, 0, 0, 0, 0, 0, 0, 0)
        ...
        , (9, 9, 9, 9, 9, 9, 9 ,9)
       }

Result ⊆ Srch

Given that the search set is finite, it is possible to apply constraints to each tuple in the set and keep only tuples that satisfy these constraints.

The original set of constraints, as per the puzzle problem, can be written as:

M > 0

S > 0

  D + N * 10 + E * 100 + S * 1000
+ E + R * 10 + O * 100 + M * 1000
= Y + E * 10 + N * 100 + O * 1000 + M * 10000

all_different (S,E,N,D,M,O,R,Y)

The last — all_different — constraint seems obvious, but is needed because the Cartesian product repeats digits within a tuple.

The next step is to check if it is possible to reduce the search space by reasoning about the problem.

Reducing the Search Space

Given that the result has more digits than both operands and that the maximum sum of two single digit numbers is 18, the maximum carry is 1; therefore the first digit of the result must be 1, that is, M = 1.

Given that M = 1 and there exists a carry in (S + M), S must be at least 8, that is, S >= 8.

From the first column, (D + E) may or may not have a carry, but in both cases it holds that (D+E) modulo 10 = Y.

Step Reasoning Note Search Space
1 Cartesian Product 10^8 1E+08 = 100M
2 M=1 also step 1 1E+07 = 10M
3 S >= 8 also steps 1,2 2E+06 = 2M
4 (D+E) mod 10 = Y also steps 1,2,3 2E+05 = 200K

So, by simple reasoning it was possible to reduce the search space by 500 times. Required memory space (PostgreSQL) for this data set may be estimated to (8*4 + 24) * 200K bytes, approximately 11.5 MB — which should fit into a memory of a modest laptop.

To summarize all constraints:

M = 1  -- implies M > 0

S >= 8 -- implies S > 0

(D + E) mod 10 = Y

  D + N * 10 + E * 100 + S * 1000
+ E + R * 10 + O * 100 + M * 1000
= Y + E * 10 + N * 100 + O * 1000 + M * 10000

all_different (S,E,N,D,M,O,R,Y)

SQL Implementation

Translation to the SQL code is now straightforward — you should be able to read the code and the reasoning section in parallel.

  • The all_different constraint is implemented as a separate Boolean function which takes an array of integers for an input.
  • Constraints are essentially Boolean expressions and are placed in the WHERE clause of the query.
  • The Digit domain (set) is represented by a relation with one attribute (single column).
  • The cross product of sets St … Yt translates to CROSS JOINs in the FROM clause of the query.
  • The final result is packaged in a view named Send_MM.
  • SELECT * FROM Send_MM;
    
    s   e   n   d   m   o   r   y
    ------------------------------
    9   5   6   7   1   0   8   2
    
       9567
    +  1085
    = 10652
    

    On my machine this takes 160 milliseconds, should run fine even on very modest hardware.

    Summary

    To summarize, the idea of this exercise is to show that the relational model (and SQL) is much more than just “tabular data storage”, as often perceived.

    If a problem can be reasoned about in a natural language, using concepts like sets, set operations, domains, predicates, constraints, relations, logical operations, etc., then translation to a relational model and the SQL is fairly simple.