It is often forgotten that the original idea which led to the development of SQL is the relational model and that it stems from (first-order) logic. The idea is based on concepts of predicates, facts, constraints, sets, set operations, logical operators, relations, relational operators, etc.
Hence, most problems that can be reasoned about in these terms can be easily implemented in SQL. The implementation from the reasoning to the code is often straightforward.
Consider a paradigm of constraint programming over finite domains: the common example for this is the puzzle SEND + MORE = MONEY where each letter represents a different digit.
SEND + MORE = MONEY given S <> 0, M <> 0
It is fairly easy to reason about this problem in terms of sets, set operations, constraints, predicates, and logical operations; hence the translation from the reasoning to the SQL code should be straightforward.
Reasoning
The domain for each digit is {0 … 9}; so let’s start with a set named Digit and eight aliases for that same set. I will treat each digit in the result as an element of the corresponding alias set, essentially they are all elements of the Digit set too.
Digit = {0 ... 9} St = Et = Nt = Dt = Mt = Ot = Rt = Yt = Digit S ∈ St E ∈ Et N ∈ Nt D ∈ Dt M ∈ Mt O ∈ Ot R ∈ Rt Y ∈ Yt
The search space is the Cartesian product of these sets — a finite set of tuples — and the result set is a subset of this.
Srch = St X Et X Nt X Dt X Mt X Ot X Rt X Yt Srch = { (0, 0, 0, 0, 0, 0, 0, 0) ... , (9, 9, 9, 9, 9, 9, 9 ,9) } Result ⊆ Srch
Given that the search set is finite, it is possible to apply constraints to each tuple in the set and keep only tuples that satisfy these constraints.
The original set of constraints, as per the puzzle problem, can be written as:
M > 0 S > 0 D + N * 10 + E * 100 + S * 1000 + E + R * 10 + O * 100 + M * 1000 = Y + E * 10 + N * 100 + O * 1000 + M * 10000 all_different (S,E,N,D,M,O,R,Y)
The last — all_different — constraint seems obvious, but is needed because the Cartesian product repeats digits within a tuple.
The next step is to check if it is possible to reduce the search space by reasoning about the problem.
Reducing the Search Space
Given that the result has more digits than both operands and that the maximum sum of two single digit numbers is 18, the maximum carry is 1; therefore the first digit of the result must be 1, that is, M = 1.
Given that M = 1 and there exists a carry in (S + M), S must be at least 8, that is, S >= 8.
From the first column, (D + E) may or may not have a carry, but in both cases it holds that (D+E) modulo 10 = Y.
Step | Reasoning | Note | Search Space |
---|---|---|---|
1 | Cartesian Product | 10^8 | 1E+08 = 100M |
2 | M=1 | also step 1 | 1E+07 = 10M |
3 | S >= 8 | also steps 1,2 | 2E+06 = 2M |
4 | (D+E) mod 10 = Y | also steps 1,2,3 | 2E+05 = 200K |
So, by simple reasoning it was possible to reduce the search space by 500 times. Required memory space (PostgreSQL) for this data set may be estimated to (8*4 + 24) * 200K bytes, approximately 11.5 MB — which should fit into a memory of a modest laptop.
To summarize all constraints:
M = 1 -- implies M > 0 S >= 8 -- implies S > 0 (D + E) mod 10 = Y D + N * 10 + E * 100 + S * 1000 + E + R * 10 + O * 100 + M * 1000 = Y + E * 10 + N * 100 + O * 1000 + M * 10000 all_different (S,E,N,D,M,O,R,Y)
SQL Implementation
Translation to the SQL code is now straightforward — you should be able to read the code and the reasoning section in parallel.
SELECT * FROM Send_MM; s e n d m o r y ------------------------------ 9 5 6 7 1 0 8 2 9567 + 1085 = 10652
On my machine this takes 160 milliseconds, should run fine even on very modest hardware.
In a simpler version of the code — for PostgreSQL and SQL Server — the all_different constraint is implemented in join conditions of the query.
Summary
To summarize, the idea of this exercise is to show that the relational model (and SQL) is much more than just “tabular data storage”, as often perceived.
If a problem can be reasoned about in a natural language, using concepts like sets, set operations, domains, predicates, constraints, relations, logical operations, etc., then translation to a relational model and the SQL is fairly simple.