Joins: Cartesian

Intro

When joining two relations via use of, say, CROSS, INNER, or LEFT joins, it is very important to understand the ramifications of such an undertaking. These kind of join are powerful and easy to implement due to their friendly syntactic sugar and are used all the time by SQL professionals to great effect.

However, it is not uncommon to see these used improperly. This page will advise on and against the various use cases of cartesian joins.

Relations

For each of our examples on this page, we will consider the following two relations: \(R\) and \(S\):

\[R = \begin{array}{|c|} \hline \alpha \\ \hline A\\ B\\ C\\ \hline \end{array} \qquad S = \begin{array}{|c|} \hline \beta \\ \hline C\\ C\\ D\\ \hline \end{array}\]

CROSS JOIN

In SQL, we use CROSS JOIN to compute the cartesian product of two relations. That is, we compute all possible combinations of tuples.

Relational Algebra

We can mathematically represent the cartesian product of two relations as follows:

\[R \times S\]

Where \(\times\) represents the cartesian product.

SQL

To use CROSS JOIN in SQL, we would write something like the following:

SELECT
	*
FROM R
CROSS JOIN S

Note that unlike INNER JOIN, we aren’t matching on anything.

Output

\[R\times S = \begin{array}{|c|c|} \hline \alpha & \beta \\ \hline A & C \\ B & C \\ C & C \\ A & C \\ B & C \\ C & C \\ A & D \\ B & D \\ C & D \\ \hline \end{array}\]

As you can see, our SQL query has returned \(3 \times 3 = 9\) tuples. The result-set of a cartesian product will always be of length \(M \times N\), where \(M\) and \(N\) represent the respective lengths of the two relations.

INNER JOIN

Relational Algebra

In relational algebra, we represent an INNER JOIN as follows:

\[\sigma_{\alpha = \beta} (R\times S)\]

In other words, we again compute a cartesian product but return only the tuples that match on the joining fields.

SQL

The syntax for INNER JOIN in SQL resembles something like this:

SELECT 
	alpha
,	beta
FROM R
INNER JOIN S
ON R.alpha = S.beta

Output

\[\sigma_{\alpha = \beta} (R\times S) = \begin{array}{|c|c|} \hline \alpha & \beta \\ \hline C & C \\ C & C \\ \hline \end{array}\]

In this case we have returned just the one tuple. But INNER JOIN can return any number of tuples in the range \(0 <= M\times N\). It just depends on the number of matching tuples.

LEFT JOIN

Relational Algebra

\[\sigma_{\alpha = \beta}(R\times S) \cup ((R - \Pi_{r_1,...r_n}(\sigma_{\alpha = \beta} (R \times S))) \times S_{NULL})\]

Let’s break this down.

\[\begin{array}{|c|c|} \hline \sigma_{\alpha = \beta}(R \times S) & \text{inner join} \\ ((R_1 - \Pi_{r_1,...,r_n}(\sigma_{\alpha = \beta} (R \times S))) & \text{left anti semi join} \\ S_{NULL} & n\text{-tuple. NULL for each attribute in } S \\ \hline \end{array}\]

So it is more or less accurate to state that a Left Join is the union between an inner join and a left anti semi join.

SQL

As such, we could write our SQL like this to return the product of a left join:

WITH cteInner AS (
	SELECT
		*
	FROM R 
	INNER JOIN S
	ON R.alpha = S.beta
	)
, cteAntiJoin AS (
	SELECT
		*
	FROM R.alpha
	WHERE NOT EXISTS (
		SELECT 
			1 
		FROM cteInner 
		WHERE R.alpha = alpha)
	)
SELECT
	*
FROM cteInner
UNION ALL
SELECT
	*
FROM cteAntiJoin
CROSS JOIN (SELECT NULL AS omega);

But with some syntactic sugar, we need only write:

SELECT 
 *
FROM R
LEFT JOIN S
ON R.alpha = S.beta

Output

\[\begin{array}{|c|c|} \hline \alpha & \beta \\ \hline A & NULL \\ B & NULL \\ C & C \\ C & C \\ \hline \end{array}\]

Updated: