Intro to Over
On previous pages we looked at the utilisation of a single aggregation function in a SELECT
statement.
Suppose we’d like to wrap an aggregation function around another aggregation function like so: SELECT SUM(COUNT(*))
.
In SQL, this is unfortunately forbidden.
We could instead rely on subqueries (CTEs in particular) to get around this limitation.
However, there are some instances where we need not result to such verbosity.
We now introduce the OVER()
clause, which makes such expressions possible in a single statement.
Questions
First, let’s assume we need to answer the following questions:
- Where do our users come from?
- How do we express the aggregation as a percentage?
Returning the Count
SQL
First, let’s write a query that returns the actual figures.
SELECT
Country
, COUNT(*) AS Cnt
FROM music.Users
GROUP BY
Country
ORDER BY
COUNT(*) DESC;
Output
\[\begin{array}{|c|c|} \hline \text{Country} & \text{Cnt} \\ \hline \text{United States} & 26 \\ \text{France} & 9 \\ \text{Russia} & 7 \\ \text{Poland} & 4 \\ \cdots & \cdots \\ \hline \end{array}\]Computing the Total
To compute the percentage, we first need to get the total number of users in music.Users
.
We can augment that above query such that the total is the sum of every value in Cnt
.
That is: SUM(COUNT(*))
.
If we include this expression in our query, we will get an error:
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
That is, we cannot use an aggregate function on another aggregate function. Why? It kind of boils down to a matter of sequencing.
To rephrase the problem, we want to sum all the values of Cnt
in the output above.
That is, we need to invoke SUM()
after COUNT(*)
has given us its result set.
Therefore, we need to explicitly tell SQL to use SUM()
only once the above result set has been computed.
We can do this with OVER()
.
This allows us to define the intial result set from the first aggregation as a window.
So instead of aggregating on an aggregate, we aggregate on the result set via a window which interposes itself between the two aggregation steps.
Now we write: SUM(COUNT(*)) OVER()
In this case the window is the entire result set returned by the above query.
This is because have not specified any partitioning.
We are summming all values of Cnt
.
SQL
SELECT
Country
, COUNT(*) AS Cnt
, SUM(COUNT(*)) OVER() AS Total
FROM music.Users
GROUP BY
Country
ORDER BY
COUNT(*) DESC;
Output
\[\begin{array}{|c|c|c|} \hline \text{Country} & \text{Cnt} & \text{Total} \\ \hline \text{United States} & 26 & 100 \\ \text{France} & 9 & 100 \\ \text{Russia} & 7 & 100 \\ \text{Poland} & 4 & 100 \\ \cdots & \cdots & \cdots \\ \hline \end{array}\]Note: Cnt
in the output table does not sum to 100 as I’m not displaying every row in the result set.
Expressing the Percentage
Now we can simply write an expression that divides Cnt
by Total
and multiplies by 100.0 to get the percentage.
SQL
SELECT
Country
, COUNT(*) AS Cnt
, SUM(COUNT(*)) OVER() AS Total
, COUNT(*)*100.0/SUM(COUNT(*)) OVER() AS Pcnt
FROM music.Users;
GROUP BY
Country
ORDER BY
COUNT(*) DESC;
Note: when multiplying or dividing an integer by another integer, our result will also be an integer.
If we expect a non-integer result, we can cast an int
to float
by rewriting the integer as a decimal.
This is known as implicit conversion.
To illustrate this, compare these two expressions: SELECT 1/2
and SELECT 1/2.0
.
Output
\[\begin{array}{|c|c|c|c|} \hline \text{Country} & \text{Cnt} & \text{Total} & \text{Pcnt }\\ \hline \text{United States} & 26 & 100 & 26.000000000000\\ \text{France} & 9 & 100 & 9.000000000000 \\ \text{Russia} & 7 & 100 & 7.000000000000 \\ \text{Poland} & 4 & 100 & 4.000000000000 \\ \cdots & \cdots & \cdots & \cdots \\ \hline \end{array}\]Voila! We have computed the percentage distribution of an aggregation in SQL using a single query. We could have used a subquery to return the total, but that would have been verbose.
Verbose Example
WITH cteTotal AS (
SELECT
COUNT(*) AS Total
FROM music.Users;
)
SELECT
Country
, COUNT(*)*100.0/(SELECT Total FROM cteTotal)
FROM music.Users
GROUP BY
Country
ORDER BY COUNT(*) DESC;