Intro to Over
On previous pages we looked at the utilisation of a single aggregation function in a
Suppose we’d like to wrap an aggregation function around another aggregation function like so:
In SQL, this is unfortunately forbidden.
We could instead rely on subqueries (CTEs in particular) to get around this limitation.
However, there are some instances where we need not result to such verbosity.
We now introduce the
OVER() clause, which makes such expressions possible in a single statement.
First, let’s assume we need to answer the following questions:
- Where do our users come from?
- How do we express the aggregation as a percentage?
Returning the Count
First, let’s write a query that returns the actual figures.
SELECT Country , COUNT(*) AS Cnt FROM music.Users GROUP BY Country ORDER BY COUNT(*) DESC;
Computing the Total
To compute the percentage, we first need to get the total number of users in
We can augment that above query such that the total is the sum of every value in
If we include this expression in our query, we will get an error:
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
That is, we cannot use an aggregate function on another aggregate function. Why? It kind of boils down to a matter of sequencing.
To rephrase the problem, we want to sum all the values of
Cnt in the output above.
That is, we need to invoke
COUNT(*) has given us its result set.
Therefore, we need to explicitly tell SQL to use
SUM() only once the above result set has been computed.
We can do this with
This allows us to define the intial result set from the first aggregation as a window.
So instead of aggregating on an aggregate, we aggregate on the result set via a window which interposes itself between the two aggregation steps.
Now we write:
In this case the window is the entire result set returned by the above query.
This is because have not specified any partitioning.
We are summming all values of
SELECT Country , COUNT(*) AS Cnt , SUM(COUNT(*)) OVER() AS Total FROM music.Users GROUP BY Country ORDER BY COUNT(*) DESC;
Cnt in the output table does not sum to 100 as I’m not displaying every row in the result set.
Expressing the Percentage
Now we can simply write an expression that divides
Total and multiplies by 100.0 to get the percentage.
SELECT Country , COUNT(*) AS Cnt , SUM(COUNT(*)) OVER() AS Total , COUNT(*)*100.0/SUM(COUNT(*)) OVER() AS Pcnt FROM music.Users; GROUP BY Country ORDER BY COUNT(*) DESC;
Note: when multiplying or dividing an integer by another integer, our result will also be an integer.
If we expect a non-integer result, we can cast an
float by rewriting the integer as a decimal.
This is known as implicit conversion.
To illustrate this, compare these two expressions:
SELECT 1/2 and
Voila! We have computed the percentage distribution of an aggregation in SQL using a single query. We could have used a subquery to return the total, but that would have been verbose.
WITH cteTotal AS ( SELECT COUNT(*) AS Total FROM music.Users; ) SELECT Country , COUNT(*)*100.0/(SELECT Total FROM cteTotal) FROM music.Users GROUP BY Country ORDER BY COUNT(*) DESC;