Sometimes we don’t want to have reptitions of certain values across multiple tuples in our dataset. For instance, we might not want to see an email address repeated more than once and simply using DISTINCT is not an option.

In this case, I will invariably use a CTE to remove the offending repetitions.

Row Number

This solution will use int ROW_NUMBER(), which returns an integer than increments (from 1, by 1) as it traverses the dataset. It is also a windowed function such that we can partition our dataset along identical values if we so wish.

Subquery

Suppose we include ROW_NUMBER() like so:

SELECT
	Email
,	ROW_NUMBER() OVER (PARTITION BY Email ORDER BY Email) AS rn
FROM dbo.Account;

Subquery Output

\[\begin{array}{|c|c|} \hline \text{Email} & \text{rn} \\ \hline \text{fakeUser@FakeDomain.com} & 1 \\ \text{anotherFake@alsofake.com} & 1 \\ \text{anotherFake@alsofake.com} & 2 \\ \text{notanemail@notReal.com} & 1 \\ \hline \end{array}\]

CTE

This means we can delete any row where RN > 1. However, we cannot use ROW_NUMBER() in WHERE or HAVING clauses. Therefore we use a CTE.

WITH cteDedupe AS (
	SELECT
		ROW_NUMER() OVER (PARTITION BY Email ORDER BY Email) AS rn
	FROM dbo.Account
	)
DELETE cteDedupe 
WHERE rn > 1;

Updated: