How to select/display all rows with duplicate values in MySQL

(This should be a fub, but the description is too long.)

One of the most frequent uses of database tables is to identify duplicate values in your data. Assume you have a table with a list of names, and you want to know how many of the names appear in the table more than one time. Most books on MySQL will tell you how to do this. Here is a recipe from Paul Dubois (MySQL Cookbook 3rd ed., O’Reilly p. 556):
SELECT COUNT(*), last_name, first_name
-> FROM catalog_list
-> GROUP BY last_name, first_name
-> HAVING COUNT(*) > 1

Suppose, however, that you want more information about the duplicates, for example their birthdates, which is contained in another field in each row. Just adding the field birth_date to this sql command will only show you the birthdate for one of each of the duplicates. Suppose you want to show data for all of the duplicates? Dubois actually suggests how to do this (p. 560), but his solution involves making a temporary table and joining it against the main table. Is it possible to accomplish the same task without an intermediate table? The answer is yes: use a self-join. I have great difficult understanding and using self-joins, so it took me quite a while to get it. Even worse, I’m pretty sure I’ve spent considerable time figuring this out more than once. So here is the solution, for permanent reference:

SELECT t1.last_name,t1.first_name,t1.birth_date
  FROM catalog_list as t1 
    INNER JOIN (
    SELECT last_name,first_name,count(*) 
      FROM catalog_list 
      GROUP BY last_name,first_name 
      HAVING COUNT(*)>1
    ) as t2
  ON t1.last_name = t2.last_name and t1.first_name = t2.first_name
ORDER BY t1.last_name, t1.first_name, t1.birth_date
This entry was posted in Programming, Software. Bookmark the permalink.