DB2 SQL DELETE DUPLICATE DATA FROM THE TABLE.
There are multiple ways to Delete duplicate data from A Table using SQL Query. Here we are going to talk about using Correlated subqueries to Delete duplicate data.
Let say we have an EMPLOYEE table as follows.
In the above EMPLOYEE table, multiple records are duplicates. To remove all duplicate records, we can write a very simple correlated subquery.
Delete from EMPLOYEE E1
where RRN(E1) > (Select MIN(RRN(E2)) from EMPLOYEE E2 where E2.id = E1.id)
To delete duplicate records, we need to find a unique value for each record of the table, so we are using SQL RRN() function to get the relative record numbers for each record.
To complete the execution of subquery “Select MIN(RRN(E2)) from EMPLOYEE E2 where E2.id = E1.id” SQL needs the value of “E1.id” from the outer query. So SQL will read the first record (RRN = 1) for the outer query and get E1.id = 1 and pass this value to the subquery. Now the subquery will be executed as
Select MIN(RRN(E2)) from EMPLOYEE E2 where E2.id = 1
If RRN found in the outer query is bigger than RRN found in the subquery, then SQL will delete that record from the EMPLOYEE table. If RRN found in the outer query is smaller or equal to RRN found in the subquery, then the system will skip to the next record of the outer query. This process will go on for every record of the outer query. At last, only one record for each id with minimum RRN will be remaining in the table.
So Final output will be