How to remove duplicate records

Reply

Join Date: Jun 2005
Posts: 92
Reputation: michael123 is an unknown quantity at this point 
Solved Threads: 0
michael123 michael123 is offline Offline
Junior Poster in Training

How to remove duplicate records

 
0
  #1
Jun 26th, 2008
I have a MSSQL table that contains lots of duplicate records, is there a simple way to keep only one and remove all other duplicates?
  1. mike M 60
  2. mike M 60
  3. mike M 60

I need only one record
  1. mike M 60
Reply With Quote Quick reply to this message  
Join Date: Apr 2008
Posts: 296
Reputation: tesuji is on a distinguished road 
Solved Threads: 42
tesuji tesuji is offline Offline
Posting Whiz in Training

Re: How to remove duplicate records

 
0
  #2
Jun 26th, 2008
Hello michael123

I am afraid, there is no direct way to get rid of duplicate rows.

But you can populate a new table with distinct rows from your given table only, for example:

  1. -- create your table
  2. CREATE TABLE dudel(name VARCHAR(50)NOT NULL, sex CHAR(1), age INTEGER);
  3.  
  4. -- inserte some rows into dudel
  5. INSERT INTO dudel (name, sex, age) VALUES ('Randy', 'M', 66);
  6. -- . . .
  7. COMMIT;
  8.  
  9. -- Show all rows of dudel
  10. SELECT name, sex, age FROM dudel;
  11. /*
  12. name,sex,age
  13. ----------------
  14. 'mike','M',60
  15. 'mike','M',60
  16. 'mike','M',60
  17. 'Randy','M',66
  18. 'Randy','M',66
  19. 'Randy','M',66
  20. 'Randy','M',66
  21. */
  22.  
  23. -- show only distinct rows
  24. SELECT name, sex, age FROM dudel GROUP BY name, sex, age;
  25. /*
  26. name,sex,age
  27. ------------
  28. 'mike','M',60
  29. 'Randy','M',66
  30. */
  31.  
  32. -- create a new table dudel2 (clone of dudel)
  33. CREATE TABLE dudel2(name VARCHAR(50)NOT NULL, sex CHAR(1), age INTEGER);
  34.  
  35. -- populate dudel2 from dudel leaving out all duplicates
  36. INSERT INTO dudel2 SELECT name, sex, age FROM dudel GROUP BY name, sex, age;
  37.  
  38. -- show the result
  39. SELECT name, sex, age FROM dudel2;
  40. /*
  41. name,sex,age
  42. 'mike','M',60
  43. 'Randy','M',66
  44. */
Now you should be able to replace old table by new one containing distinct rows only.

You have to consider foreign keys carefully!

To supply you with more specific solution I need further details on the table in question, especially primary and foreign keys, constraints etc.

krs,
tesu
Information is moving—you know, nightly news is one way, of course, but it's also moving through the blogosphere and through the Internets. I promise you I will listen to what has been said here, even though I wasn't here. Ann and I will carry out this equivocal message to the world. I'm the master of low expectations.
Reply With Quote Quick reply to this message  
Join Date: Feb 2008
Posts: 72
Reputation: cmhampton is an unknown quantity at this point 
Solved Threads: 10
cmhampton's Avatar
cmhampton cmhampton is offline Offline
Junior Poster in Training

Re: How to remove duplicate records

 
0
  #3
Jun 29th, 2008
What version of MSSQL are you using? If 2005, then I have a query you can modify that will remove duplicate entries.
Reply With Quote Quick reply to this message  
Join Date: Jun 2005
Posts: 92
Reputation: michael123 is an unknown quantity at this point 
Solved Threads: 0
michael123 michael123 is offline Offline
Junior Poster in Training

Re: How to remove duplicate records

 
0
  #4
Jun 30th, 2008
I use MSSQL 2000 version.
Thank you tesuji, here is the table I need to remove duplicate records.
Table "addonitems":
partid componentid quantity revision componentrevision note
1001 406 1 01 01 n/a
1001 406 1 01 02 n/a
1002 501 1 01 01 n/a
1002 501 1 01 01 n/a
1002 501 1 01 02 n/a
1002 501 1 01 02 message from mike
....
....

on your code line 36, do you mean:
INSERT INTO dudel2 SELECT DISTINCT name, sex, age FROM dudel GROUP BY name, sex, age;
did you miss word "distinct"?
Reply With Quote Quick reply to this message  
Join Date: Apr 2008
Posts: 296
Reputation: tesuji is on a distinguished road 
Solved Threads: 42
tesuji tesuji is offline Offline
Posting Whiz in Training

Re: How to remove duplicate records

 
0
  #5
Jun 30th, 2008
In

INSERT INTO dudel2 SELECT name, sex, age FROM dudel GROUP BY name, sex, age;

GROUP BY clause, which must contain all attributes (column names), provides for distinct rows too.

Instead of GROUP BY clause you can also use DISTINCT:

INSERT INTO dudel2 SELECT DISTINCT name, sex, age FROM dudel;

Both insert statements are equivalent but yours is simpler

We shall wait for cmhampton to see his solution for sql server 2005. Anyway, if you are working with server 2000, he should post his solution.

krs,
tesu
Information is moving—you know, nightly news is one way, of course, but it's also moving through the blogosphere and through the Internets. I promise you I will listen to what has been said here, even though I wasn't here. Ann and I will carry out this equivocal message to the world. I'm the master of low expectations.
Reply With Quote Quick reply to this message  
Join Date: Feb 2008
Posts: 72
Reputation: cmhampton is an unknown quantity at this point 
Solved Threads: 10
cmhampton's Avatar
cmhampton cmhampton is offline Offline
Junior Poster in Training

Re: How to remove duplicate records

 
0
  #6
Jun 30th, 2008
The script is on a different computer. I'll post it when I get home.
Reply With Quote Quick reply to this message  
Join Date: Feb 2008
Posts: 72
Reputation: cmhampton is an unknown quantity at this point 
Solved Threads: 10
cmhampton's Avatar
cmhampton cmhampton is offline Offline
Junior Poster in Training

Re: How to remove duplicate records

 
1
  #7
Jul 2nd, 2008
Sorry I didn't get this uploaded the other night, the kids have been sick.

Suppose you have a table with the following structure:

  1. id - int PrimaryKey
  2. name - varchar(50)
  3. description - varchar(MAX)

and the following values:

  1. 1 Joe Short FOR Joseph
  2. 2 Dave Short FOR David
  3. 3 Joe Short FOR Joseph
  4. 4 Joe Short FOR Joseph
  5. 5 Chris Short FOR Christian
  6. 6 Rob Short FOR Robert

Notice that "Joe - Short for Joseph" has three duplicate records. It is true that we can use SELECT DISTINCT to filter these, and for a simple table like this, that's probably the best option. However, sometimes SELECT DISTINCT gets a little hairy when dealing with joins, at least in my experience. So, without having the time to create a complex data structure, or using one I already have that contains confidential data, let's use this simple example.

MSSQL 2005 added a handy new function called ROW_NUMBER(). Learn it, love it (lol). Seriously though, it will make your life easier. What this function does is allow you to get the row number of a record in a returned data table. On the surface, this doesn't sound like much. But, it becomes extremely useful when you realize that you can partition, or group, the records. Let use this on the table shown above:

  1. WITH names AS
  2. (
  3. SELECT
  4. id,
  5. name,
  6. description,
  7. ROW_NUMBER() OVER(PARTITION BY name, description ORDER BY id) AS rowNum
  8. FROM
  9. table_1
  10. )
  11.  
  12. SELECT
  13. id,
  14. name,
  15. description,
  16. rowNum
  17. FROM
  18. names

We get these results:

  1. id name description rowNum
  2.  
  3. 5 Chris Short FOR Christian 1
  4. 2 Dave Short FOR David 1
  5. 1 Joe Short FOR Joseph 1
  6. 3 Joe Short FOR Joseph 2
  7. 4 Joe Short FOR Joseph 3
  8. 6 Rob Short FOR Robert 1

Notice that now we have rowNumbers for each record, and they are partitioned by the name and description fields. So, if we want to get the duplicate records for this table, we add a WHERE clause to the query:

  1. SELECT
  2. id,
  3. name,
  4. description
  5. FROM
  6. names
  7. WHERE
  8. rowNum > 1

which gives us:

  1. 3 Joe Short FOR Joseph 2
  2. 4 Joe Short FOR Joseph 3

Cool isn't it?

What this allows us to do, is dump this into a temporary table to use as in a subquery...

  1. DECLARE @tblNames TABLE(id INT);
  2.  
  3. WITH names AS
  4. (
  5. SELECT
  6. id,
  7. name,
  8. description,
  9. ROW_NUMBER() OVER(PARTITION BY name, description ORDER BY id) AS rowNum
  10. FROM
  11. table_1
  12. )
  13. INSERT
  14. @tblNames
  15. SELECT
  16. id
  17. FROM
  18. names
  19. WHERE
  20. rowNum > 1
  21.  
  22. DELETE
  23. FROM
  24. table_1
  25. WHERE
  26. id IN
  27. (
  28. SELECT
  29. id
  30. FROM
  31. @tblNames
  32. )

Now if we select the records from table_1, the duplicates have been eliminated. Here's the results:

  1. 1 Joe Short FOR Joseph
  2. 2 Dave Short FOR David
  3. 5 Chris Short FOR Christian
  4. 6 Rob Short FOR Robert

There's a way to do this without the temp table, but I can't find my original query, so I had to do it from memory.

Sorry to turn this into a novel, but I thought an explanation would be better than a code dump.
Reply With Quote Quick reply to this message  
Reply

This thread is more than three months old.
Perhaps start a new thread instead?
Message:




Views: 3985 | Replies: 6
Thread Tools Search this Thread



Tag cloud for MS SQL
About Us | Contact Us | Advertise | DaniWeb | Acceptable Use Policy | RSS Feed

©2003 - 2009 DaniWeb® LLC