943,682 Members | Top Members by Rank

Ad:
  • MS SQL Discussion Thread
  • Unsolved
  • Views: 6756
  • MS SQL RSS
Jun 26th, 2008
0

How to remove duplicate records

Expand Post »
I have a MSSQL table that contains lots of duplicate records, is there a simple way to keep only one and remove all other duplicates?
MS SQL Syntax (Toggle Plain Text)
  1. mike M 60
  2. mike M 60
  3. mike M 60

I need only one record
MS SQL Syntax (Toggle Plain Text)
  1. mike M 60
Similar Threads
Reputation Points: 10
Solved Threads: 0
Junior Poster in Training
michael123 is offline Offline
93 posts
since Jun 2005
Jun 26th, 2008
0

Re: How to remove duplicate records

Hello michael123

I am afraid, there is no direct way to get rid of duplicate rows.

But you can populate a new table with distinct rows from your given table only, for example:

sql Syntax (Toggle Plain Text)
  1. -- create your table
  2. CREATE TABLE dudel(name VARCHAR(50)NOT NULL, sex CHAR(1), age INTEGER);
  3.  
  4. -- inserte some rows into dudel
  5. INSERT INTO dudel (name, sex, age) VALUES ('Randy', 'M', 66);
  6. -- . . .
  7. COMMIT;
  8.  
  9. -- Show all rows of dudel
  10. SELECT name, sex, age FROM dudel;
  11. /*
  12. name,sex,age
  13. ----------------
  14. 'mike','M',60
  15. 'mike','M',60
  16. 'mike','M',60
  17. 'Randy','M',66
  18. 'Randy','M',66
  19. 'Randy','M',66
  20. 'Randy','M',66
  21. */
  22.  
  23. -- show only distinct rows
  24. SELECT name, sex, age FROM dudel GROUP BY name, sex, age;
  25. /*
  26. name,sex,age
  27. ------------
  28. 'mike','M',60
  29. 'Randy','M',66
  30. */
  31.  
  32. -- create a new table dudel2 (clone of dudel)
  33. CREATE TABLE dudel2(name VARCHAR(50)NOT NULL, sex CHAR(1), age INTEGER);
  34.  
  35. -- populate dudel2 from dudel leaving out all duplicates
  36. INSERT INTO dudel2 SELECT name, sex, age FROM dudel GROUP BY name, sex, age;
  37.  
  38. -- show the result
  39. SELECT name, sex, age FROM dudel2;
  40. /*
  41. name,sex,age
  42. 'mike','M',60
  43. 'Randy','M',66
  44. */
Now you should be able to replace old table by new one containing distinct rows only.

You have to consider foreign keys carefully!

To supply you with more specific solution I need further details on the table in question, especially primary and foreign keys, constraints etc.

krs,
tesu
Reputation Points: 158
Solved Threads: 98
Master Poster
tesuji is offline Offline
720 posts
since Apr 2008
Jun 29th, 2008
0

Re: How to remove duplicate records

What version of MSSQL are you using? If 2005, then I have a query you can modify that will remove duplicate entries.
Reputation Points: 23
Solved Threads: 10
Junior Poster in Training
cmhampton is offline Offline
79 posts
since Feb 2008
Jun 30th, 2008
0

Re: How to remove duplicate records

I use MSSQL 2000 version.
Thank you tesuji, here is the table I need to remove duplicate records.
Table "addonitems":
partid componentid quantity revision componentrevision note
1001 406 1 01 01 n/a
1001 406 1 01 02 n/a
1002 501 1 01 01 n/a
1002 501 1 01 01 n/a
1002 501 1 01 02 n/a
1002 501 1 01 02 message from mike
....
....

on your code line 36, do you mean:
INSERT INTO dudel2 SELECT DISTINCT name, sex, age FROM dudel GROUP BY name, sex, age;
did you miss word "distinct"?
Reputation Points: 10
Solved Threads: 0
Junior Poster in Training
michael123 is offline Offline
93 posts
since Jun 2005
Jun 30th, 2008
0

Re: How to remove duplicate records

In

INSERT INTO dudel2 SELECT name, sex, age FROM dudel GROUP BY name, sex, age;

GROUP BY clause, which must contain all attributes (column names), provides for distinct rows too.

Instead of GROUP BY clause you can also use DISTINCT:

INSERT INTO dudel2 SELECT DISTINCT name, sex, age FROM dudel;

Both insert statements are equivalent but yours is simpler

We shall wait for cmhampton to see his solution for sql server 2005. Anyway, if you are working with server 2000, he should post his solution.

krs,
tesu
Reputation Points: 158
Solved Threads: 98
Master Poster
tesuji is offline Offline
720 posts
since Apr 2008
Jun 30th, 2008
0

Re: How to remove duplicate records

The script is on a different computer. I'll post it when I get home.
Reputation Points: 23
Solved Threads: 10
Junior Poster in Training
cmhampton is offline Offline
79 posts
since Feb 2008
Jul 2nd, 2008
1

Re: How to remove duplicate records

Sorry I didn't get this uploaded the other night, the kids have been sick.

Suppose you have a table with the following structure:

MS SQL Syntax (Toggle Plain Text)
  1. id - int PrimaryKey
  2. name - varchar(50)
  3. description - varchar(MAX)

and the following values:

MS SQL Syntax (Toggle Plain Text)
  1. 1 Joe Short FOR Joseph
  2. 2 Dave Short FOR David
  3. 3 Joe Short FOR Joseph
  4. 4 Joe Short FOR Joseph
  5. 5 Chris Short FOR Christian
  6. 6 Rob Short FOR Robert

Notice that "Joe - Short for Joseph" has three duplicate records. It is true that we can use SELECT DISTINCT to filter these, and for a simple table like this, that's probably the best option. However, sometimes SELECT DISTINCT gets a little hairy when dealing with joins, at least in my experience. So, without having the time to create a complex data structure, or using one I already have that contains confidential data, let's use this simple example.

MSSQL 2005 added a handy new function called ROW_NUMBER(). Learn it, love it (lol). Seriously though, it will make your life easier. What this function does is allow you to get the row number of a record in a returned data table. On the surface, this doesn't sound like much. But, it becomes extremely useful when you realize that you can partition, or group, the records. Let use this on the table shown above:

SQL Syntax (Toggle Plain Text)
  1. WITH names AS
  2. (
  3. SELECT
  4. id,
  5. name,
  6. description,
  7. ROW_NUMBER() OVER(PARTITION BY name, description ORDER BY id) AS rowNum
  8. FROM
  9. table_1
  10. )
  11.  
  12. SELECT
  13. id,
  14. name,
  15. description,
  16. rowNum
  17. FROM
  18. names

We get these results:

MS SQL Syntax (Toggle Plain Text)
  1. id name description rowNum
  2.  
  3. 5 Chris Short FOR Christian 1
  4. 2 Dave Short FOR David 1
  5. 1 Joe Short FOR Joseph 1
  6. 3 Joe Short FOR Joseph 2
  7. 4 Joe Short FOR Joseph 3
  8. 6 Rob Short FOR Robert 1

Notice that now we have rowNumbers for each record, and they are partitioned by the name and description fields. So, if we want to get the duplicate records for this table, we add a WHERE clause to the query:

MS SQL Syntax (Toggle Plain Text)
  1. SELECT
  2. id,
  3. name,
  4. description
  5. FROM
  6. names
  7. WHERE
  8. rowNum > 1

which gives us:

MS SQL Syntax (Toggle Plain Text)
  1. 3 Joe Short FOR Joseph 2
  2. 4 Joe Short FOR Joseph 3

Cool isn't it?

What this allows us to do, is dump this into a temporary table to use as in a subquery...

SQL Syntax (Toggle Plain Text)
  1. DECLARE @tblNames TABLE(id INT);
  2.  
  3. WITH names AS
  4. (
  5. SELECT
  6. id,
  7. name,
  8. description,
  9. ROW_NUMBER() OVER(PARTITION BY name, description ORDER BY id) AS rowNum
  10. FROM
  11. table_1
  12. )
  13. INSERT
  14. @tblNames
  15. SELECT
  16. id
  17. FROM
  18. names
  19. WHERE
  20. rowNum > 1
  21.  
  22. DELETE
  23. FROM
  24. table_1
  25. WHERE
  26. id IN
  27. (
  28. SELECT
  29. id
  30. FROM
  31. @tblNames
  32. )

Now if we select the records from table_1, the duplicates have been eliminated. Here's the results:

MS SQL Syntax (Toggle Plain Text)
  1. 1 Joe Short FOR Joseph
  2. 2 Dave Short FOR David
  3. 5 Chris Short FOR Christian
  4. 6 Rob Short FOR Robert

There's a way to do this without the temp table, but I can't find my original query, so I had to do it from memory.

Sorry to turn this into a novel, but I thought an explanation would be better than a code dump.
Reputation Points: 23
Solved Threads: 10
Junior Poster in Training
cmhampton is offline Offline
79 posts
since Feb 2008

This thread is more than three months old

No one has posted to this discussion for at least three months. Please let old threads die and do not reply to them unless you feel you have something new and valuable to contribute that absolutely must be added to make the discussion complete. Otherwise, please start a new thread in this forum instead.
Message:
Previous Thread in MS SQL Forum Timeline: Max value in a decimal field
Next Thread in MS SQL Forum Timeline: Reverse lookup





About Us | Contact Us | Advertise | Acceptable Use Policy
Forum Index | Build Custom RSS Feed


Follow us on Twitter


© 2011 DaniWeb® LLC