1,105,456 Community Members

How to remove duplicate records

Member Avatar
michael123
Junior Poster in Training
94 posts since Jun 2005
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

I have a MSSQL table that contains lots of duplicate records, is there a simple way to keep only one and remove all other duplicates?

mike M 60 
mike M 60
mike M 60

I need only one record

mike M 60
Member Avatar
tesuji
Master Poster
721 posts since Apr 2008
Reputation Points: 135 [?]
Q&As Helped to Solve: 98 [?]
Skill Endorsements: 0 [?]
 
0
 

Hello michael123

I am afraid, there is no direct way to get rid of duplicate rows.

But you can populate a new table with distinct rows from your given table only, for example:

-- create your table
create table dudel(name varchar(50)not null, sex char(1), age integer);

-- inserte some rows into dudel
insert into dudel (name, sex, age) values ('Randy', 'M', 66);
-- . . .
commit;

-- Show all rows of dudel
select name, sex, age from dudel;
/*
name,sex,age
----------------
'mike','M',60
'mike','M',60
'mike','M',60
'Randy','M',66
'Randy','M',66
'Randy','M',66
'Randy','M',66
*/

-- show only distinct rows
select name, sex, age from dudel group by name, sex, age;
/*
name,sex,age
------------
'mike','M',60
'Randy','M',66
*/

-- create a new table dudel2 (clone of dudel)
create table dudel2(name varchar(50)not null, sex char(1), age integer);

-- populate dudel2 from dudel leaving out all duplicates
insert into dudel2 select name, sex, age  from dudel group by name, sex, age;

-- show the result
select name, sex, age from dudel2; 
/*
name,sex,age
'mike','M',60
'Randy','M',66
*/

Now you should be able to replace old table by new one containing distinct rows only.

You have to consider foreign keys carefully!

To supply you with more specific solution I need further details on the table in question, especially primary and foreign keys, constraints etc.

krs,
tesu

Member Avatar
cmhampton
Junior Poster in Training
83 posts since Feb 2008
Reputation Points: 8 [?]
Q&As Helped to Solve: 10 [?]
Skill Endorsements: 0 [?]
 
0
 

What version of MSSQL are you using? If 2005, then I have a query you can modify that will remove duplicate entries.

Member Avatar
michael123
Junior Poster in Training
94 posts since Jun 2005
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

I use MSSQL 2000 version.
Thank you tesuji, here is the table I need to remove duplicate records.
Table "addonitems":
partid componentid quantity revision componentrevision note
1001 406 1 01 01 n/a
1001 406 1 01 02 n/a
1002 501 1 01 01 n/a
1002 501 1 01 01 n/a
1002 501 1 01 02 n/a
1002 501 1 01 02 message from mike
....
....

on your code line 36, do you mean:
INSERT INTO dudel2 SELECT DISTINCT name, sex, age FROM dudel GROUP BY name, sex, age;
did you miss word "distinct"?

Member Avatar
tesuji
Master Poster
721 posts since Apr 2008
Reputation Points: 135 [?]
Q&As Helped to Solve: 98 [?]
Skill Endorsements: 0 [?]
 
0
 

In

INSERT INTO dudel2 SELECT name, sex, age FROM dudel GROUP BY name, sex, age;

GROUP BY clause, which must contain all attributes (column names), provides for distinct rows too.

Instead of GROUP BY clause you can also use DISTINCT:

INSERT INTO dudel2 SELECT DISTINCT name, sex, age FROM dudel;

Both insert statements are equivalent but yours is simpler :)

We shall wait for cmhampton to see his solution for sql server 2005. Anyway, if you are working with server 2000, he should post his solution.

krs,
tesu

Member Avatar
cmhampton
Junior Poster in Training
83 posts since Feb 2008
Reputation Points: 8 [?]
Q&As Helped to Solve: 10 [?]
Skill Endorsements: 0 [?]
 
0
 

The script is on a different computer. I'll post it when I get home.

Member Avatar
cmhampton
Junior Poster in Training
83 posts since Feb 2008
Reputation Points: 8 [?]
Q&As Helped to Solve: 10 [?]
Skill Endorsements: 0 [?]
 
1
 

Sorry I didn't get this uploaded the other night, the kids have been sick.

Suppose you have a table with the following structure:

id - int PrimaryKey
name - varchar(50)
description - varchar(MAX)

and the following values:

1	Joe	  Short for Joseph
2	Dave	  Short for David
3	Joe	  Short for Joseph
4	Joe	  Short for Joseph
5	Chris	  Short for Christian
6	Rob	  Short for Robert

Notice that "Joe - Short for Joseph" has three duplicate records. It is true that we can use SELECT DISTINCT to filter these, and for a simple table like this, that's probably the best option. However, sometimes SELECT DISTINCT gets a little hairy when dealing with joins, at least in my experience. So, without having the time to create a complex data structure, or using one I already have that contains confidential data, let's use this simple example.

MSSQL 2005 added a handy new function called ROW_NUMBER(). Learn it, love it (lol). Seriously though, it will make your life easier. What this function does is allow you to get the row number of a record in a returned data table. On the surface, this doesn't sound like much. But, it becomes extremely useful when you realize that you can partition, or group, the records. Let use this on the table shown above:

WITH names AS 
(
SELECT
		id,
		name,
		description,
		ROW_NUMBER() OVER(PARTITION BY name, description ORDER BY id) AS rowNum
FROM
		table_1
)

SELECT
		id,
		name,
		description,
                rowNum
FROM
		names

We get these results:

id     name  description               rowNum

5	Chris	  Short for Christian	1
2	Dave	  Short for David	1
1	Joe	  Short for Joseph	1
3	Joe	  Short for Joseph	2
4	Joe	  Short for Joseph	3
6	Rob	  Short for Robert	1

Notice that now we have rowNumbers for each record, and they are partitioned by the name and description fields. So, if we want to get the duplicate records for this table, we add a WHERE clause to the query:

SELECT
		id,
		name,
		description
FROM
		names
WHERE
		rowNum > 1

which gives us:

3	Joe	Short for Joseph	2
4	Joe	Short for Joseph	3

Cool isn't it?

What this allows us to do, is dump this into a temporary table to use as in a subquery...

DECLARE @tblNames TABLE(id int);

WITH names AS 
(
SELECT
		id,
		name,
		description,
		ROW_NUMBER() OVER(PARTITION BY name, description ORDER BY id) AS rowNum
FROM
		table_1
)
INSERT
		@tblNames
SELECT
		id
FROM
		names
WHERE
		rowNum > 1

DELETE
FROM
	table_1
WHERE
	id IN
	(
		SELECT
				id
		FROM
				@tblNames
	)

Now if we select the records from table_1, the duplicates have been eliminated. Here's the results:

1	Joe	Short for Joseph
2	Dave	Short for David
5	Chris	Short for Christian
6	Rob	Short for Robert

There's a way to do this without the temp table, but I can't find my original query, so I had to do it from memory.

Sorry to turn this into a novel, but I thought an explanation would be better than a code dump.

You
This article has been dead for over three months: Start a new discussion instead
Post:
Start New Discussion
Tags Related to this Article