| | |
How to remove duplicate records
Please support our MS SQL advertiser: Intel Parallel Studio Home
![]() |
•
•
Join Date: Jun 2005
Posts: 92
Reputation:
Solved Threads: 0
I have a MSSQL table that contains lots of duplicate records, is there a simple way to keep only one and remove all other duplicates?
I need only one record
MS SQL Syntax (Toggle Plain Text)
mike M 60 mike M 60 mike M 60
I need only one record
MS SQL Syntax (Toggle Plain Text)
mike M 60
•
•
Join Date: Apr 2008
Posts: 296
Reputation:
Solved Threads: 42
Hello michael123
I am afraid, there is no direct way to get rid of duplicate rows.
But you can populate a new table with distinct rows from your given table only, for example:
Now you should be able to replace old table by new one containing distinct rows only.
You have to consider foreign keys carefully!
To supply you with more specific solution I need further details on the table in question, especially primary and foreign keys, constraints etc.
krs,
tesu
I am afraid, there is no direct way to get rid of duplicate rows.
But you can populate a new table with distinct rows from your given table only, for example:
sql Syntax (Toggle Plain Text)
-- create your table CREATE TABLE dudel(name VARCHAR(50)NOT NULL, sex CHAR(1), age INTEGER); -- inserte some rows into dudel INSERT INTO dudel (name, sex, age) VALUES ('Randy', 'M', 66); -- . . . COMMIT; -- Show all rows of dudel SELECT name, sex, age FROM dudel; /* name,sex,age ---------------- 'mike','M',60 'mike','M',60 'mike','M',60 'Randy','M',66 'Randy','M',66 'Randy','M',66 'Randy','M',66 */ -- show only distinct rows SELECT name, sex, age FROM dudel GROUP BY name, sex, age; /* name,sex,age ------------ 'mike','M',60 'Randy','M',66 */ -- create a new table dudel2 (clone of dudel) CREATE TABLE dudel2(name VARCHAR(50)NOT NULL, sex CHAR(1), age INTEGER); -- populate dudel2 from dudel leaving out all duplicates INSERT INTO dudel2 SELECT name, sex, age FROM dudel GROUP BY name, sex, age; -- show the result SELECT name, sex, age FROM dudel2; /* name,sex,age 'mike','M',60 'Randy','M',66 */
You have to consider foreign keys carefully!
To supply you with more specific solution I need further details on the table in question, especially primary and foreign keys, constraints etc.
krs,
tesu
Information is moving—you know, nightly news is one way, of course, but it's also moving through the blogosphere and through the Internets. I promise you I will listen to what has been said here, even though I wasn't here. Ann and I will carry out this equivocal message to the world. I'm the master of low expectations.
•
•
Join Date: Jun 2005
Posts: 92
Reputation:
Solved Threads: 0
I use MSSQL 2000 version.
Thank you tesuji, here is the table I need to remove duplicate records.
Table "addonitems":
partid componentid quantity revision componentrevision note
1001 406 1 01 01 n/a
1001 406 1 01 02 n/a
1002 501 1 01 01 n/a
1002 501 1 01 01 n/a
1002 501 1 01 02 n/a
1002 501 1 01 02 message from mike
....
....
on your code line 36, do you mean:
INSERT INTO dudel2 SELECT DISTINCT name, sex, age FROM dudel GROUP BY name, sex, age;
did you miss word "distinct"?
Thank you tesuji, here is the table I need to remove duplicate records.
Table "addonitems":
partid componentid quantity revision componentrevision note
1001 406 1 01 01 n/a
1001 406 1 01 02 n/a
1002 501 1 01 01 n/a
1002 501 1 01 01 n/a
1002 501 1 01 02 n/a
1002 501 1 01 02 message from mike
....
....
on your code line 36, do you mean:
INSERT INTO dudel2 SELECT DISTINCT name, sex, age FROM dudel GROUP BY name, sex, age;
did you miss word "distinct"?
•
•
Join Date: Apr 2008
Posts: 296
Reputation:
Solved Threads: 42
In
INSERT INTO dudel2 SELECT name, sex, age FROM dudel GROUP BY name, sex, age;
GROUP BY clause, which must contain all attributes (column names), provides for distinct rows too.
Instead of GROUP BY clause you can also use DISTINCT:
INSERT INTO dudel2 SELECT DISTINCT name, sex, age FROM dudel;
Both insert statements are equivalent but yours is simpler
We shall wait for cmhampton to see his solution for sql server 2005. Anyway, if you are working with server 2000, he should post his solution.
krs,
tesu
INSERT INTO dudel2 SELECT name, sex, age FROM dudel GROUP BY name, sex, age;
GROUP BY clause, which must contain all attributes (column names), provides for distinct rows too.
Instead of GROUP BY clause you can also use DISTINCT:
INSERT INTO dudel2 SELECT DISTINCT name, sex, age FROM dudel;
Both insert statements are equivalent but yours is simpler
We shall wait for cmhampton to see his solution for sql server 2005. Anyway, if you are working with server 2000, he should post his solution.
krs,
tesu
Information is moving—you know, nightly news is one way, of course, but it's also moving through the blogosphere and through the Internets. I promise you I will listen to what has been said here, even though I wasn't here. Ann and I will carry out this equivocal message to the world. I'm the master of low expectations.
Sorry I didn't get this uploaded the other night, the kids have been sick.
Suppose you have a table with the following structure:
and the following values:
Notice that "Joe - Short for Joseph" has three duplicate records. It is true that we can use SELECT DISTINCT to filter these, and for a simple table like this, that's probably the best option. However, sometimes SELECT DISTINCT gets a little hairy when dealing with joins, at least in my experience. So, without having the time to create a complex data structure, or using one I already have that contains confidential data, let's use this simple example.
MSSQL 2005 added a handy new function called ROW_NUMBER(). Learn it, love it (lol). Seriously though, it will make your life easier. What this function does is allow you to get the row number of a record in a returned data table. On the surface, this doesn't sound like much. But, it becomes extremely useful when you realize that you can partition, or group, the records. Let use this on the table shown above:
We get these results:
Notice that now we have rowNumbers for each record, and they are partitioned by the name and description fields. So, if we want to get the duplicate records for this table, we add a WHERE clause to the query:
which gives us:
Cool isn't it?
What this allows us to do, is dump this into a temporary table to use as in a subquery...
Now if we select the records from table_1, the duplicates have been eliminated. Here's the results:
There's a way to do this without the temp table, but I can't find my original query, so I had to do it from memory.
Sorry to turn this into a novel, but I thought an explanation would be better than a code dump.
Suppose you have a table with the following structure:
MS SQL Syntax (Toggle Plain Text)
id - int PrimaryKey name - varchar(50) description - varchar(MAX)
and the following values:
MS SQL Syntax (Toggle Plain Text)
1 Joe Short FOR Joseph 2 Dave Short FOR David 3 Joe Short FOR Joseph 4 Joe Short FOR Joseph 5 Chris Short FOR Christian 6 Rob Short FOR Robert
Notice that "Joe - Short for Joseph" has three duplicate records. It is true that we can use SELECT DISTINCT to filter these, and for a simple table like this, that's probably the best option. However, sometimes SELECT DISTINCT gets a little hairy when dealing with joins, at least in my experience. So, without having the time to create a complex data structure, or using one I already have that contains confidential data, let's use this simple example.
MSSQL 2005 added a handy new function called ROW_NUMBER(). Learn it, love it (lol). Seriously though, it will make your life easier. What this function does is allow you to get the row number of a record in a returned data table. On the surface, this doesn't sound like much. But, it becomes extremely useful when you realize that you can partition, or group, the records. Let use this on the table shown above:
SQL Syntax (Toggle Plain Text)
WITH names AS ( SELECT id, name, description, ROW_NUMBER() OVER(PARTITION BY name, description ORDER BY id) AS rowNum FROM table_1 ) SELECT id, name, description, rowNum FROM names
We get these results:
MS SQL Syntax (Toggle Plain Text)
id name description rowNum 5 Chris Short FOR Christian 1 2 Dave Short FOR David 1 1 Joe Short FOR Joseph 1 3 Joe Short FOR Joseph 2 4 Joe Short FOR Joseph 3 6 Rob Short FOR Robert 1
Notice that now we have rowNumbers for each record, and they are partitioned by the name and description fields. So, if we want to get the duplicate records for this table, we add a WHERE clause to the query:
MS SQL Syntax (Toggle Plain Text)
SELECT id, name, description FROM names WHERE rowNum > 1
which gives us:
MS SQL Syntax (Toggle Plain Text)
3 Joe Short FOR Joseph 2 4 Joe Short FOR Joseph 3
Cool isn't it?
What this allows us to do, is dump this into a temporary table to use as in a subquery...
SQL Syntax (Toggle Plain Text)
DECLARE @tblNames TABLE(id INT); WITH names AS ( SELECT id, name, description, ROW_NUMBER() OVER(PARTITION BY name, description ORDER BY id) AS rowNum FROM table_1 ) INSERT @tblNames SELECT id FROM names WHERE rowNum > 1 DELETE FROM table_1 WHERE id IN ( SELECT id FROM @tblNames )
Now if we select the records from table_1, the duplicates have been eliminated. Here's the results:
MS SQL Syntax (Toggle Plain Text)
1 Joe Short FOR Joseph 2 Dave Short FOR David 5 Chris Short FOR Christian 6 Rob Short FOR Robert
There's a way to do this without the temp table, but I can't find my original query, so I had to do it from memory.
Sorry to turn this into a novel, but I thought an explanation would be better than a code dump.
![]() |
Similar Threads
- how to delete duplicate record in a table by using SQL query (MS SQL)
- csv file duplicate removal (Perl)
- Event Gateway Assistance (ColdFusion)
- simultaneous MS Access Database access problem using VB6 program. (Visual Basic 4 / 5 / 6)
- Site in Google's Supplemental index (Search Engine Optimization)
- help to debug my (C++)
- help to debug my code (C++)
- Php newsletter error (PHP)
Other Threads in the MS SQL Forum
- Previous Thread: Max value in a decimal field
- Next Thread: Reverse lookup
| Thread Tools | Search this Thread |
Tag cloud for MS SQL
"last autogrowth business connectingtodatabaseinuse count cursor data database dateadd datepart day" dbsize deadlock delete_trigger exploit getdate hack highperformancecomputing hpc hpcserver2008 ibm iis limit loop maximum microsoft ms mssql multiple multithreading news number password permission position query reporting result security server services sets single source sql sql-injection sqlserver sqlserver2005 subtype supercomputing supertype tables uniqueid update view weekday





