DaniWeb IT Discussion Community

DaniWeb IT Discussion Community (http://www.daniweb.com/forums/index.php)
-   MS SQL (http://www.daniweb.com/forums/forum127.html)
-   -   Delete records from a table (http://www.daniweb.com/forums/thread133586.html)

ssprastogi Jul 10th, 2008 3:46 am
Delete records from a table
 
how to delete duplicate records from table.

aboyd Jul 10th, 2008 4:09 am
Re: Delete records from a table
 
DELETE FROM table1
USING table1, table1 AS vtable
WHERE (table1.ID > vtable.ID)
AND (table1.field_name=vtable.field_name)

You may need to have lots of ANDs at the end, if you want lots of fields to be the same. If you just want to delete records that have ONE field the same, then my example works already.

As an example, let's adapt the SQL above to remove records from a table called macintosh. The table has many columns, but we're going to call a record a duplicate if three (of the many) columns match. Here we go:

DELETE FROM macintosh
USING macintosh, macintosh AS m2
WHERE (macintosh.id > m2.id)
AND (macintosh.manufacturer=m2.manufacturer)
AND (macintosh.model=m2.model)
AND (macintosh.os=m2.os)

PLEASE only try this on a copy of your database table. Do not run this on production tables until you've certified that it works.

cmhampton Jul 10th, 2008 1:34 pm
Re: Delete records from a table
 
If you are using SQL 2005, here's another option. (originally posted in http://www.daniweb.com/forums/post639423-7.html)

Suppose you have a table with the following structure:

id - int PrimaryKey
name - varchar(50)
description - varchar(MAX)

and the following values:

1        Joe          Short for Joseph
2        Dave          Short for David
3        Joe          Short for Joseph
4        Joe          Short for Joseph
5        Chris          Short for Christian
6        Rob          Short for Robert

Notice that "Joe - Short for Joseph" has three duplicate records. It is true that we can use SELECT DISTINCT to filter these, and for a simple table like this, that's probably the best option. However, sometimes SELECT DISTINCT gets a little hairy when dealing with joins, at least in my experience. So, without having the time to create a complex data structure, or using one I already have that contains confidential data, let's use this simple example.

MSSQL 2005 added a handy new function called ROW_NUMBER(). Learn it, love it (lol). Seriously though, it will make your life easier. What this function does is allow you to get the row number of a record in a returned data table. On the surface, this doesn't sound like much. But, it becomes extremely useful when you realize that you can partition, or group, the records. Let use this on the table shown above:

WITH names AS 
(
SELECT
                id,
                name,
                description,
                ROW_NUMBER() OVER(PARTITION BY name, description ORDER BY id) AS rowNum
FROM
                table_1
)

SELECT
                id,
                name,
                description,
                rowNum
FROM
                names

We get these results:

id     name  description               rowNum

5        Chris        Short for Christian    1
2        Dave        Short for David                1
1        Joe          Short for Joseph        1
3        Joe          Short for Joseph        2
4        Joe          Short for Joseph        3
6        Rob        Short for Robert        1

Notice that now we have rowNumbers for each record, and they are partitioned by the name and description fields. So, if we want to get the duplicate records for this table, we add a WHERE clause to the query:

SELECT
                id,
                name,
                description
FROM
                names
WHERE
                rowNum > 1

which gives us:

3        Joe        Short for Joseph        2
4        Joe        Short for Joseph        3

Cool isn't it?

What this allows us to do, is dump this into a temporary table to use as in a subquery...

DECLARE @tblNames TABLE(id int);

WITH names AS
(
SELECT
                id,
                name,
                description,
                ROW_NUMBER() OVER(PARTITION BY name, description ORDER BY id) AS rowNum
FROM
                table_1
)
INSERT
                @tblNames
SELECT
                id
FROM
                names
WHERE
                rowNum > 1

DELETE
FROM
        table_1
WHERE
        id IN
        (
                SELECT
                                id
                FROM
                                @tblNames
        )

Now if we select the records from table_1, the duplicates have been eliminated. Here's the results:

1        Joe          Short for Joseph
2        Dave        Short for David
5        Chris        Short for Christian
6        Rob        Short for Robert

There's a way to do this without the temp table, but I can't find my original query, so I had to do it from memory.

Sorry to turn this into a novel, but I thought an explanation would be better than a code dump.


All times are GMT -4. The time now is 7:08 pm.

Forum system based on vBulletin Copyright ©2000 - 2009, Jelsoft Enterprises Ltd.
©2003 - 2009 DaniWeb® LLC