0

Hi all,

I'm looking into the management of duplicate records in MySQL, but this case is a little bit more specific. A record is only duplicated if column A and column B have the same value. So a record is only a duplicate if two specific columns have the same values in more than one record.

Table (excerpt):

| id | userid | courseid | 
|  1 |     23 |        3 |
|  2 |     23 |        3 |
|  3 |     28 |        5 |
|  4 |     28 |        5 |
|  5 |     23 |        5 |
|  6 |     23 |        3 |
|  7 |     29 |        3 |

Duplicate rows above that would be deleted by the query = id's 2, 4 and 6 (as I would like to keep one of the duplicate records, in this case, the record with the lowest id value.

Figuring out a solution:

After playing around and doing a little bit of research, this seemed to work a treat:

ALTER IGNORE TABLE table ADD UNIQUE INDEX tmpindex (userid,courseid);
ALTER TABLE table DROP INDEX tmpindex;

My only concern is that this probably isn't the fastest or safest method of achieving the same result, this query could be executed dozens of times an hour or thousands of times by a rouge user (I'll add a control/restriction eventually). Does anyone have any feedback or suggestions? I would truly appreciate a bit more wisdom.

Another thought, would it be more sensible to check for an existing record before inserting duplicates in the first place? There could be 100's of inserts from one submission so a foreach to check if a record exists before inserting it just wouldn't be sensible imo - but I'm no DBA guru...

Thanks in advance :)

Michael

Edited by mmcdonald

2
Contributors
4
Replies
24
Views
3 Years
Discussion Span
Last Post by mmcdonald
2

I'd suggest that you make a compound PK out of the two columns and not have an 'id' field. But as you're already on this path,

DELETE t1 FROM tablename t1, tablename t2 WHERE t1.id > t2.id AND t1.userid = t2.userid AND t1.courseid = t2.courseid

Not tested!

Before testing - ensure that you create a duplicate table and do your testing on that!

Edited by diafol

0

I'll be honest and say this is the first time I've head of a compound key... I'm open to suggestions! (The id column is easily deleted).

DELETE t1 FROM tablename t1, tablename t2 WHERE t1.id > t2.id AND t1.userid = t2.userid AND t1.courseid = t2.courseid

Is it fair to say that the use of t1 and t2 here has totally confused me :/

Sorry for derping Diafol and thanks :)

0

Ok well the delete query I offered will only work with the id field - as I thought that this is what you had. I suggested the compound key as this is the easier alternative to stopping duplicate entries in the first place.

If you try to add a duplicate entry it will fail as primary keys are unique (as you know).

t1 and t2 are just aliases since the query relies upon the fact that we have to place the same tablename in twice. Without the aliases the sql would be ambiguous to say the least, as you can't compare something to its 'direct self'.

This question has already been answered. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.