cant Prevent Duplicate id values from entering my tables

Question

berserk 60 Junior Poster

8 Years Ago

SO Basically i have this problem with this test build of a little project im rewriting and so far its not looking to good

I have multiple tables that are all the same, they have been set up this way for simplicity as far as i can tell.

Tables

+----+----------+-----------------+---------------+-------------------+
| id | sec_id   | customer_name   | customer_date | etc...            |
+----+----------+-----------------+---------------+-------------------+
| 0  | 14328778 | Somebody Once   | 2/1/2016      | Role Me I         |
+----+----------+-----------------+---------------+-------------------+
| 1  | 14428755 | Told Me The     | 2/6/2015      | Aint The Sharpest |
+----+----------+-----------------+---------------+-------------------+
| 2  | 15548447 | World Was Gonna | 2/12/2014     | Tool in the shed  |
+----+----------+-----------------+---------------+-------------------+

As you can see from my table above i have the regular id and then this second id
the problem is in the second id number because thats where my duplicates are.

I have about 25 tables like the one above and they are all the same but for different departments and have all different data but are all basically used in the same way

I am in the process of attempting reporting/inserting data into these tables without creating any duplicates and so far i have failed miserably at this
because i have some reported 55 duplicates over the past week out of nearly 1000 entries and while this might not seem like a big deal its continueing to be a huge headache for me because i cant get the data from the tables if one of the sec_id's are similar to another tables sec_id, here is the code i have been using to prevent duplicates so far and i thought it was working but apparently it isnt

do {
        $random_number = rand(10000,10000000);
        $sql = "(SELECT sec_id FROM depart_table0 WHERE sec_id='".$random_number."')";
        for ($i=0; $i < $table_count[0]-7; $i++){
            $sql .= "UNION (SELECT sec_id FROM depart_table".$i." WHERE sec_id='".$random_number."')";
        }
        $sql_response = mysqli_query($con, $sql)or die(mysqli_error($con));
    } while (mysqli_num_rows($sql_response) == 1);

Please tell me what im doing wrong because i tested this code before (or at least i thought i thouroghly tested it) and it seemed to work before

Any help or guidance you give is helping a fresh out of college developer learn his way, please and thank you!

mysql php

Edited 8 Years Ago by Reverend Jim because: Moved to programming forum

2 Contributors
11 Replies
729 Views
6 Days Discussion Span
Latest Post 8 Years Ago Latest Post by Traevel

Traevel 216 Light Poster

8 Years Ago

I think the code of how you are creating the sec_id's would be more helpful because this seems to be a checking method only. One that keeps running until there is no longer a response on a random id (which might as well be the first number generated). Also, unless those sec_id's you posted are fake, they're all above 14.000.000 and your random test only goes up to 10.000.000.

FYI it's past 6 a.m. here so I won't be able to look at it until tomorrow if you do post the generating code (others will though!).

Traevel 216 Light Poster

8 Years Ago

would the greater than or equal to 1 solve my problem?

It would if you are really checking against all tables. In your loop I see $i < $table_count[0]-7;. I'm assuming (zing!) you came to that by trial and error. However, the underlying problem remains that there are already duplicates in your tables, causing multiple records to be returned.

On a side note, have a look at creating Views. That way you can reduce that looped query over multiple tables to just a single query on the view.

In order to ensure uniqueness over multiple tables, one of the two UUID functions is the way to go.

An increment could also ensure uniqueness over multiple tables if you store its value outside of those tables, in the most simple form it could be in its own table. That way you can start at 10.000 and keep adding 1. However, in an ideal world you would not let the front end have any influence on this. For instance, what if you're running your code above, the id comes back as unique so you do your thing but before you can actually insert the record someone else has just submitted their record, making your id a duplicate.

What you could do is look into Triggers. In your code you would leave sec_id blank, as you would with an auto_increment id, the triggers in the database will fire on insertion and put a unique id in the sec_id field.

By the way, is there a particular reason there are multiple tables with exactly the same columns? Considering your id's only go up to 10 million... if speed is an issue with so few records there might be a whole different problem as to why it can't all be in one table. Solving that instead and merging all the tables would automatically solve your unique id over multiple tables issue.

Edit: I read again and noticed you mentioned it being for simplicity and different departments. In my opinion it would be simpler to merge all the tables, adding a column for department. In order to keep info separate between departments you could create views for each department, and even lock other views if security is an issue.

Edited 8 Years Ago by Traevel because: hindsight

Traevel 216 Light Poster

8 Years Ago

but before you RAGE!!!!!

Why would I rage?

its still letting dups through

It will always happen until you address that issue in the database itself.

I'm not a MySQL expert, but in the reference manual I found this bit:

The default behavior for UNION is that duplicate rows are removed from the result.

Since you only select the sec_id I'm guessing there would never be more than one result, unless you use UNION ALL.

This query building is hiding the problem that you are having. You didn't post the actual INSERT statements, so it could also be going wrong there. This code will/can create thousands and because of the nested structure even millions of calls to your database as time goes on for every "unique" id it needs to generate (again, with multiple GUI's you will still get duplicates no matter what you do in the PHP code).

I would create a view of all the separate tables so that you have everything in a single view. Then ensure there are no duplicates there. At least then you can run a select query on that single view instead of all these unions, and maybe pinpoint the problem.

because my "so called" boss

Be aware that this is a public forum and even though your username might be untraceable, the code you posted is fairly specific and will lead back to here, and thus you. For instance, a a google on $random_number = rand(10000,10000000); already gives this page as the top result.

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

berserk 60 Junior Poster · Answer 1 · 2017-02-22T05:22:07+00:00

thank for the speedy reply! :D

and yeah as for the code generating sec_id's, that code is in the do while loop i posted, and yes the test sec_id's are fake i just through some random numbers in there to show an example sorry bout that lol

My next question though is, isnt the random number being recreated when the do runs again if the while comes back true?

my thoughts on this were that the random number is being generated and then tested against the system to check if its already in there, then if it is just run it again and generate another random number and check again. Is this incorrect?

Traevel 216 Light Poster · Answer 2 · 2017-02-22T13:57:35+00:00

Well, the queries I see in the do-while loop are all SELECT queries. So that's not the bit of code that INSERTS the sec_id into the database.

From what I can tell this is what's happening:

// create a random number between 10.000 and 10.000.000
// select all the records where sec_id matches this random number
// if there is exactly one record that matches, repeat the process
//
// implied > if the random number does not exist as a sec_id OR there is more than one match the whole process stops

After seeing this I realize that this is the bit of code you use to generate a random number that will be used as a sec_id later on in your code. The first adjustment I would make would be on the database side, by adding a Unique Index on the sec_id field. If you try to enter a duplicate the insertion will fail.

A crude way of solving it would then be to keep generating random id's until the insertion succeeds.

Is there a particular reason why the numbers can't be sequential? Does it have to be a certain number of digits?

MySQL has a built in function for generating unique integer ID's in the form of UUID_SHORT().

You invoke UUID_SHORT() on average fewer than 16 million times per second between mysqld restarts

This bit about ensuring uniqueness made me chuckle.

berserk 60 Junior Poster · Answer 3 · 2017-02-22T14:29:39+00:00

i wanted to go with the unique id idea before but im not sure how to add that and still have it work across all tables. Can you have two unique id columns in one table? Also i really like the idea of find the largest value and then just incrementing from there, im not sure why it wasnt setup like that to begin with but again this is just a rewrite of what they had before.

as fo the number of digits i guess it just has to be within that random number range, they havent given me to much info on that so im doing a lot of assuming lol.

The UUID_SHORT() how exactly does that work? Does is garuntee that the chances of making an id that are the same is near impossible or does it look into what i have and generate one based on non existing values?

Traevel 216 Light Poster · Answer 4 · 2017-02-22T15:29:30+00:00

so im doing a lot of assuming

Be very careful with that, it will come back to bite you in the end.

From the page I linked you:

The value of UUID_SHORT() is guaranteed to be unique if the following conditions hold:

The server_id value of the current server is between 0 and 255 and is unique among your set of master and slave servers

You do not set back the system time for your server host between mysqld restarts

You invoke UUID_SHORT() on average fewer than 16 million times per second between mysqld restarts

What it does is the following:

(server_id & 255) << 5 
+ (server_startup_time_in_seconds << 24)
+ incremented_variable++;

I'm not a database expert, but I'd say this will generate a unique number regardless of tables. The resulting number will be longer than 8 digits though.

If I were you I would contact my employer/customer and ask them if they have any particular restrictions on that sec_id. Things like:

should it be readable by a human (if someone has to type it in you'll probably want something else)
should it be unpredictable (even UUID() is predictable)
is it limited to a certain amount and type of characters/digits
how many sec_id's can there be expected to be, also in the future (thousands? millions? billions?)

Then I would change the id's accordingly, based on the information you get.

No one will expect you to guess or assume. If they say it's up to you, so be it. Just don't assume something, create a unique 128-bit String id and then find out one day before release that a receptionist is going to have to type in 200 of those id's during his/her workday. Even if they say "it's up to you" be sure to inform them about these things anyway, they might not know it will be an issue they care about until you tell them.

berserk 60 Junior Poster · Answer 5 · 2017-02-22T16:10:23+00:00

thank you so much for all your time! :D this has been extremely informative and i must say i have learned quite a bit and now feel a little more confident in how to think through this. I have talked to the higher ups and they explained to me that they want it as similar as it can be, so changing the size of the numbers is no good, but what i feel i can do is change the logic behind my orginal code to accomidate the multiples, please tell me if im on the right track with this.

do {
        $random_number = rand(10000,10000000);
        $sql = "(SELECT sec_id FROM depart_table0 WHERE sec_id='".$random_number."')";
        for ($i=0; $i < $table_count[0]-7; $i++){
            $sql .= "UNION (SELECT sec_id FROM depart_table".$i." WHERE sec_id='".$random_number."')";
        }
        $sql_response = mysqli_query($con, $sql)or die(mysqli_error($con));
    } while (mysqli_num_rows($sql_response) >= 1);

i know its a small change but would the greater than or equal to 1 solve my problem? I have read over and i believe my dups are coming not from this but from the fact that i have potentially more than one match whereas i was only checking for a single match there might be 2 or even 3 and that would cause it to resolve early.

I love how you stepped through my logic above because i feel that was what helped open my eyes a little better to what i initally wanted in the beginning. :)

berserk 60 Junior Poster · Answer 6 · 2017-02-22T17:08:51+00:00

lol oh believe me i whole heartedly agree, i would have done things very different if i had started a project like this from the beginning but the biggest roadblock im running into is basically the people im attempting to rewrite this for already have a sql database in place with the above mentioned structure of have all the departments seperated into their own unique table which has been a royal pain to code around. I have mentioned compiling all the data into one table and working from there but the feel i should write my code around what they already have... its kinda one of those scenarios where some older guys are telling me "its good enough, dont go changing anything that doesnt need to be changed" so i get mixed results on updates with them but basically they want me to write a script to modify the duplicates and make them unique, then apply this fix i have worked out with you travel.

I thank you so very much for your time and you insight into better practices, I feel better prepared for when i tackle a project where i decide what goes instead of the people im doing it for.

HAPPY CODING! :D

Traevel 216 Light Poster · Answer 7 · 2017-02-22T18:58:36+00:00

then apply this fix i have worked out with you

It's not really a fix though. When multiple people start adding records you're still at risk of getting duplicates. You won't be able to prevent that anywhere else but in the database itself. Whether it's with locking, triggers, restrictions etc. If you think that making the number random will decrease the chance of another GUI getting the same random number at the same time remember that over time there will be fewer and fewer possibilities left for those random numbers (after all, you need one that doesn't exist yet).

"its good enough, dont go changing anything that doesnt need to be changed"

I don't know what your position is in this case, internship/freelance/contract etc., but I'd argue that this needs to be changed. From what you've told me, and the arguments they've given you this is not a valid reason for not fixing it where it's supposed to be fixed. In the database. With the dirty shortcut being that you can add an always unique UUID which will be unique unless those 16 million calls per second start happening. But that means a 64-bit integer as key.

If this were a leaky ship you've only made the hole a bit smaller. And I think that's something you should always be able to discuss with your employer/customer.

berserk 60 Junior Poster · Answer 8 · 2017-02-28T05:38:08+00:00

OK! So i have had what some might consider to be a rough week and this is where i am with this monstrosity thats been literally dumped in my lap.

I have modified the code using the fix above and still i am getting the duplicates but before you RAGE!!!!! let me give more context...

The reason i have the subtract tables is because the genious who designed the sql database seperated all the departments into their own table and then some how managed to generate some random ass number that didnt get duplicated once, then when i come around and rewrite it using what i thought was the same logic (going from visual basic to php) i end up getting these stupid duplicates...

i stripped the original code down cause i thougth maybe it was just the whole single department thing but turns out its generating dups in the co op departments so here is the code that i have so far that i believed to be working until i found out that its still letting dups through

if ($co_partner === "yes") {
    do {
        $random_number = rand(10000,10000000);
        $sql = "(SELECT sec_id FROM depart_table0 WHERE sec_id='".$random_number."')";
        for ($i=0; $i < $table_count[0]-7; $i++){
            $sql .= "UNION (SELECT sec_id FROM depart_table".$i." WHERE sec_id='".$random_number."')";
        }
        $sql_response = mysqli_query($con, $sql)or die(mysqli_error($con));

        $sql2 = "(SELECT sec_id_cop FROM depart_table0 WHERE sec_id_cop='".$random_number."')";
        for ($i=0; $i < $table_count[0]-7; $i++){
            $sql2 .= "UNION (SELECT sec_id_cop FROM depart_table".$i." WHERE sec_id_cop='".$random_number."')";
        }
        $sql_response2 = mysqli_query($con, $sql2)or die(mysqli_error($con));

    } while (mysqli_num_rows($sql_response) >= 1 || mysqli_num_rows($sql_response2) >= 1);

    do {
        $random_number_cop = rand(10000,10000000);
        $sql = "(SELECT sec_id FROM depart_table0 WHERE sec_id='".$random_number_cop."')";
        for ($i=0; $i < $table_count[0]-7; $i++){
            $sql .= "UNION (SELECT sec_id FROM depart_table".$i." WHERE sec_id='".$random_number_cop."')";
        }
        $sql_response_cop = mysqli_query($con, $sql)or die(mysqli_error($con));

        $sql2 = "(SELECT sec_id_cop FROM depart_table0 WHERE sec_id_cop='".$random_number_cop."')";
        for ($i=0; $i < $table_count[0]-7; $i++){
            $sql2 .= "UNION (SELECT sec_id_cop FROM depart_table".$i." WHERE sec_id_cop='".$random_number_cop."')";
        }
        $sql_response_cos2 = mysqli_query($con, $sql2)or die(mysqli_error($con));

    } while (mysqli_num_rows($sql_response_cop) >= 1 || mysqli_num_rows($sql_response_cos2) >= 1);
}
else {
    do {
        $random_number = rand(10000,10000000);
        $sql = "(SELECT sec_id FROM depart_table0 WHERE sec_id='".$random_number."')";
        for ($i=0; $i < $table_count[0]-7; $i++){
            $sql .= "UNION (SELECT sec_id FROM depart_table".$i." WHERE sec_id='".$random_number."')";
        }
        $sql_response = mysqli_query($con, $sql)or die(mysqli_error($con));
        $random_number_cop = 0;
    } while (mysqli_num_rows($sql_response) >= 1);
}

AND BEFORE you point out the table count - 7 THATS because there are literally on 32 departments and all the tables are named specifically depart_table0, depart_table1, depart_table2, depart_table3... and so on until 32, the data i am looking for is ONLY in those tables, the other tables (the ones being subtracted) have nothing to do with the depart tables, and forgive any rudeness i might sound like i have because my "so called" boss explained to me in the politest of ways that if i cannot fix this without rewriting it then they will just have to find someone who can which i feel was a direct attack on me so not feeling to hot right now...

thank you for the huge help to anyone willing to listen to my rant and help explain my errors.