i was trying to solve this problem.

Given an array of size N in which every number is between 1 and N, determine if there are any duplicates in it.

Well, what i did is, use an array called present, to flag the elements ie:

for (i = 0; i < N; i++) {
    if (!present[i]) {
       present[i] = 1;
    } else {
       /* duplicate */

Is there a better way of doing it?

7 Years
Discussion Span
Last Post by myk45

That's a great way to do it, (called "Counting Sort", sometimes), IF the range of the numbers, is not too large.

Run time is O(n), and you can't beat that, as long as the range is suitable.


Thanks for the reply Adak. But what if the number is huge? What way can i solve that?


It depends on the data. If there were a lot of values < say 1000, and lots of values > 1 Million, with nothing in between, then it might make sense to have two arrays for Counting sort to work through. One for low and one for higher values.

Another way to do it - very common, is to simply sort the values. Now all the duplicate values will be right next to each other, and easily removed.

There are a wide variety of tricks with other data structures that can do the same thing. I haven't used them, but hash tables are one such. If the number hashes out to the same value, it will get tested for equality.

The general purpose being to get rid of any duplicate values, as far "upstream" in your processing of the data, as possible.

I'm always reminded of an old R:Base Relational Database, that SO EASILY would duplicate every record in the table you were working in. That was one of the first BASIC programs I wrote for work - remove duplicate records from our table in R:Base. (quite an adventure on a 286 cpu) ;)



Thanks. The second method of sorting which you suggested was good too.

This question has already been answered. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.