Finding duplicates in an Array

Question

myk45 48 Posting Whiz

14 Years Ago

Hello.

i was trying to solve this problem.

Given an array of size N in which every number is between 1 and N, determine if there are any duplicates in it.

Well, what i did is, use an array called present, to flag the elements ie:

for (i = 0; i < N; i++) {
    if (!present[i]) {
       present[i] = 1;
    } else {
       /* duplicate */
}

Is there a better way of doing it?

c

2 Contributors
4 Replies
101 Views
1 Day Discussion Span
Latest Post 14 Years Ago Latest Post by myk45

Adak 419 Nearly a Posting Virtuoso

14 Years Ago

That's a great way to do it, (called "Counting Sort", sometimes), IF the range of the numbers, is not too large.

Run time is O(n), and you can't beat that, as long as the range is suitable.

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

myk45 48 Posting Whiz · Answer 1 · 2010-10-17T04:06:42+00:00

Thanks for the reply Adak. But what if the number is huge? What way can i solve that?

Adak 419 Nearly a Posting Virtuoso · Answer 2 · 2010-10-18T00:49:19+00:00

It depends on the data. If there were a lot of values < say 1000, and lots of values > 1 Million, with nothing in between, then it might make sense to have two arrays for Counting sort to work through. One for low and one for higher values.

Another way to do it - very common, is to simply sort the values. Now all the duplicate values will be right next to each other, and easily removed.

There are a wide variety of tricks with other data structures that can do the same thing. I haven't used them, but hash tables are one such. If the number hashes out to the same value, it will get tested for equality.

The general purpose being to get rid of any duplicate values, as far "upstream" in your processing of the data, as possible.

I'm always reminded of an old R:Base Relational Database, that SO EASILY would duplicate every record in the table you were working in. That was one of the first BASIC programs I wrote for work - remove duplicate records from our table in R:Base. (quite an adventure on a 286 cpu) ;)

myk45 48 Posting Whiz · Answer 3 · 2010-10-18T01:35:16+00:00

@Adak

Thanks. The second method of sorting which you suggested was good too.