I need to implement a huge array of 2,100,000 integer elements. I have to find duplicate elements in this array, which means there will be a lot of searching. Can you please suggest an efficient way to implement this? What collection should I use, and how should I search?

You may have a real problem here. I don't think you will find anything faster than a simple array of ints. I tried a quick test with the brute-force pairwise comparison below and found execution time of roughly 1 second at 1/100 of your data size and roughly 1 minute at 1/10. I didn't try the full-size array, but since the work grows quadratically, I'd guess around 1 hour.
You could presumably halve this by multi-threading it and using more than one core, but that's not going to give you an order-of-magnitude improvement.

final int SIZE = 2100 /* 000 */; // restore the zeros for the full 2,100,000 elements
int[] big = new int[SIZE];       // fill with your actual data here

// Brute force: compare each element against every later element,
// counting every duplicate pair found. This is O(n^2).
long count = 0;
for (int i = 0; i < SIZE; i++) {
    int temp = big[i];
    for (int j = i + 1; j < SIZE; j++) {
        if (temp == big[j]) count++;
    }
}
System.out.println(count);
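
To make the multi-threading suggestion concrete, here is a minimal sketch using Java's parallel streams. It assumes the same big[] array and SIZE constant as the snippet above, and simply spreads the outer loop across the available cores:

import java.util.stream.IntStream;

// Sketch of the multi-threading idea: each outer index i is independent,
// so the outer loop can be run as a parallel stream. Assumes big[] and
// SIZE from the snippet above are in scope and effectively final.
long parallelCount = IntStream.range(0, SIZE)
        .parallel()
        .mapToLong(i -> {
            long c = 0;
            int temp = big[i];
            for (int j = i + 1; j < SIZE; j++) {
                if (temp == big[j]) c++;
            }
            return c;
        })
        .sum();
System.out.println(parallelCount);

Even split across cores, the work still grows quadratically, so this is only a constant-factor improvement, as noted above.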

Thank you very much for the suggestion.

What do you know about the integers in the data? If they are uniformly distributed within a known range, I think I can suggest a way to get a very significantly faster result...
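
For example (and this is only my guess at the kind of approach being hinted at, not necessarily what the poster has in mind), if the values are known to lie in a fixed range, a single linear pass with a counting array finds every duplicate with no pairwise comparison at all. The bound MAX below is a hypothetical example value:

// Sketch of a counting-array approach, assuming values lie in [0, MAX).
// MAX is a hypothetical bound chosen for illustration.
final int MAX = 10_000_000;   // hypothetical known upper bound (exclusive)
int[] counts = new int[MAX];  // one counter per possible value

long duplicates = 0;
for (int value : big) {       // big[] as in the earlier snippet
    if (++counts[value] > 1) { // second and later occurrences are duplicates
        duplicates++;
    }
}
System.out.println(duplicates);

That replaces the O(n^2) pairwise scan with O(n) work plus O(range) memory, which is why knowing the range of the data matters so much.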
