Help removing duplicates from an array

Question

nizbit 0 Junior Poster in Training

14 Years Ago

I'm trying to find duplicates in an array and removing them by shifting the array index using the same array. I can't figure out where in my code the problem is but I'm still getting a duplicate. Could someone please help pinpoint the problem? I've tried debugging it but I'm not seeing it.

#include <iostream>
using namespace std;

int removeDuplicates(int A[], int n){
	int last = n-1;
	int found = 0;
	//bool flag = false;

	for(int i=0; i < last+1; i++)
	{
		for(int j=i+1; j < last+1; j++)
		{
			if(A[i] == A[j])
			{
				found++;
				for(int k = j;k < last+1;k++)
				{
					A[k]=A[k+1];
					//if(A[k-1] == A[i])
					//	flag = true;
				}
				last--;
				j=j+1;
			}

		}
}
	//found= last-found;
	return last;
}

int main() {

	// Test data
	int n = 24;
	int A[] = {1,1,2,4, 1,1,1,2, 3,5,8,0, 2,-2,-4,1, 9,10,-4,11, 11,8,5,-4};

	// Call
	int last = removeDuplicates(A,n);

	// Print Results
	for(int i = 0; i <= last; i++)
		cout << A[i] << " ";
	cout << endl;

	return 0;
}

c++

5 Contributors
11 Replies
538 Views
1 Week Discussion Span
Latest Post 14 Years Ago Latest Post by dusktreader

All 11 Replies

dusktreader 137 Posting Whiz in Training

14 Years Ago

I'm trying to find duplicates in an array and removing them by shifting the array index using the same array. I can't figure out where in my code the problem is but I'm still getting a duplicate. Could someone please help pinpoint the problem? I've tried debugging it but I'm not seeing it.

#include <iostream>
using namespace std;

int removeDuplicates(int A[], int n){
	int last = n-1;
	int found = 0;
	//bool flag = false;

	for(int i=0; i < last+1; i++)
	{
		for(int j=i+1; j < last+1; j++)
		{
			if(A[i] == A[j])
			{
				found++;
				for(int k = j;k < last+1;k++)
				{
					A[k]=A[k+1];
					//if(A[k-1] == A[i])
					//	flag = true;
				}
				last--;
				j=j+1;
			}

		}
}
	//found= last-found;
	return last;
}

int main() {

	// Test data
	int n = 24;
	int A[] = {1,1,2,4, 1,1,1,2, 3,5,8,0, 2,-2,-4,1, 9,10,-4,11, 11,8,5,-4};

	// Call
	int last = removeDuplicates(A,n);

	// Print Results
	for(int i = 0; i <= last; i++)
		cout << A[i] << " ";
	cout << endl;

	return 0;
}

I see one fundamental problem here. You are shifting your j index even when a duplicate is found. When you find a duplicate and shift all items, you need to leave the j index in place:

1.  Found duplicate

    i     j
    |     |
    v     v
ABCDEFGHIJEEKLMNO

2.  Shift

    i     j
    |     |
    v     v
ABCDEFGHIJEEKLMNO
           vvvvvv
           ||||||
           //////
          |||||| 
          vvvvvv 
ABCDEFGHIJEKLMNO

3.  Incorrect increment of j

    i      j
    |      |
    v      v
ABCDEFGHIJEKLMNO
          ^
          |_____________There is still a duplicate, but it was skipped!

So, when there is a duplicate found, j should not be incremented.

I would like to make a point here. If the ordering of the array is not important, there is a much faster way to do this. You can simply swap the with tail index.

1. Find a duplicate

    i     j                  last
    |     |                   |
    v     v                   v
ABCDEFGHIJEKLMNP.............OP

2.  Swap duplicate with last item.  
    In this case, we don't really need to swap, we can just copy

    i     j                  last
    |     |                   |
    v     v                   v
ABCDEFGHIJEKLMNP.............OP
                              v
           ___________________|
          |
          v 
ABCDEFGHIJPKLMNP.............OP

3.  Decrement the last index

    i     j                 last
    |     |                  |
    v     v                  v
ABCDEFGHIJPKLMNP.............OP

4.  Do not increment j if a duplicate was found and swapped!
    Imagine the same list, only the last index was another duplicate.

    i     j                  last
    |     |                   |
    v     v                   v
ABCDEFGHIJEKLMNP.............OE
                              v
           ___________________|
          |                    
          v
ABCDEFGHIJEKLMNP.............OE
          ^
          |____________If you increment j, you will miss this duplicate

Of course, you can improve step 4 by decrementing the last pointer until it is not pointing at a duplicate:

i     j                        last
    |     |                          |
    v     v                          v
ABCDEFGHIJEKLMNP.............OEEEEEEEE



    i     j                 last
    |     |                  |
    v     v                  v
ABCDEFGHIJEKLMNP.............OEEEEEEEE
If you increment j, you will miss this duplicate

    i     j                 last
    |     |                  |
    v     v                  v
ABCDEFGHIJEKLMNP.............OEEEEEEEE
                             v
           __________________|
          |                    
          v
ABCDEFGHIJOKLMNP.............OEEEEEEEE
          ^
          |____________Ok to increment j now.

Hope this helps!!!

mrnutty commented: Nice Ascii Art and good explanation. +5

Lerner 582 Nearly a Posting Maven

14 Years Ago

Note how the pseudocode written by dusktreader doesn't really "remove" duplicates from the array. Instead it manipulates the duplicates such that the "duplicates" can be ignored, or removed from consideration, eventhough they aren't physically removed. Indeed there is no way to remove the memory from an array once it has been declared short of removing all the memory and starting over. The ability to "ignore" things that aren't currently useful is a very important skill to learn.

mrnutty 761 Senior Poster

14 Years Ago

Its probably because the professor wants the student to learn, instead of blindly using
a function.

Fbody commented: Absolutely! :D +2

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

mrnutty 761 Senior Poster · Answer 1 · 2010-08-31T08:41:06+00:00

Perhaps, one of the faster way to do this is like so :

template<typename ForwardIterator>
void makeUnique(ForwardIterator begin, ForwardIterator end){
 std::sort(begin,end);
 std::unique(begin,end);
}

dusktreader 137 Posting Whiz in Training · Answer 2 · 2010-08-31T10:53:35+00:00

Perhaps, one of the faster way to do this is like so :

template<typename ForwardIterator>
void makeUnique(ForwardIterator begin, ForwardIterator end){
 std::sort(begin,end);
 std::unique(begin,end);
}

Of course that would be simpler, but this smacks of a homework assignment. If that is the case, the important part is to learn the underlying logic of the algorithm. Thinking along those lines led me to discuss the explicit method.

I am a complete adherent of the STL, however, and, dear OP, if you haven't learned about the c++ standard template library, you should investigate it soon.

nizbit 0 Junior Poster in Training · Answer 3 · 2010-09-06T23:37:00+00:00

I wanted to say thanks for all the replies. Decrementing j instead, got my code to work. I like dusktreader's alternative. Originally I wanted to set the dupes to null but I did not want to overwrite any values.

Yes this was for homework. I desperately need to learn the STL. My school for a lot of reasons frowns on the STL. In fact they barely teach it. I understand some of their concerns, but at the same time the STL doesn't have cooties :D.

dusktreader 137 Posting Whiz in Training · Answer 4 · 2010-09-07T23:26:34+00:00

My school for a lot of reasons frowns on the STL. In fact they barely teach it. I understand some of their concerns, but at the same time the STL doesn't have cooties :D.

This is, unfortunately, quite common. I really don't understand this mentality. The STL is a standard library, so it ships with all c++ compilers. It is a stable library, and, if you use it as it was intended (i.e. don't use it with threads), it is a very useful and powerful tool.

Sadly, a lot of professors of Computer Science still cling desperately to a lot of conventions they studied as students of the C programming language. In those days, you spent a lot of time and effort managing memory and array sizes.

There are many times when the STL is not appropriate for the task. However, for most general purpose container needs, the STL is great. I'm a feisty person, so I would question my professor as to why, specifically, he doesn't teach the STL.

Here are a few neat points about STL containers:

1. STL vectors have the same time complexity as arrays for random access.
2. STL lists have the same time complexity for insertion/deletion as a self-implemented linked list.
3. STL is a standard with historic stability.
4. STL containers use dynamic memory allocation and de-allocation internally.
5. STL containers are easy to use and richly featured.
6. STL containers use a common interface, so most containers may be traversed using common syntax (i.e. they are interchangeable)
7. STL containers are templates, so you can use them to store any data type.
8. STL containers are classes, so you can use inheritance to specialize them for your own needs.

Fbody 682 Posting Maven Featured Poster · Answer 5 · 2010-09-08T02:46:16+00:00

Its probably because the professor wants the student to learn, instead of blindly using
a function.

Agreed, it is far better to learn proper techniques and algorithms than to learn to call some anonymous function someone else wrote.

dusktreader 137 Posting Whiz in Training · Answer 6 · 2010-09-08T04:39:12+00:00

@firstPerson & Fbody

I couldn't agree more. However, I have known many professors ( teachers and colleagues ) that recommend against using the STL even outside/after the classroom. I think I has a lot to do with the stubbornness of C programmers to accept a new paradigm.

Usually, Computer Science programs progress something like:

Computer Science I - Introduction to Computer Science (required of engineers also)
Computer Science II - Language specific tools and basic algorithms
Data Structures - Study of data composition, iterative processing, and intermediate algorithms

I think it is quite unfortunate that in CS I professors put introductory students through the pain of managing arrays. I work with engineers and scientists that have their entire programming knowledge based upon one college class. Every time I see them declare a static array of structs 1000 items deep, I rue the day their professor told them to use this in CS I:

myStruct myArray[1000];
int    myArrayCounter = 0;

mrnutty 761 Senior Poster · Answer 7 · 2010-09-08T05:59:29+00:00

In my college, instead of using C++ as an introductory language, the college changed it to matlab because its better suited for engineers, and easier, and more useful for engineers that are not in CSE branch.

dusktreader 137 Posting Whiz in Training · Answer 8 · 2010-09-08T06:08:31+00:00

@firstPerson
That is a great idea for engineering students. I don't know if I'm convinced that it's appropriate for CS students ( because I friggen' hate matlab ).

I really think a dynamic programming language should be taught in CS I. This way, the basics of programming can be clearly taught without all the clutter of low-level maintenance to sour students to the art. Being a fan, I think Python would be a great language for anyone to cut their programming teeth on.

Help removing duplicates from an array

Recommended Answers Collapse Answers

All 11 Replies

Recommended Answers