Views: 2632 | Replies: 14 | Solved
squidd

Problem with removing duplicate entries in a list.

  #1  
Nov 14th, 2007
I have it coded so that a small list can have its duplicates removed fairly efficiently. But the larger the list gets, the longer it takes, to the point that it is better not to even try. A list of 15,000 lines would take hours and hours, probably longer if there are a lot of duplicates to remove.

Here is what I have done:

   1. procedure TMainForm.RemoveDuplicatesButtonClick(Sender: TObject);
   2. var
   3.   i, j: integer;
   4. label
   5.   Recheck;
   6. begin
   7. Recheck:
   8.   for i := 0 to ListView1.Items.Count - 1 do
   9.     for j := 0 to ListView1.Items.Count - 1 do
  10.     begin
  11.       if i = j then continue;
  12.       if ListView1.Items[i].Caption = ListView1.Items[j].Caption then
  13.       begin
  14.         ListView1.Items.Delete(j);
  15.         goto Recheck;
  16.       end;
  17.       ListView1.Columns[0].Caption := inttostr(ListView1.items.Count);
  18.     end;
  19.   CurrentStatusLabel.caption := 'Removed Duplicates';
  20. end;

While this does work, like I said, it gets too slow for large lists. Is there a way to load the list into memory and do this procedure there? I know an array would be much faster, but I'm not that far yet. This is in a TListView of course, Duoas, as you well know by now.

Also, when I try to report a count of the duplicates removed, the number is not correct, so I changed the label caption to just "Removed Duplicates".

Thanks as always for any help.
Duoas

  #2  
Nov 14th, 2007
The TListView is going to be really slow no matter what. However, there are a couple of things you can do to help.

In your loop on line 9, you are starting at zero again. You don't need to check things already known not to be duplicates. Say instead:
for j := i+1 to ListView1.Items.Count-1 do

In the same vein, when you goto Recheck you are starting all over. Since deleting an item only shifts the items that come after it, you might want to work backwards:
for j := ListView1.Items.Count-1 downto i+1 do
When a duplicate is found and removed, you just continue on your merry way and ignore all the stuff you have already checked (and possibly removed).

You might want to consider using the ListView1.FindCaption method instead of doing the inner loop yourself. That way you can just loop until FindCaption returns nil. I think that would probably save some time, since the class can usually access its members faster than you can through the class's property accessors.
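
For reference, a minimal sketch of how the FindCaption suggestion might look. It assumes the standard VCL signature FindCaption(StartIndex, Value, Partial, Inclusive, Wrap); the unit name and the RemoveDuplicateCaptions helper are made up for illustration, and the code has not been tested against the original project. As far as I know the underlying Windows search ignores case, so 'Abc' and 'abc' would be treated as duplicates here, unlike the = comparison in the original loop.

```pascal
unit FindCaptionSketch;  // hypothetical unit name

interface

uses
  ComCtrls;  // TListView, TListItem (Vcl.ComCtrls in newer Delphi versions)

// Hypothetical helper: deletes later items whose Caption repeats an earlier one.
procedure RemoveDuplicateCaptions(ListView: TListView; out NumRemoved: Integer);

implementation

procedure RemoveDuplicateCaptions(ListView: TListView; out NumRemoved: Integer);
var
  i: Integer;
  Dup: TListItem;
begin
  NumRemoved := 0;
  i := 0;
  while i < ListView.Items.Count - 1 do
  begin
    // Search from item i+1 (inclusive) to the end: whole-string match, no wrap.
    Dup := ListView.FindCaption(i + 1, ListView.Items[i].Caption,
      False, True, False);
    if Dup <> nil then
    begin
      Dup.Delete;       // remove this duplicate, then search again for item i
      Inc(NumRemoved);  // count only actual removals
    end
    else
      Inc(i);           // no duplicates of item i remain; move to the next item
  end;
end;

end.
```

From the button handler this could be called as RemoveDuplicateCaptions(ListView1, num_removed), with num_removed declared as an Integer.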

You can count how many you removed by incrementing a counter whenever you actually remove a duplicate and at no time else. Set the counter to zero before you do anything else. I'm sure you had something very close to this when you tried before.

Good luck.
Last edited by Duoas : Nov 14th, 2007 at 9:53 pm.
squidd

  #3  
Nov 14th, 2007
I added this to see what was happening when I went backwards through the large list.

CurrentStatusLabel.caption := 'Removed' + inttostr(j) + 'Duplicates';
Application.ProcessMessages;

It counts backwards and still goes all the way down to zero. Should I have done more than just change that line? I looked at the syntax for ListView1.FindCaption and I couldn't get that to compile. I will read up on that.

This is the now changed code:

procedure TMainForm.RemoveDuplicatesButtonClick(Sender: TObject);
var
  i, j: integer;
label
  Recheck;
begin
Recheck:
  for i := 0 to ListView1.Items.Count - 1 do
    for j := ListView1.Items.Count-1 downto i+1 do
    begin
      if i = j then continue;
      if ListView1.Items[i].Caption = ListView1.Items[j].Caption then
      begin
        ListView1.Items.Delete(j);
        goto Recheck;
      end;
      ListView1.Columns[0].Caption := inttostr(ListView1.items.Count);
      CurrentStatusLabel.caption := 'Removed' + inttostr(j) + 'Duplicates';
      Application.ProcessMessages;
    end;
  //CurrentStatusLabel.caption := 'Removed Duplicates'; // this doesn't give the correct number
end;

and my count is still off... I know it is because I haven't done the count properly. I think...

Thanks for your help as always.
Last edited by squidd : Nov 14th, 2007 at 11:31 pm.
Duoas

  #4  
Nov 15th, 2007
Sorry, I should have made myself clearer.

Try this:
   1. procedure TMainForm.RemoveDuplicatesButtonClick(Sender: TObject);
   2. var
   3.   i, j, num_removed: integer;
   4. begin
   5.   num_removed := 0;
   6.   for i := 0 to ListView1.Items.Count - 1 do
   7.     for j := ListView1.Items.Count-1 downto i+1 do
   8.     begin
   9.       if i = j then continue; // <-- this line shouldn't be necessary
  10.       if ListView1.Items[i].Caption = ListView1.Items[j].Caption then
  11.       begin
  12.         ListView1.Items.Delete(j);
  13.         inc( num_removed )
  14.       end;
  15.       Application.ProcessMessages;
  16.     end;
  17.   ListView1.Columns[0].Caption := inttostr(ListView1.items.Count);
  18.   CurrentStatusLabel.caption := 'Removed ' + inttostr(num_removed) + ' Duplicates';
  19. end;
I didn't actually test this code; it's late and I'm going to bed now... Also, with the revised j loop I don't think you need to test for i = j, since it should never happen. (Alas, my brain is shot right now...)

Good luck.
ExplainThat

  #5  
Nov 15th, 2007
Hello squidd,

May I suggest that you get hold of Danny Thorpe's book on Delphi programming? The last time I checked it was out of print but available second hand on Amazon at over $100. A lot of money but money very well spent.

There are a number of things wrong with your code.
  • For starters, you actually use a GOTO statement. In over a decade as a Delphi programmer I have never used a GOTO. Quite apart from the structural issues, GOTOs generate very poor quality assembler code. NEVER use them.
  • Secondly, what you are doing is little short of a selection sort: every item is compared with every other item, so the work grows roughly with the square of the list size. With 15000 items it will be relatively slow.
  • You wouldn't normally notice the lack of speed. The problem is that you are deleting listview items one at a time. Each such deletion triggers multiple messages that culminate in a control redraw. Messages are slow. Redraws are slooow and get sloooooooower as the number of items in your listview increases.
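
One way to reduce the redraw cost described in the last bullet, without changing the algorithm, is to wrap the deletions in Items.BeginUpdate and Items.EndUpdate. The following is a minimal sketch, assuming a standard VCL TListView; the unit and function names are made up for illustration and the code has not been run against the original project. The comparisons are still quadratic, so this only addresses the repainting overhead.

```pascal
unit BatchDeleteSketch;  // hypothetical unit name

interface

uses
  ComCtrls;  // TListView (Vcl.ComCtrls in newer Delphi versions)

// Hypothetical helper: same nested loop as earlier in the thread, but with
// repainting suspended while items are deleted. Returns the number removed.
function RemoveDuplicatesBatched(ListView: TListView): Integer;

implementation

function RemoveDuplicatesBatched(ListView: TListView): Integer;
var
  i, j: Integer;
begin
  Result := 0;
  ListView.Items.BeginUpdate;  // stop the control repainting after every change
  try
    for i := 0 to ListView.Items.Count - 1 do
      // Walk backwards so a deletion never shifts an item not yet examined.
      for j := ListView.Items.Count - 1 downto i + 1 do
        if ListView.Items[i].Caption = ListView.Items[j].Caption then
        begin
          ListView.Items.Delete(j);
          Inc(Result);
        end;
  finally
    ListView.Items.EndUpdate;  // repaint once, even if an exception was raised
  end;
end;

end.
```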

As a general observation, when you find that you have issues with speed it is better to take a step back and examine your assumptions rather than spend time trying to make it all go faster. In the present instance the two questions that spring to mind right away are
  1. Why not prevent the listview from being populated with duplicates to begin with by checking before adding new items?
  2. What on earth is the point of a listview that contains 1000s of items? Surely no one is ever going to look at all those items!
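
The first question can be illustrated with a small console sketch. It assumes the captions are plain strings and keeps a sorted TStringList as a record of what has already been added; RememberIfNew is a hypothetical helper name, not something from the original project. In the real form, ListView1.Items.Add would be called only when the helper returns True.

```pascal
program AddUniqueSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes;

// Hypothetical helper: returns True if Value had not been seen before (and
// records it), False if it is already present. Seen must have Sorted = True.
function RememberIfNew(Seen: TStringList; const Value: string): Boolean;
var
  Index: Integer;
begin
  Result := not Seen.Find(Value, Index);  // binary search on the sorted list
  if Result then
    Seen.Add(Value);
end;

var
  Seen: TStringList;
begin
  Seen := TStringList.Create;
  try
    Seen.CaseSensitive := True;  // exact comparison, like "=" on the captions
    Seen.Sorted := True;
    // Each call stands in for "should this line be added to the listview?"
    WriteLn('alpha is new: ', RememberIfNew(Seen, 'alpha'));
    WriteLn('beta is new:  ', RememberIfNew(Seen, 'beta'));
    WriteLn('alpha is new: ', RememberIfNew(Seen, 'alpha'));
    WriteLn('Unique entries recorded: ', Seen.Count);
  finally
    Seen.Free;
  end;
end.
```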

That said, here is what I suggest.
  • Don't attempt to clean up the listview directly, and for future reference always try to avoid cleaning up the content of visual controls by manipulating them directly. It is far better to do this offscreen and then update the contents of the control.
  • First put all your listview items into a sorted stringlist that is set up to ignore duplicates.
  • Then clear the listview and populate it back from the stringlist.

There is a little more to doing this. The core code is shown below.

procedure TMaster.DoCleanUp(Sender:TObject);
var i,j,k,ACount:Integer;
    AStrings:TStringList;
    AList:TListItems;
begin
  AList:=lvOne.Items;
  {get a pointer to the listview items. This is faster than attempting
   to dereference using lvOne.Items each time}
  ACount:=AList.Count;//how many items in the list
  AStrings:=TStringList.Create;//create a temporary stringlist
  with AStrings do
  try
    Sorted:=True;
    Duplicates:=dupIgnore;
    //make it a sorted stringlist and ignore duplicates
    for i:=0 to ACount - 1 do AddObject(AList[i].Caption,TObject(i));
    {Add every item in the listview to the stringlist. dupIgnore +
     Sorted ensures that duplicate entries are ignored. We want to
     add entries back to the listview in their original order which
     will not be preserved in the stringlist since it is sorted. So
     we store the original indices in the Objects array of the
     stringlist. Note the TObject(i) typecast.}

    AList.Clear;
    {We have a safe copy of all the listview entries we want to retain so
     why keep the listview entries? Just clear the listview. This issues
     just one LVM_DELETEALLITEMS message as opposed to multiple
     LVM_DELETEITEM messages}
    j:=0;
    {To repopulate the listview we want to start finding its original
     entries starting from the first one, at index 0}
    for i:=0 to ACount - 1 do
    begin
      {The stringlist contains fewer entries than in the original listview
       but nevertheless we have to run this loop as many times as there
       were members in the listview since we need to identify them by
       their original position with the help of the stringlist.Objects
       array}
      k:=IndexOfObject(TObject(j));
      //where is the j'th entry in the listview?
      inc(j);//know that now so next time we want the (j + 1)th entry
      if (k >= 0) then AList.Add.Caption:=Strings[k];
      {The j'th entry could have been a duplicate & may no longer
      exist. Only if it does - add it back to the listview}
    end;
  finally Free end;//don't forget to destroy the stringlist!
  ShowMessage(IntToStr(AList.Count));
end;

The test project I created for this is in the ZIP attachment to this message. In my tests, on a not especially fast computer, the cleanup of a 10000 item listview took just a few seconds. To me this still isn't the ideal solution. I would try to recast the problem so the cleanup isn't necessary in the first place.
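
A possible variation on the same idea, sketched here as a console program: IndexOfObject is a linear search, so calling it once per original item can itself become slow on very large lists. The sketch below instead walks the original strings in order and uses a sorted TStringList only as a "seen" set, keeping the first occurrence of each string. CopyWithoutDuplicates and the sample data are hypothetical; in the form, the source strings would be the listview captions copied out before clearing the listview, and only captions are handled, as in the code above.

```pascal
program KeepFirstSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes;

// Hypothetical helper: copies Source into Dest in the original order, keeping
// only the first occurrence of each string. Returns the number skipped.
function CopyWithoutDuplicates(Source, Dest: TStrings): Integer;
var
  Seen: TStringList;
  i, Index: Integer;
begin
  Result := 0;
  Seen := TStringList.Create;
  try
    Seen.CaseSensitive := True;  // exact comparison, like "=" on the captions
    Seen.Sorted := True;         // enables the binary search done by Find
    for i := 0 to Source.Count - 1 do
      if Seen.Find(Source[i], Index) then
        Inc(Result)              // already seen: count it as a duplicate
      else
      begin
        Seen.Add(Source[i]);     // remember it
        Dest.Add(Source[i]);     // and keep it, preserving the original order
      end;
  finally
    Seen.Free;
  end;
end;

var
  Captions, Unique: TStringList;
  Removed, i: Integer;
begin
  Captions := TStringList.Create;
  Unique := TStringList.Create;
  try
    // Sample data standing in for the listview captions.
    Captions.Add('pear');
    Captions.Add('apple');
    Captions.Add('pear');
    Captions.Add('fig');
    Captions.Add('apple');
    Removed := CopyWithoutDuplicates(Captions, Unique);
    for i := 0 to Unique.Count - 1 do
      WriteLn(Unique[i]);
    WriteLn('Removed ', Removed, ' duplicates');
  finally
    Unique.Free;
    Captions.Free;
  end;
end.
```

With the sample data this keeps pear, apple and fig in that order and reports two duplicates removed.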

Hope this has been useful
Attached Files
File Type: zip lvtest.zip (250.2 KB, 8 views)
squidd

  #6  
Nov 15th, 2007
Yes, that is a much faster way of removing duplicates in an already slow listview procedure. As you mentioned, line 9 was completely unnecessary and has since been removed.

Thank you for your help.
squidd

  #7  
Nov 15th, 2007
Hello ExplainThat,

I will take a look at what you have written as well. I am VERY new to programming, so I may not be able to implement this procedure, but I will try. Thank you for your suggestion.
Last edited by squidd : Nov 15th, 2007 at 5:24 pm.
squidd

  #8  
Nov 15th, 2007
I just now saw the zip file enclosed in your post, ExplainThat. I will take a look at it, as I was unable to implement your suggestion. Thank you for your help. I am also trying to disable the RemoveDuplicatesButton if more than 1,000 lines are loaded into the ListView. At this point all it does is disable the button; the procedure still runs. lol

I will get this one, I think. Can't be too hard. Or can it?

Thanks again.

EDIT -

Here is something weird that I can't seem to get right... I have made it so that the button should be disabled once the ListView has more than 1,000 lines, but the button that performs the procedure is still enabled. Once clicked, it greys out and the removal of duplicates does not occur. I am trying to figure out why it isn't greyed out as soon as 1,000 lines have loaded. I thought I had it right.

It does what it is supposed to do by not allowing the procedure to continue, but the button has to be clicked for it to grey out. This is annoying and could confuse the user. Why isn't this right?

procedure TMainForm.RemoveDuplicatesButtonClick(Sender: TObject);
var
  i, j, num_removed: integer;
begin
  if ListView1.items.Count > 1000 then // after 1,000 lines I thought that the next line would gray out the button
    RemoveDuplicatesButton.Enabled := False // this line prevents the procedure from continuing but won't gray out until the button is clicked
  else
  begin
    num_removed := 0;
    for i := 0 to ListView1.Items.Count - 1 do
      for j := ListView1.Items.Count-1 downto i+1 do
      begin
        if ListView1.Items[i].Caption = ListView1.Items[j].Caption then
        begin
          ListView1.Items.Delete(j);
          inc( num_removed )
        end;
        Application.ProcessMessages;
      end;
    ListView1.Columns[0].Caption := inttostr(ListView1.items.Count);
    CurrentStatusLabel.caption := 'Removed ' + inttostr(num_removed) + ' Duplicates';
  end;
end;

I am sure it is something incredibly stupid on my part as usual...
Last edited by squidd : Nov 15th, 2007 at 8:45 pm.
squidd

  #9  
Nov 15th, 2007
Another thing that I haven't been able to find anywhere is how to give the built program an icon other than the default one. I can add an icon in the form header, but not in the built *.exe...

Any ideas on where to look to read up on that? I'm sure it is rather simple... so simple, actually, that no one has written about it, or at least not that I have seen.
Duoas

  #10  
Nov 16th, 2007
The ButtonClick event procedures only get called if the user clicks the button. So nothing will grey out if the 'set disabled' code is in the button click procedure. The place to grey the button is in the load procedure: after loading the file, check to see how many items there are. If there are more than your limit (1,000 in your code), then disable the appropriate buttons...

In Windows applications, things only occur when the user clicks something, types something, and so on. The only exceptions are system events (which you don't need to worry about) and timer events (if you drop a TTimer on your form).
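
A minimal sketch of the "grey the button in the load procedure" advice, assuming a standard VCL form. The unit name, the UpdateDedupButton helper and the MaxItemsForDedup constant are made up for illustration; the limit of 1000 is taken from the code earlier in the thread, and the load procedure itself is not shown because it does not appear in the thread.

```pascal
unit ButtonStateSketch;  // hypothetical unit name

interface

uses
  Controls, ComCtrls;  // TControl, TListView (Vcl.* in newer Delphi versions)

const
  MaxItemsForDedup = 1000;  // assumed limit, taken from the earlier code

// Hypothetical helper: enables or disables the given control depending on how
// many items the listview currently holds.
procedure UpdateDedupButton(ListView: TListView; Button: TControl);

implementation

procedure UpdateDedupButton(ListView: TListView; Button: TControl);
begin
  // Enabled only while the list is small enough to de-duplicate quickly.
  Button.Enabled := ListView.Items.Count <= MaxItemsForDedup;
end;

end.
```

The call UpdateDedupButton(ListView1, RemoveDuplicatesButton) would go at the end of whatever procedure fills the listview, so the button state is already correct before the user can click it.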

To set the application icon, go to project options --> Application. There should be a spot to set the program icon and the program title.
All times are GMT -4.