find duplicates from a spreadsheet file

Please support our Computer Science advertiser: Learn about neural networks and artificial intelligence.
Reply

Join Date: Jul 2005
Posts: 5
Reputation: datalinx1 is an unknown quantity at this point 
Solved Threads: 0
datalinx1 datalinx1 is offline Offline
Newbie Poster

find duplicates from a spreadsheet file

 
0
  #1
Jul 9th, 2005
HI, I am in need of a program that you can take a spreadsheet file and find all the duplicates with out haveing to go thru each entry. Is there such a thing out there or any way to make up something like that?
Reply With Quote Quick reply to this message  
Join Date: Jun 2004
Posts: 609
Reputation: freesoft_2000 is an unknown quantity at this point 
Solved Threads: 8
freesoft_2000 freesoft_2000 is offline Offline
Practically a Master Poster

Re: find duplicates from a spreadsheet file

 
0
  #2
Jul 20th, 2005
Hi everyone,

What kind of spread sheet file. Microsoft Excel or something you built yourself custom. What language did you use, C/C++, Java, etc??

Richard West
Microsoft uses "One World, One Web, One Program" as a slogan.
Doesn’t that sound like "Ein Volk, Ein Reich, Ein Führer" to you, too?
— Eric S. Raymond

Tell me what type of software do you like and what would you pay for it

http://www.daniweb.com/techtalkforums/thread19660.html
Reply With Quote Quick reply to this message  
Join Date: Jul 2005
Posts: 5
Reputation: datalinx1 is an unknown quantity at this point 
Solved Threads: 0
datalinx1 datalinx1 is offline Offline
Newbie Poster

Re: find duplicates from a spreadsheet file

 
0
  #3
Jul 20th, 2005
Originally Posted by freesoft_2000
Hi everyone,

What kind of spread sheet file. Microsoft Excel or something you built yourself custom. What language did you use, C/C++, Java, etc??

Richard West
Hello,
I use Microsoft Excel...I know how to check it individually but is there a way or program that will go through and check a whole list or file for duplicates. I am not sure what kind of language is used? sorry. Thanks for your reply.

Kind regards,

datalinx1
Reply With Quote Quick reply to this message  
Join Date: Jun 2004
Posts: 609
Reputation: freesoft_2000 is an unknown quantity at this point 
Solved Threads: 8
freesoft_2000 freesoft_2000 is offline Offline
Practically a Master Poster

Re: find duplicates from a spreadsheet file

 
0
  #4
Jul 22nd, 2005
Hi everyone,

Originally Posted by datalinx1
I know how to check it individually
You can actually do that by yourself. Do you use a program written by yourself??

I am actually very interested to know what is the stream format used for a Microsoft Excel file.

Anyways you can use C/C++, Java, Pascal for you task. If you can already check individual values is all you need to do is to create something like the Java ArrayList class. Before you add a value to that array check it with the other values of the array, if there is a duplicate, discard it if not add it to the array. That's it

Please have a format in your array so you know that this is a new row or colomn such as adding the string "column" or "row" so that you do not confuse yourself.

I hope that helps you

Yours Sincerely

Richard West

ps. Please answer the thread at the bootom of my signature
Microsoft uses "One World, One Web, One Program" as a slogan.
Doesn’t that sound like "Ein Volk, Ein Reich, Ein Führer" to you, too?
— Eric S. Raymond

Tell me what type of software do you like and what would you pay for it

http://www.daniweb.com/techtalkforums/thread19660.html
Reply With Quote Quick reply to this message  
Join Date: Jun 2005
Posts: 2,047
Reputation: Rashakil Fol is just really nice Rashakil Fol is just really nice Rashakil Fol is just really nice Rashakil Fol is just really nice 
Solved Threads: 139
Team Colleague
Rashakil Fol's Avatar
Rashakil Fol Rashakil Fol is offline Offline
Super Senior Demiposter

Re: find duplicates from a spreadsheet file

 
0
  #5
Jul 22nd, 2005
You can do this with a simple bit of spreadsheet manipulation. Just sort by whatever column you're checking for duplicates. Then, next to that column, insert a new blank one. Suppose that column you've sorted by is D, and the blank column is C. Suppose the data starts at the fifth row, or D5 (just to pick a number). Then in the blank column, in position C6, just type in the formula, =D5=D6. This will evaluate to TRUE if D6 is a duplicate of D5..

Then, with square C6 selected, click the bottom right corner of the bold rectangle (surrounding C6) and drag the selection down the entire column. The formula will copy itself, instead with =D6=D7, then =D7=D8, and so on.... You will see 'TRUE' wherever there is a duplicate, and you could find them quickly via Edit > Find....

If your definition of "duplicate" involves more than one equal column, you can make a trivial modification to your formula, something like =D6=D7 and E6=E7, or whatever syntax Excel uses.

You could also save the spreadsheet as a CSV file (comma separated values), write a CSV parser in Perl (or C++, but if you're familiar with Perl or Python or Ruby or ...., life is much easier), and search for duplicates in a small program. But it won't be as easy. I just had to do this exact thing at work (parse CSV files, find duplicates, but a bit trickier), and if you're willing to wait until Monday, I could send you a simple CSV-parsing Perl module (which is not guaranteed to work on all inputs, and would only be useful if you... knew Perl).

However, I recommend using the features of Excel.
Reply With Quote Quick reply to this message  
Join Date: Mar 2006
Posts: 4
Reputation: Preminition is an unknown quantity at this point 
Solved Threads: 0
Preminition Preminition is offline Offline
Newbie Poster

Re: find duplicates from a spreadsheet file

 
0
  #6
Mar 4th, 2006
You can try using auto filter in Excel and filter the columns in the excelsheet.
The ones that are equel are sequential under eachother.
Reply With Quote Quick reply to this message  
Join Date: May 2009
Posts: 2
Reputation: brendajean is an unknown quantity at this point 
Solved Threads: 0
brendajean brendajean is offline Offline
Newbie Poster

Re: find duplicates from a spreadsheet file

 
0
  #7
Jun 3rd, 2009
This is the easiest, quickest solution for duplicates ever! I used this with a spreadsheet with over 8,000 rows and, unlike using a macro, a formula does not care how many records it is applied to. I am a Data Assessment Coordinator for an elementary school district and end up with spreadsheets with thousands of students and hundreds of duplicates. I have created a cheat sheet with pictures for this and sent it out to others in my district who have had this problem. I have the formula set up to keep the first record found or the last, depending on the needs. As I always say when totally happy: My life is now complete! Thank you for the solution to the duplicate records issue.


Originally Posted by Rashakil Fol View Post
You can do this with a simple bit of spreadsheet manipulation. Just sort by whatever column you're checking for duplicates. Then, next to that column, insert a new blank one. Suppose that column you've sorted by is D, and the blank column is C. Suppose the data starts at the fifth row, or D5 (just to pick a number). Then in the blank column, in position C6, just type in the formula, =D5=D6. This will evaluate to TRUE if D6 is a duplicate of D5..

Then, with square C6 selected, click the bottom right corner of the bold rectangle (surrounding C6) and drag the selection down the entire column. The formula will copy itself, instead with =D6=D7, then =D7=D8, and so on.... You will see 'TRUE' wherever there is a duplicate, and you could find them quickly via Edit > Find....

If your definition of "duplicate" involves more than one equal column, you can make a trivial modification to your formula, something like =D6=D7 and E6=E7, or whatever syntax Excel uses.

You could also save the spreadsheet as a CSV file (comma separated values), write a CSV parser in Perl (or C++, but if you're familiar with Perl or Python or Ruby or ...., life is much easier), and search for duplicates in a small program. But it won't be as easy. I just had to do this exact thing at work (parse CSV files, find duplicates, but a bit trickier), and if you're willing to wait until Monday, I could send you a simple CSV-parsing Perl module (which is not guaranteed to work on all inputs, and would only be useful if you... knew Perl).

However, I recommend using the features of Excel.
Reply With Quote Quick reply to this message  
Join Date: Aug 2009
Posts: 1
Reputation: allancass is an unknown quantity at this point 
Solved Threads: 0
allancass allancass is offline Offline
Newbie Poster

Re: find duplicates from a spreadsheet file

 
0
  #8
Aug 6th, 2009
I'd like to recommend using this duplicate file finder to deal with unnecessary files. It has many functions, user-friendly interface, safety options and 100% efficiency.
Reply With Quote Quick reply to this message  
Reply

This thread is more than three months old.
Perhaps start a new thread instead?
Message:



Similar Threads
Other Threads in the Computer Science Forum
Thread Tools Search this Thread



About Us | Contact Us | Advertise | DaniWeb | Acceptable Use Policy | RSS Feed

©2003 - 2009 DaniWeb® LLC