merging two files and removing duplicates

#1, mank, Mar 25th, 2008
Hi,
I am trying to read two files and copy their contents to a third file while removing duplicates.
Please help.

    #!/usr/bin/perl -w
    {
    open A,shift;

    foreach (<A>)
    {$a{$_}++};

    open B,shift;
    foreach (<B>)
    {$b{$_}++};

    open C,shift
    foreach(<C>)
    {
    print unless $a{$_}
    print unless $b{$_}
    }
    }

#2, mank, Mar 25th, 2008
I tried this one, but it isn't working:

    my ($file1, $file2) = @ARGV;

    open (FILE1, $file1) or die "Can't open $file1: $!\n";
    open (FILE2, $file2) or die "Can't open $file2: $!\n";
    open (MERGE, ">merged") or die "Can't open merged file: $!\n";

    my $line1 = <FILE1>;
    my $line2 = <FILE2>;

    while (defined ($line1) || defined ($line2)) {
        if ($line1 =~ /^\#(\d+)\s+/) {
            $tmp=$1;
            $line1=<FILE1>;
        }

        if ($line1 =~ /^[io]\d+/){
            print MERGE $line1;
            $line1 = <FILE1>;
            next;
        }

        @array= split /\s/, $line2;
        $last= $array[$#array];
        if ($last != $tmp) {
            $line2 =<FILE2>;
            next;
        }
        else {
            print MERGE $line2;
            $line2=<FILE2>;
        }
    }

#3, katharnakh, Mar 25th, 2008
What is not working? What output do you get? Please explain your problem clearly. Given only the code, with no details of the input or the desired output, how can anyone determine what is happening?

katharnakh.

#4, mank, Mar 25th, 2008
So there are two files:
fileX
name1 account1 123
name2 account2 324
name3 account3 345

fileY
name1 account1 123
name2 account4 324
name5 account3 345

I want the output file to look like this:
outputfile

name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

There should be no duplicate records for any nameX; duplicates are overwritten in no particular order.
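
One way to read the requirement above is "keep the first line seen for each value in the first column". The following is only a minimal sketch under that assumption; the helper name merge_unique and the in-memory arrays standing in for fileX and fileY are made up for illustration and do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the input lines with later lines dropped
# whenever their first whitespace-separated column was already seen.
sub merge_unique {
    my @lines = @_;
    my (%seen, @merged);
    for my $line (@lines) {
        my ($key) = split ' ', $line;   # first column only
        next unless defined $key;       # skip blank lines
        push @merged, $line unless $seen{$key}++;
    }
    return @merged;
}

# The sample records from the post, held in arrays instead of files.
my @file_x = ("name1 account1 123", "name2 account2 324", "name3 account3 345");
my @file_y = ("name1 account1 123", "name2 account4 324", "name5 account3 345");

print "$_\n" for merge_unique(@file_x, @file_y);
```

With the sample data this prints the four lines listed under outputfile. The hash lookup means the two files do not need to be sorted or to have matching keys on the same line numbers.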

#5, katharnakh, Mar 25th, 2008
    use strict;
    use warnings;

    use FileHandle;

    # input files
    my $fh1 = FileHandle->new('f1.txt') or die "ERROR: $!";
    my $fh2 = FileHandle->new('f2.txt') or die "ERROR: $!";

    # output file
    my $fh3 = FileHandle->new(">f3.txt") or die "ERROR: $!";

    while (!$fh1->eof() || !$fh2->eof()){
        # read the next line from each file; undef once a file is exhausted
        my $l1 = $fh1->eof() ? undef : $fh1->getline();
        my $l2 = $fh2->eof() ? undef : $fh2->getline();

        if ( (defined $l1) && (defined $l2)){
            if ($l1 ne $l2){
                $fh3->print($l1);
                $fh3->print($l2);
            }
        }
        else{
            $fh3->print("\n".$l1) if (defined $l1);
            $fh3->print("\n".$l2) if (defined $l2);
        }
    }

    $fh1->close();
    $fh2->close();
    $fh3->close();

katharnakh.

#6, mank, Mar 26th, 2008
Thanks, but how can I make sure that no record is duplicated?

There can be different names in the first column (the other columns may be the same or different):
name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

So the first column is unique:
name1, name2, name3, name4

#7, katharnakh, Mar 26th, 2008
Hello Mank,
if the files contain lines such as
...
name5 account3 345
name5 account3 445
...
is this a duplicate, because the first column is the same? I forgot to consider this case. If it is a duplicate, then you have to split each line and compare the first columns to eliminate such lines.

katharnakh.
Last edited by katharnakh; Mar 26th, 2008 at 4:26 am.
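
The "split the line and compare the first column" test described in this post could look roughly like the sketch below. The helper names first_column and same_first_column are hypothetical, and the sketch assumes columns are separated by whitespace, as in the sample records.

```perl
use strict;
use warnings;

# Hypothetical helper: first whitespace-separated field of a line,
# or undef for an empty or whitespace-only line.
sub first_column {
    my ($line) = @_;
    my ($key) = split ' ', $line;
    return $key;
}

# Hypothetical helper: true when both lines have the same first column.
sub same_first_column {
    my ($l1, $l2) = @_;
    my ($k1, $k2) = (first_column($l1), first_column($l2));
    return defined($k1) && defined($k2) && $k1 eq $k2;
}

# The two lines from the post differ in column 3 but share column 1.
print same_first_column("name5 account3 345\n", "name5 account3 445\n")
    ? "duplicate\n"
    : "distinct\n";
```

Under that definition the two name5 lines count as duplicates even though a whole-line comparison with ne would treat them as different.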

#8, mank, Mar 26th, 2008
Can I use something like
@a = split /\t/, $_;
and then compare it every time?
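
Whether split /\t/ is suitable depends on how the columns are actually separated, which the thread does not state. As a small sketch (the wrapper names split_on_tab and split_on_whitespace and the two sample strings are assumptions for illustration), compare it with the special pattern ' ', which splits on any run of whitespace:

```perl
use strict;
use warnings;

# Hypothetical wrappers around the two split forms being compared.
sub split_on_tab        { my ($line) = @_; return split /\t/, $line; }
sub split_on_whitespace { my ($line) = @_; return split ' ',  $line; }

my $spaced = "name1 account1 123\n";     # fields separated by single spaces
my $tabbed = "name1\taccount1\t123\n";   # the same record, tab-separated

for my $line ($spaced, $tabbed) {
    my @by_tab = split_on_tab($line);
    my @by_ws  = split_on_whitespace($line);
    printf "tab: %d field(s), whitespace: %d field(s)\n",
        scalar @by_tab, scalar @by_ws;
}
```

On the space-separated line, split /\t/ returns the whole line as a single field, so a first-column comparison would silently become a whole-line comparison. It also leaves the trailing newline attached to the last field, whereas split ' ' discards it.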

#9, katharnakh, Mar 26th, 2008
Yes, for sure. You also need to write out one line from either file when both lines have the same value in column 1. I am reposting the code with that modification.
    use strict;
    use warnings;

    use FileHandle;

    # input files
    my $fh1 = FileHandle->new('f1.txt') or die "ERROR: $!";
    my $fh2 = FileHandle->new('f2.txt') or die "ERROR: $!";

    # output file
    my $fh3 = FileHandle->new(">f3.txt") or die "ERROR: $!";

    while (!$fh1->eof() || !$fh2->eof()){
        # read the next line from each file; undef once a file is exhausted
        my $l1 = $fh1->eof() ? undef : $fh1->getline();
        my $l2 = $fh2->eof() ? undef : $fh2->getline();

        if ( (defined $l1) && (defined $l2)){
            my @al1 = split(' ', $l1);
            my @al2 = split(' ', $l2);

            if ($al1[0] ne $al2[0]){
                $fh3->print($l1);
                $fh3->print($l2);
            }
            else{
                $fh3->print($l1); # same first column: keep one line (the one from the first file)
            }
        }
        else{
            # add the extra lines of the longer file
            $fh3->print("\n".$l1) if (defined $l1);
            $fh3->print("\n".$l2) if (defined $l2);
        }
    }

    $fh1->close();
    $fh2->close();
    $fh3->close();

#f1.txt
name1 account1 123
name2 account2 324
name3 account3 345
name4 account3 345
name6 account3 345
------------------------
#f2.txt
name1 account1 123
name2 account4 324
name5 account3 345
-----------------------
#OUTPUT: f3.txt
name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

name4 account3 345

name6 account3 345
katharnakh.
Last edited by katharnakh; Mar 26th, 2008 at 5:15 am.

#10, KevinADC, Mar 26th, 2008
Your explanation is not very clear:

fileX
name1 account1 123
name2 account2 324
name3 account3 345

fileY
name1 account1 123
name2 account4 324
name5 account3 345

So I want output file to be like
outputfile

name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

The duplicate is "name2", but why did you keep "account2" instead of "account4"? In other words, when there is a duplicate of the first column, how do you know which line to keep and which one to not keep?
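
The question of which line to keep can be made explicit as a policy. The sketch below is only an illustration: the helper merge_by_key and the policy names 'first' and 'last' are invented here, and the thread never says which behaviour mank actually wants.

```perl
use strict;
use warnings;

# Hypothetical helper: merge lines by first column.
# $policy 'first' keeps the earliest line for a key, 'last' the latest.
# Output order follows the first appearance of each key.
sub merge_by_key {
    my ($policy, @lines) = @_;
    my (%record, @order);
    for my $line (@lines) {
        my ($key) = split ' ', $line;
        next unless defined $key;          # skip blank lines
        if (!exists $record{$key}) {
            push @order, $key;
            $record{$key} = $line;
        }
        elsif ($policy eq 'last') {
            $record{$key} = $line;         # later line overwrites earlier one
        }
    }
    return map { $record{$_} } @order;
}

my @file_x = ("name1 account1 123", "name2 account2 324", "name3 account3 345");
my @file_y = ("name1 account1 123", "name2 account4 324", "name5 account3 345");

for my $policy ('first', 'last') {
    print "policy: $policy\n";
    print "  $_\n" for merge_by_key($policy, @file_x, @file_y);
}
```

With 'first' the name2 line keeps account2, matching the desired output posted earlier; with 'last' it becomes account4. Reading fileY before fileX would swap those results, so the choice of policy and file order together decide the answer.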