mank

merging two files and removing duplicates

  #1  
Mar 25th, 2008
Hi,
I am trying to read two files and copy their contents to a third file while removing duplicates.
Please help.

#!/usr/bin/perl -w
{
    open A, shift;

    foreach (<A>) {
        $a{$_}++;
    }

    open B, shift;
    foreach (<B>) {
        $b{$_}++;
    }

    open C, shift;
    foreach (<C>) {
        print unless $a{$_};
        print unless $b{$_};
    }
}
mank

Re: merging two files and removing duplicates

  #2  
Mar 25th, 2008
I tried this one, but it isn't working:

my ($file1, $file2) = @ARGV;

open (FILE1, $file1) or die "Can't open $file1: $!\n";
open (FILE2, $file2) or die "Can't open $file2: $!\n";
open (MERGE, ">merged") or die "Can't open merged file: $!\n";

my $line1 = <FILE1>;
my $line2 = <FILE2>;

while (defined ($line1) || defined ($line2)) {
    if ($line1 =~ /^\#(\d+)\s+/) {
        $tmp = $1;
        $line1 = <FILE1>;
    }

    if ($line1 =~ /^[io]\d+/) {
        print MERGE $line1;
        $line1 = <FILE1>;
        next;
    }

    @array = split /\s/, $line2;
    $last = $array[$#array];
    if ($last != $tmp) {
        $line2 = <FILE2>;
        next;
    }
    else {
        print MERGE $line2;
        $line2 = <FILE2>;
    }
}
katharnakh

Re: merging two files and removing duplicates

  #3  
Mar 25th, 2008
What is not working? What output do you get? Please explain your problem clearly. Given only the code, with no details of the input or the desired output, how can anyone determine what is happening?

katharnakh.
challenge the limits
mank

Re: merging two files and removing duplicates

  #4  
Mar 25th, 2008
So there are two files:
fileX
name1 account1 123
name2 account2 324
name3 account3 345

fileY
name1 account1 123
name2 account4 324
name5 account3 345

I want the output file to look like this:
outputfile

name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

There should be no duplicate records for any nameX; duplicates are overwritten, in no particular order.
katharnakh

Re: merging two files and removing duplicates

  #5  
Mar 25th, 2008
use strict;
use warnings;

use FileHandle;

# input files
my $fh1 = FileHandle->new('f1.txt') or die "ERROR: $!";
my $fh2 = FileHandle->new('f2.txt') or die "ERROR: $!";

# output file
my $fh3 = FileHandle->new(">f3.txt") or die "ERROR: $!";

# read both files in lockstep, one line from each per pass
while (!$fh1->eof() || !$fh2->eof()){
    # ternary instead of "my ... if", whose behaviour is undefined
    my $l1 = $fh1->eof() ? undef : $fh1->getline();
    my $l2 = $fh2->eof() ? undef : $fh2->getline();

    if ( (defined $l1) && (defined $l2)){
        $fh3->print($l1);                   # identical lines are written once
        $fh3->print($l2) if ($l1 ne $l2);   # differing lines are both kept
    }
    else{
        # one file is exhausted: copy what is left of the other
        $fh3->print("\n".$l1) if (defined $l1);
        $fh3->print("\n".$l2) if (defined $l2);
    }
}

$fh1->close();
$fh2->close();
$fh3->close();

katharnakh.
challenge the limits
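
The loop in the post above reads the two files in lockstep, so it can only notice a duplicate when both copies sit on the same line number. As a minimal alternative sketch, assuming whole lines are the unit of comparison and that the distinct lines fit in memory, every line already written can be remembered in a hash and skipped when it comes up again. The sub name merge_unique_lines is made up for this illustration; it is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every distinct line from the input files
# to the output file, keeping the first occurrence of each line.
sub merge_unique_lines {
    my ($out_path, @in_paths) = @_;
    my %seen;    # line text => number of times encountered so far
    open my $out, '>', $out_path or die "Can't write $out_path: $!\n";
    for my $path (@in_paths) {
        open my $in, '<', $path or die "Can't read $path: $!\n";
        while (my $line = <$in>) {
            chomp $line;    # a missing final newline must not make two lines differ
            print {$out} "$line\n" unless $seen{$line}++;
        }
        close $in;
    }
    close $out or die "Can't close $out_path: $!\n";
    return scalar keys %seen;    # number of distinct lines written
}

# Example call, using the file names from the post above:
# merge_unique_lines('f3.txt', 'f1.txt', 'f2.txt');
```

Because the hash is shared across both inputs, the order of lines within each file no longer matters, and the output keeps first-seen order.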
mank

Re: merging two files and removing duplicates

  #6  
Mar 26th, 2008
Thanks, but how can I make sure that no record is duplicated?

There can be different names in the first column (the other columns can be the same or different):
name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

So the first column is unique:
name1, name2, name3, name4
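
One way to check that column 1 really is unique in a merged file is to count how often each first-column value occurs and report any that appear more than once. This is only a sketch under the assumption that fields are separated by whitespace; the sub name repeated_keys and the use of f3.txt in the commented example are illustrative choices, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the column-1 values that occur on more
# than one of the given lines, in sorted order.
sub repeated_keys {
    my @lines = @_;
    my %count;    # column-1 value => number of lines carrying it
    for my $line (@lines) {
        my ($key) = split ' ', $line;    # first whitespace-separated field
        $count{$key}++ if defined $key;  # blank lines have no key
    }
    my @dups = sort grep { $count{$_} > 1 } keys %count;
    return @dups;
}

# Checking a merged file (the name f3.txt is only an example):
# open my $in, '<', 'f3.txt' or die "Can't read f3.txt: $!\n";
# my @dups = repeated_keys(<$in>);
# print @dups ? "repeated: @dups\n" : "column 1 is unique\n";
```

An empty result means every name occurs exactly once.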
katharnakh

Re: merging two files and removing duplicates

  #7  
Mar 26th, 2008
Originally posted by mank:
Thanks, but how can I make sure that no record is duplicated?

There can be different names in the first column (the other columns can be the same or different):
name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

So the first column is unique:
name1, name2, name3, name4

Hello Mank,
suppose the data contains
...
name5 account3 345
name5 account3 445
...
Is this a duplicate, because the first column is the same? I forgot to consider this case. If such a line counts as a duplicate, you have to split each line and compare the first columns to eliminate it.

katharnakh.
Last edited by katharnakh : Mar 26th, 2008 at 3:26 am.
challenge the limits
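
The "split the line and compare the first column" idea from the post above can be sketched as follows. The assumptions are mine: fields are separated by whitespace, the first line seen for a name wins, and the lines of fileX are considered before those of fileY. The sub name unique_by_first_column is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: given lines in priority order, keep the first
# line seen for each value of column 1.
sub unique_by_first_column {
    my @lines = @_;
    my (%seen, @kept);
    for my $line (@lines) {
        my ($key) = split ' ', $line;    # first whitespace-separated field
        next unless defined $key;        # skip blank lines
        push @kept, $line unless $seen{$key}++;
    }
    return @kept;
}

# Sample records copied from fileX and fileY earlier in the thread.
my @file_x = ("name1 account1 123", "name2 account2 324", "name3 account3 345");
my @file_y = ("name1 account1 123", "name2 account4 324", "name5 account3 345");

print "$_\n" for unique_by_first_column(@file_x, @file_y);
```

With these samples the sketch prints name1, name2 (account2), name3 and name5, which is the output mank asked for in post #4.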
mank

Re: merging two files and removing duplicates

  #8  
Mar 26th, 2008
Can I use something like
@a = split /\t/, $_;
and then compare it every time?
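
A note on the separator: split /\t/ only splits on tab characters, while split ' ' is a special case in Perl that skips leading whitespace and splits on any run of spaces or tabs. The sketch below assumes the records are space-separated as they are displayed in this thread (the real files may well use tabs); the sub name first_field is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return column 1 of a record, whatever mix of
# spaces and tabs separates the fields.
sub first_field {
    my ($line) = @_;
    my ($key) = split ' ', $line;    # whitespace-run split, leading blanks ignored
    return $key;
}

my $record = "name2 account2 324\n";    # space-separated, as shown in the samples

my @by_tab   = split /\t/, $record;     # no tab present: one element, the whole line
my @by_space = split ' ',  $record;     # three fields, trailing newline dropped

printf "tab split: %d field(s), space split: %d field(s)\n",
    scalar @by_tab, scalar @by_space;
```

For this record the tab split yields 1 field and the space split yields 3, so comparing $a[0] after split /\t/ would compare whole lines rather than names.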
katharnakh

Re: merging two files and removing duplicates

  #9  
Mar 26th, 2008
Yes, for sure. You also need to write out one line from either file when both have the same value in column 1. I am reposting the code with that modification.
use strict;
use warnings;

use FileHandle;

# input files
my $fh1 = FileHandle->new('f1.txt') or die "ERROR: $!";
my $fh2 = FileHandle->new('f2.txt') or die "ERROR: $!";

# output file
my $fh3 = FileHandle->new(">f3.txt") or die "ERROR: $!";

# read both files in lockstep, one line from each per pass
while (!$fh1->eof() || !$fh2->eof()){
    # ternary instead of "my ... if", whose behaviour is undefined
    my $l1 = $fh1->eof() ? undef : $fh1->getline();
    my $l2 = $fh2->eof() ? undef : $fh2->getline();

    if ( (defined $l1) && (defined $l2)){
        # compare only the first whitespace-separated column
        my @al1 = split(' ', $l1);
        my @al2 = split(' ', $l2);

        if ($al1[0] ne $al2[0]){
            $fh3->print($l1);
            $fh3->print($l2);
        }
        else{
            # same first column: keep one line (here the one from f1.txt),
            # so that every unique name appears once
            $fh3->print($l1);
        }
    }
    else{
        # one file is exhausted: copy the extra lines of the other
        $fh3->print("\n".$l1) if (defined $l1);
        $fh3->print("\n".$l2) if (defined $l2);
    }
}

$fh1->close();
$fh2->close();
$fh3->close();

#f1.txt
name1 account1 123
name2 account2 324
name3 account3 345
name4 account3 345
name6 account3 345
------------------------
#f2.txt
name1 account1 123
name2 account4 324
name5 account3 345
-----------------------
#OUTPUT: f3.txt
name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

name4 account3 345

name6 account3 345
katharnakh.
Last edited by katharnakh : Mar 26th, 2008 at 4:15 am.
challenge the limits
KevinADC

Re: merging two files and removing duplicates

  #10  
Mar 26th, 2008
Your explanation is not very clear:

fileX
name1 account1 123
name2 account2 324
name3 account3 345

fileY
name1 account1 123
name2 account4 324
name5 account3 345

So I want output file to be like
outputfile

name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

The duplicate is "name2", but why did you keep "account2" instead of "account4"? In other words, when there is a duplicate of the first column, how do you know which line to keep and which one to not keep?
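
The answer to that question decides the code. As a rough sketch, assuming whitespace-separated fields and that the choice is simply "earliest line wins" or "latest line wins", the rule can be made an explicit parameter. The sub name merge_by_key and the policy strings 'first' and 'last' are invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: merge records keyed on column 1.
# $policy is 'first' (keep the earliest line for a key) or
# 'last' (let a later line overwrite an earlier one).
# Output order follows the first appearance of each key.
sub merge_by_key {
    my ($policy, @lines) = @_;
    my (%record, @order);
    for my $line (@lines) {
        my ($key) = split ' ', $line;    # first whitespace-separated field
        next unless defined $key;        # skip blank lines
        push @order, $key unless exists $record{$key};
        $record{$key} = $line
            if ($policy eq 'last' || !exists $record{$key});
    }
    return map { $record{$_} } @order;
}

# Sample records copied from fileX and fileY above.
my @file_x = ("name1 account1 123", "name2 account2 324", "name3 account3 345");
my @file_y = ("name1 account1 123", "name2 account4 324", "name5 account3 345");

print "$_\n" for merge_by_key('last', @file_x, @file_y);
```

With 'last' the name2 record comes out as account4; with 'first' it comes out as account2, which is what the requested output shows.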