Hi,
I am trying to read two files and copy their contents to a third file while removing duplicates.
Please help.

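#!/usr/bin/perl -w
{
    my %seen;    # lines already written to the output

    open A, '<', shift or die "Can't open first file: $!\n";
    open B, '<', shift or die "Can't open second file: $!\n";
    open C, '>', shift or die "Can't open output file: $!\n";    # third file is the output

    foreach my $fh (\*A, \*B)
    {
        while (<$fh>)
        {
            print C $_ unless $seen{$_}++;    # skip lines seen before
        }
    }

    close C;
}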

I also tried this one, but it isn't working:

my ($file1, $file2) = @ARGV;

open (FILE1, $file1) or die "Can't open $file1: $!\n";
open (FILE2, $file2) or die "Can't open $file2: $!\n";
open (MERGE, ">merged") or die "Can't open merged file: $!\n";

my $line1 = <FILE1>;  
my $line2 = <FILE2>;

while (defined ($line1) || defined ($line2)) {
    if ($line1 =~ /^\#(\d+)\s+/) {
        $tmp = $1;
        $line1 = <FILE1>;
    }

    if ($line1 =~ /^[io]\d+/) {
        print MERGE $line1;
        $line1 = <FILE1>;
        next;
    }

    @array = split /\s/, $line2;
    $last = $array[$#array];
    if ($last != $tmp) {
        $line2 = <FILE2>;
        next;
    }
    else {
        print MERGE $line2;
        $line2 = <FILE2>;
    }
}

What is not working? What output do you get? Please explain your problem clearly. With only the code, and no details of the input or the desired output, how can anyone determine what is happening?

katharnakh.

So there are two files:
fileX
name1 account1 123
name2 account2 324
name3 account3 345

fileY
name1 account1 123
name2 account4 324
name5 account3 345

So I want the output file to look like this:
outputfile

name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

There should be no duplicate records for nameX; duplicates may be overwritten in no particular order or fashion.

use strict;
use warnings;

use FileHandle;

# input files
my $fh1 = FileHandle->new('f1.txt') or die "ERROR: $!";
my $fh2 = FileHandle->new('f2.txt') or die "ERROR: $!";

# output file
my $fh3 = FileHandle->new(">f3.txt") or die "ERROR: $!";

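while (!$fh1->eof() || !$fh2->eof()){
    # avoid "my ... if ...": its behaviour is undefined when the condition is false
    my $l1 = $fh1->eof() ? undef : $fh1->getline();
    my $l2 = $fh2->eof() ? undef : $fh2->getline();

    if ( (defined $l1) && (defined $l2)){
        $fh3->print($l1);
        $fh3->print($l2) if ($l1 ne $l2);    # write identical lines only once
    }
    else{
        # one file is exhausted: copy the remaining lines of the other
        $fh3->print("\n".$l1) if (defined $l1);
        $fh3->print("\n".$l2) if (defined $l2);
    }
}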

$fh1->close();
$fh2->close();
$fh3->close();

katharnakh.

Thanks, but how can I make sure that there is no record duplication?

There can be different names in the first column (the other columns can be the same or different):
name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

So the first column is unique:
name1, name2, name3, name4

Hello Mank,
...
name5 account3 345
name5 account3 445
...
Then is this a duplicate, because the first column is the same? Well, I forgot to consider this case. If this counts as a duplicate, then you have to split each line and compare the first columns to eliminate such lines.
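
As a rough sketch of what "split the line and compare the first column" could look like, the snippet below remembers every first-column value in a hash and keeps only the first line seen for each one. The helper name unique_by_first_column and the in-memory sample records are made up for illustration, and keeping the first occurrence is an assumption, not something settled in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines whose first whitespace-separated
# field has not been seen before, in input order (first occurrence wins).
sub unique_by_first_column {
    my @lines = @_;
    my %seen;     # first-column values already kept
    my @kept;
    for my $line (@lines) {
        my ($key) = split ' ', $line;    # first column, e.g. "name5"
        next unless defined $key;        # skip blank lines
        push @kept, $line unless $seen{$key}++;
    }
    return @kept;
}

# Sample records for illustration only
my @records = (
    "name5 account3 345\n",
    "name5 account3 445\n",
    "name6 account3 345\n",
);
print unique_by_first_column(@records);
```

With these sample records the second name5 line is treated as a duplicate and dropped, even though its third column differs.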

katharnakh.

Can I use something like
@a = split /\t/, $_;
and then compare it every time?
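
One thing worth checking before settling on the split pattern is whether the columns really are separated by tabs. The small sketch below assumes the records are space-separated, as they appear in the sample data posted above; if the real files use tabs, /\t/ would work as written.

```perl
use strict;
use warnings;

my $line = "name2 account2 324\n";    # space-separated, as in the sample data

my @by_tab   = split /\t/, $line;     # no tab in the line, so it is not split at all
my @by_space = split ' ', $line;      # splits on any run of whitespace

print scalar(@by_tab), " field(s) with /\\t/\n";
print scalar(@by_space), " field(s) with ' ', first is '$by_space[0]'\n";
```

The special pattern ' ' also ignores leading whitespace and the trailing newline, so the first element can be compared directly as the record's name.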

Yes, for sure. You also need to write out one of the lines when both files have the same value in column 1. I am reposting the code with that modification.

use strict;
use warnings;

use FileHandle;

# input files
my $fh1 = FileHandle->new('f1.txt') or die "ERROR: $!";
my $fh2 = FileHandle->new('f2.txt') or die "ERROR: $!";

# output file
my $fh3 = FileHandle->new(">f3.txt") or die "ERROR: $!";

while (!$fh1->eof() || !$fh2->eof()){
    # avoid "my ... if ...": its behaviour is undefined when the condition is false
    my $l1 = $fh1->eof() ? undef : $fh1->getline();
    my $l2 = $fh2->eof() ? undef : $fh2->getline();

    if ( (defined $l1) && (defined $l2)){
        my @al1 = split(' ', $l1);
        my @al2 = split(' ', $l2);

        if ($al1[0] ne $al2[0]){
            $fh3->print($l1);
            $fh3->print($l2);
        }
        else{
            # same first column: keep one line (here the one from the first file)
            $fh3->print($l1);
        }
    }
    else{
        # add the extra lines of the longer file
        $fh3->print("\n".$l1) if (defined $l1);
        $fh3->print("\n".$l2) if (defined $l2);
    }
}

$fh1->close();
$fh2->close();
$fh3->close();
#f1.txt
name1 account1 123
name2 account2 324
name3 account3 345
name4 account3 345
name6 account3 345
------------------------
#f2.txt
name1 account1 123
name2 account4 324
name5 account3 345
-----------------------
#OUTPUT: f3.txt
name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

name4 account3 345

name6 account3 345

katharnakh.

Your explanation is not very clear:

fileX
name1 account1 123
name2 account2 324
name3 account3 345

fileY
name1 account1 123
name2 account4 324
name5 account3 345

So I want output file to be like
outputfile

name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

The duplicate is "name2", but why did you keep "account2" instead of "account4"? In other words, when there is a duplicate of the first column, how do you know which line to keep and which one to not keep?
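
To make the two possible answers concrete, here is a small sketch that merges the sample records under either rule. The helper merge_records and its 'first'/'last' policy argument are invented for this illustration; which rule is actually wanted is exactly the open question.

```perl
use strict;
use warnings;

# Hypothetical helper: merge records keyed on the first column.
# $policy is 'first' (keep the earliest line for a key) or 'last'
# (later lines overwrite earlier ones). Output follows the order in
# which each key was first seen.
sub merge_records {
    my ($policy, @lines) = @_;
    my (%record, @order);
    for my $line (@lines) {
        my ($key) = split ' ', $line;
        next unless defined $key;                    # skip blank lines
        my $is_new = !exists $record{$key};
        push @order, $key if $is_new;
        $record{$key} = $line if $is_new || $policy eq 'last';
    }
    return map { $record{$_} } @order;
}

# The sample data from the question, held in memory for illustration
my @file_x = ("name1 account1 123\n", "name2 account2 324\n", "name3 account3 345\n");
my @file_y = ("name1 account1 123\n", "name2 account4 324\n", "name5 account3 345\n");

print "keep first:\n", merge_records('first', @file_x, @file_y);
print "keep last:\n",  merge_records('last',  @file_x, @file_y);
```

With 'first', name2 keeps account2, which matches the output file shown in the question; with 'last', the line from fileY wins and name2 ends up with account4.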
