merging two files and removing duplicates

Question

mank 0 Light Poster

17 Years Ago

Hi,
I am trying to read two files and copy their contents to a third file while removing duplicates.
Please help.

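#!/usr/bin/perl -w
{
    # count each line of the first file
    open A, shift;
    foreach (<A>) { $a{$_}++ }

    # count each line of the second file
    open B, shift;
    foreach (<B>) { $b{$_}++ }

    # read the third file; print a line once for each input file it is missing from
    open C, shift;
    foreach (<C>) {
        print unless $a{$_};
        print unless $b{$_};
    }
}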

perl

3 Contributors
9 Replies
390 Views
1 Day Discussion Span
Latest Post 17 Years Ago by KevinADC

All 9 Replies

katharnakh 7 Posting Whiz in Training

17 Years Ago

What is not working? What output do you get? Please explain your problem clearly. Given only the code, with no details of the input or the desired output, how can anyone determine what is happening?

katharnakh.

katharnakh 7 Posting Whiz in Training

17 Years Ago

use strict;
use warnings;

use FileHandle;

# input files
my $fh1 = FileHandle->new('f1.txt') or die "ERROR: $!";
my $fh2 = FileHandle->new('f2.txt') or die "ERROR: $!";

# output file
my $fh3 = FileHandle->new(">f3.txt") or die "ERROR: $!";

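# read both files in step, one line from each per iteration
while (!$fh1->eof() || !$fh2->eof()){
		# getline() returns undef once a file is exhausted, so no
		# conditional "my" declaration (undefined behaviour) is needed
		my $l1 = $fh1->getline();
		my $l2 = $fh2->getline();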
		
		if ( (defined $l1) && (defined $l2)){
				if ($l1 ne $l2){
						$fh3->print($l1);
						$fh3->print($l2);	
				}
		}
		else{
				$fh3->print("\n".$l1) if (defined $l1);
				$fh3->print("\n".$l2) if (defined $l2);
		}
}

$fh1->close();
$fh2->close();
$fh3->close();

katharnakh.
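
The loop above only compares lines that sit at the same position in both files, so a line repeated at a different position would slip through. One common alternative, sketched here under the assumption that "duplicate" means an identical whole line, is to remember every line in a hash. The sub name unique_lines and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Return the input lines in order, keeping only the first copy of each.
# Hypothetical helper, not part of the code posted above.
sub unique_lines {
    my @lines = @_;
    my %seen;
    my @out;
    for my $line (@lines) {
        chomp(my $copy = $line);      # ignore a missing final newline
        next if $seen{$copy}++;       # true on the 2nd and later sightings
        push @out, "$copy\n";
    }
    return @out;
}

# Sample data standing in for the contents of two files.
my @file1 = ("name1 account1 123\n", "name2 account2 324\n");
my @file2 = ("name2 account2 324\n", "name1 account1 123\n", "name5 account3 345\n");

my @merged = unique_lines(@file1, @file2);
print @merged;
```

With real files, the two lists could come from reading each filehandle in list context, for example unique_lines(<$fh1>, <$fh2>).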

katharnakh 7 Posting Whiz in Training

17 Years Ago

Thanks, but how can I make sure that there is no record duplication?
There can be different names in the first column (the other columns can be the same or different):
name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345
So the first column is unique:
name1, name2, name3, name4

Hello Mank,
...
name5 account3 345
name5 account3 445
...
Is this then a duplicate, because the first column is the same? I forgot to consider this case. If it is a duplicate, you have to split each line and compare the first columns to eliminate such lines.

katharnakh.
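
The first-column comparison described above can be sketched as follows. This is only an illustration: the helper names first_column and same_record are hypothetical, and it assumes the columns are separated by whitespace.

```perl
use strict;
use warnings;

# First whitespace-separated field of a line (hypothetical helper).
sub first_column {
    my ($line) = @_;
    my ($key) = split ' ', $line;   # split ' ' skips leading whitespace
    return $key;
}

# True when both lines carry the same value in column 1.
sub same_record {
    my ($l1, $l2) = @_;
    return first_column($l1) eq first_column($l2);
}

my $dup = same_record("name5 account3 345\n", "name5 account3 445\n");
my $not = same_record("name3 account3 345\n", "name5 account3 345\n");
print "name5 lines duplicate: ", ($dup ? "yes" : "no"), "\n";
print "name3/name5 duplicate: ", ($not ? "yes" : "no"), "\n";
```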


mank 0 Light Poster · Answer 1 · 2008-03-25T12:12:03+00:00

I tried this one, but it isn't working:

my ($file1, $file2) = @ARGV;

open (FILE1, $file1) or die "Can't open $file1: $!\n";
open (FILE2, $file2) or die "Can't open $file2: $!\n";
open (MERGE, ">merged") or die "Can't open merged file: $!\n";

my $line1 = <FILE1>;  
my $line2 = <FILE2>;

while (defined ($line1) || defined ($line2)) {
     if ($line1 =~ /^\#(\d+)\s+/) {
          $tmp = $1;
          $line1 = <FILE1>;
     }

     if ($line1 =~ /^[io]\d+/) {
          print MERGE $line1;
          $line1 = <FILE1>;
          next;
     }

     @array = split /\s/, $line2;
     $last = $array[$#array];
     if ($last != $tmp) {
          $line2 = <FILE2>;
          next;
     }
     else {
          print MERGE $line2;
          $line2 = <FILE2>;
     }
}

mank 0 Light Poster · Answer 2 · 2008-03-25T16:55:20+00:00

So there are two files:
fileX
name1 account1 123
name2 account2 324
name3 account3 345

fileY
name1 account1 123
name2 account4 324
name5 account3 345

So I want the output file to look like this:
outputfile

name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

There should be no duplicate records for any nameX; duplicates are overwritten in no particular order.

mank 0 Light Poster · Answer 3 · 2008-03-26T10:10:17+00:00

Thanks, but how can I make sure that there is no record duplication?

There can be different names in the first column (the other columns can be the same or different):
name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

So the first column is unique:
name1, name2, name3, name4

mank 0 Light Poster · Answer 4 · 2008-03-26T13:39:34+00:00

Can I use something like
@a = split /\t/, $_;
and then compare it every time?
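
Splitting works, but /\t/ separates on tab characters only. The sample records shown in this thread look space-separated (an assumption, since the real files were not posted), in which case split /\t/ would hand back the whole line as a single field. The sketch below compares the two forms on made-up sample lines.

```perl
use strict;
use warnings;

my $spaces = "name1 account1 123\n";
my $tabs   = "name1\taccount1\t123\n";

# /\t/ splits on tabs only, so a space-separated line stays in one piece.
my @by_tab = split /\t/, $spaces;

# The special pattern ' ' splits on any run of whitespace and treats
# the trailing newline as whitespace too, so it handles both layouts.
my @a = split ' ', $spaces;
my @b = split ' ', $tabs;

print scalar(@by_tab), " field(s) with /\\t/ on the space-separated line\n";
print scalar(@a), " fields with ' ' on the space-separated line\n";
print scalar(@b), " fields with ' ' on the tab-separated line\n";
```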

katharnakh 7 Posting Whiz in Training · Answer 5 · 2008-03-26T14:13:47+00:00

Yes, for sure. You also need to write out one line from either file when both lines have the same value in column 1. I am reposting the code with that modification.

use strict;
use warnings;

use FileHandle;

# input files
my $fh1 = FileHandle->new('f1.txt') or die "ERROR: $!";
my $fh2 = FileHandle->new('f2.txt') or die "ERROR: $!";

# output file
my $fh3 = FileHandle->new(">f3.txt") or die "ERROR: $!";

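# read both files in step, one line from each per iteration
while (!$fh1->eof() || !$fh2->eof()){
		# getline() returns undef once a file is exhausted, so no
		# conditional "my" declaration (undefined behaviour) is needed
		my $l1 = $fh1->getline();
		my $l2 = $fh2->getline();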
		
		if ( (defined $l1) && (defined $l2)){
				my @al1 = split(' ', $l1);
				my @al2 = split(' ', $l2);
				
				if ($al1[0] ne $al2[0]){
						$fh3->print($l1);
						$fh3->print($l2);	
				}
				else{
						# same first column in both files: keep one line only (here, the one from f1.txt)
						$fh3->print($l1);
				}
		}
		else{
				# add extra lines of files
				$fh3->print("\n".$l1) if (defined $l1);
				$fh3->print("\n".$l2) if (defined $l2);
		}
}

$fh1->close();
$fh2->close();
$fh3->close();

#f1.txt
name1 account1 123
name2 account2 324
name3 account3 345
name4 account3 345
name6 account3 345
------------------------
#f2.txt
name1 account1 123
name2 account4 324
name5 account3 345
-----------------------
#OUTPUT: f3.txt
name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

name4 account3 345

name6 account3 345

katharnakh.

KevinADC 192 Practically a Posting Shark · Answer 6 · 2008-03-27T00:44:14+00:00

Your explanation is not very clear:

fileX
name1 account1 123
name2 account2 324
name3 account3 345

fileY
name1 account1 123
name2 account4 324
name5 account3 345

So I want output file to be like
outputfile

name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

The duplicate is "name2", but why did you keep "account2" instead of "account4"? In other words, when there is a duplicate of the first column, how do you know which line to keep and which one to not keep?
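
Whichever rule is chosen, it can be made explicit in code. The sketch below keys records on column 1 and takes a keep argument of 'first' or 'last'; the sub name merge_by_key and that argument are made up for illustration, and "first" here assumes fileX is read before fileY. With 'first' it reproduces the output file listed in the quote above.

```perl
use strict;
use warnings;

# Merge lines keyed on column 1. $keep is 'first' or 'last' and decides
# which line survives when a key repeats. Hypothetical helper.
sub merge_by_key {
    my ($keep, @lines) = @_;
    my (%record, @order);
    for my $line (@lines) {
        my ($key) = split ' ', $line;
        next unless defined $key;            # skip blank lines
        if (!exists $record{$key}) {
            push @order, $key;               # remember first-seen order
            $record{$key} = $line;
        }
        elsif ($keep eq 'last') {
            $record{$key} = $line;           # later line overwrites
        }
    }
    return map { $record{$_} } @order;
}

my @fileX = ("name1 account1 123\n", "name2 account2 324\n", "name3 account3 345\n");
my @fileY = ("name1 account1 123\n", "name2 account4 324\n", "name5 account3 345\n");

my @first = merge_by_key('first', @fileX, @fileY);
my @last  = merge_by_key('last',  @fileX, @fileY);
print "keep first:\n", @first, "keep last:\n", @last;
```

Keeping the key order in a separate array means the output follows the order in which names first appear, rather than Perl's unordered hash traversal.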
