Hi
I am trying to read two files and copy their contents to a third file while removing duplicates.
Please help.
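#!/usr/bin/perl
use strict;
use warnings;

# Usage: script.pl fileA fileB outfile
my ($file_a, $file_b, $out) = @ARGV;
my %seen;
open my $out_fh, '>', $out or die "Can't open $out: $!\n";
for my $file ($file_a, $file_b) {
    open my $in, '<', $file or die "Can't open $file: $!\n";
    while (my $line = <$in>) {
        chomp $line;
        # write a line only the first time it is seen in either file
        print {$out_fh} "$line\n" unless $seen{$line}++;
    }
    close $in;
}
close $out_fh or die "Can't close $out: $!\n";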
I tried this one, but it isn't working:
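use strict;
use warnings;

my ($file1, $file2) = @ARGV;
open (FILE1, '<', $file1) or die "Can't open $file1: $!\n";
open (FILE2, '<', $file2) or die "Can't open $file2: $!\n";
open (MERGE, '>', 'merged') or die "Can't open merged file: $!\n";
my $line1 = <FILE1>;
my $line2 = <FILE2>;
my $tmp;
while (defined ($line1) || defined ($line2)) {
    # a "#<number>" line in file 1 sets the number to match against
    if (defined ($line1) && $line1 =~ /^\#(\d+)\s+/) {
        $tmp = $1;
        $line1 = <FILE1>;
    }
    if (defined ($line1) && $line1 =~ /^[io]\d+/) {
        print MERGE $line1;
        $line1 = <FILE1>;
        next;
    }
    # file 2 is exhausted: advance file 1 so the loop cannot spin forever
    unless (defined $line2) {
        $line1 = <FILE1>;
        next;
    }
    my @array = split ' ', $line2;
    my $last = $array[$#array];
    # skip file 2 lines whose last field does not equal the current number
    if (!defined $tmp || !defined $last || $last !~ /^\d+$/ || $last != $tmp) {
        $line2 = <FILE2>;
        next;
    }
    else {
        print MERGE $line2;
        $line2 = <FILE2>;
    }
}
close FILE1;
close FILE2;
close MERGE;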
So there are two files:
fileX
name1 account1 123
name2 account2 324
name3 account3 345
fileY
name1 account1 123
name2 account4 324
name5 account3 345
So I want the output file to look like this:
outputfile
name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345
There should be no duplicate records for any nameX; duplicates are overwritten in no particular order.
use strict;
use warnings;
use FileHandle;

# input files
my $fh1 = FileHandle->new('f1.txt') or die "ERROR: $!";
my $fh2 = FileHandle->new('f2.txt') or die "ERROR: $!";
# output file
my $fh3 = FileHandle->new(">f3.txt") or die "ERROR: $!";

while (!$fh1->eof() || !$fh2->eof()){
    # read the next line from each file that still has one
    my $l1 = $fh1->eof() ? undef : $fh1->getline();
    my $l2 = $fh2->eof() ? undef : $fh2->getline();
    if ( (defined $l1) && (defined $l2)){
        # identical lines are written once, differing lines are both written
        $fh3->print($l1);
        $fh3->print($l2) if ($l1 ne $l2);
    }
    else{
        # one file has run out: write the extra lines of the other
        $fh3->print("\n".$l1) if (defined $l1);
        $fh3->print("\n".$l2) if (defined $l2);
    }
}

$fh1->close();
$fh2->close();
$fh3->close();
katharnakh.
challenge the limits
Thanks, but how can I make sure that there is no record duplication?
There can be different names in the first column (the other columns can be the same or different):
name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345
So the first column is unique:
name1, name2, name3, name4
Hello Mank,
...
name5 account3 345
name5 account3 445
...
Is this then a duplicate, because the first column is the same? I forgot to consider this case. If this counts as a duplicate line, then you have to split each line and compare the first column to eliminate such lines.
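A minimal sketch of that split-and-compare idea, assuming the fields are separated by whitespace and that the first line seen for a given name is the one to keep (the thread does not settle this). The sub name `unique_by_first_column` is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first line seen for each
# value of column 1. Assumes whitespace-separated fields.
sub unique_by_first_column {
    my @lines = @_;
    my (%seen, @unique);
    for my $line (@lines) {
        my ($key) = split ' ', $line;    # first column
        next unless defined $key;        # skip blank lines
        push @unique, $line unless $seen{$key}++;
    }
    return @unique;
}

# The two name5 lines from the example above: only the first survives.
print unique_by_first_column(
    "name5 account3 345\n",
    "name5 account3 445\n",
);
```

Because the hash remembers every name seen so far, this does not depend on the duplicate lines sitting at the same position in both files.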
katharnakh.
Last edited by katharnakh : Mar 26th, 2008 at 3:26 am.
challenge the limits
Yes, for sure. You also need to keep one line from either file when both lines have the same value in column 1. I am reposting the code with that modification.
katharnakh.
use strict;
use warnings;
use FileHandle;

# input files
my $fh1 = FileHandle->new('f1.txt') or die "ERROR: $!";
my $fh2 = FileHandle->new('f2.txt') or die "ERROR: $!";
# output file
my $fh3 = FileHandle->new(">f3.txt") or die "ERROR: $!";

while (!$fh1->eof() || !$fh2->eof()){
    # read the next line from each file that still has one
    my $l1 = $fh1->eof() ? undef : $fh1->getline();
    my $l2 = $fh2->eof() ? undef : $fh2->getline();
    if ( (defined $l1) && (defined $l2)){
        my @al1 = split(' ', $l1);
        my @al2 = split(' ', $l2);
        if ($al1[0] ne $al2[0]){
            $fh3->print($l1);
            $fh3->print($l2);
        }
        else{
            # add any one line from either of the files, to make sure
            # we have every unique line from the two files
            $fh3->print($l1);
        }
    }
    else{
        # add extra lines of files
        $fh3->print("\n".$l1) if (defined $l1);
        $fh3->print("\n".$l2) if (defined $l2);
    }
}

$fh1->close();
$fh2->close();
$fh3->close();
#f1.txt
name1 account1 123
name2 account2 324
name3 account3 345
name4 account3 345
name6 account3 345
------------------------
#f2.txt
name1 account1 123
name2 account4 324
name5 account3 345
-----------------------
#OUTPUT: f3.txt
name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345
name4 account3 345
name6 account3 345
Last edited by katharnakh : Mar 26th, 2008 at 4:15 am.
challenge the limits
Your explanation is not very clear:
fileX
name1 account1 123
name2 account2 324
name3 account3 345
fileY
name1 account1 123
name2 account4 324
name5 account3 345
So I want output file to be like
outputfile
name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345
The duplicate is "name2", but why did you keep "account2" instead of "account4"? In other words, when there is a duplicate of the first column, how do you know which line to keep and which one to not keep?
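The answer to that question decides one line of the code. A rough sketch, assuming whitespace-separated fields and a hypothetical `merge_by_name` sub whose `$keep` argument is either 'first' or 'last' (neither name comes from the thread):

```perl
use strict;
use warnings;

# Hypothetical helper: merge lines keyed on column 1.
# $keep is 'first' (earliest line wins) or 'last' (latest line wins).
# Output keeps the order in which each name first appeared.
sub merge_by_name {
    my ($keep, @lines) = @_;
    my (%record, @order);
    for my $line (@lines) {
        my ($name) = split ' ', $line;
        next unless defined $name;             # skip blank lines
        push @order, $name unless exists $record{$name};
        $record{$name} = $line
            if $keep eq 'last' || !exists $record{$name};
    }
    return map { $record{$_} } @order;
}

my @file_x = ("name1 account1 123\n", "name2 account2 324\n", "name3 account3 345\n");
my @file_y = ("name1 account1 123\n", "name2 account4 324\n", "name5 account3 345\n");

# 'first' keeps fileX's "name2 account2 324";
# 'last' would keep fileY's "name2 account4 324" instead.
print merge_by_name('first', @file_x, @file_y);
```

With 'first' and fileX listed before fileY, this prints the four lines of the requested output file, which suggests (but does not confirm) that "the line from the first file wins" is the intended rule.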