Hello, can you please help me with the following scenario in Perl scripting? I want to compare two text files and save the output of this comparison in a third file, with a flag (I-Insert, D-Delete, U-Update) at the end of each line.

File1.txt -
1|abc
2|efg
3|xyz

File2.txt
1|abc
2|efh
4|pqr

Expected output is - File3.txt
2|efh|U
3|xyz|D
4|pqr|I

#!/usr/bin/perl
use strict;
use warnings;

my ($file1, $file2, $file3) = qw(1.txt 2.txt 3.txt);
open my $fh1, '<', $file1 or die "Can't open $file1: $!";
open my $fh2, '<', $file2 or die "Can't open $file2: $!";
open my $fh3, '>', $file3 or die "Can't open $file3: $!";

while (<$fh1>){
    last if eof($fh2);
    my $comp_line = <$fh2>;
    chomp($_, $comp_line);
    my @rec1 = split /\|/;
    my @rec2 = split /\|/, $comp_line;
    
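    # I-Insert, D-Delete, U-Update
    # Records are compared by position, so this only works while both files line up.
    # Same key, different value: the record was updated.
    print $fh3 "$rec2[0]|$rec2[1]|U\n" if $rec2[0] eq $rec1[0] and $rec2[1] ne $rec1[1];
    # Different keys: the file1 record was deleted and the file2 record was inserted.
    print $fh3 "$rec1[0]|$rec1[1]|D\n" if $rec2[0] ne $rec1[0];
    print $fh3 "$rec2[0]|$rec2[1]|I\n" if $rec2[0] ne $rec1[0];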
}

Thank you very much, David, for the solution. This works perfectly for the above scenario. However, the number of columns/fields in the input files file1 and file2 is not always fixed. It could be more than 10, or fewer. So array indexing will not work in that case. Can you please suggest an approach for the following scenario?

In this example, the third field of each input file is the unique key column, and I want the differences based on that key column.
e.g.
File1 ->
1780437|20110705|00000077040000000000000048881|7704|48881|PE|08/12/2008 11:38:54|0|1000.00
1780437|20110705|00000077040000000000000048882|7704|48882|PE|08/12/2008 11:38:54|0|1000.00
1780437|20110705|00000077040000000000000048887|7704|48887|PE|08/11/2008 11:38:54|0|1000.00
1780437|20110705|00000077040000000000000048888|7704|48888|PE|08/12/2008 11:38:54|0|1000.00
File2.txt ->
1780437|20110705|00000077040000000000000048881|7704|48881|PE|08/12/2008 11:38:54|0|1000.00
1780437|20110705|00000077040000000000000048882|7704|48882|PE|08/12/2008 11:38:54|0|1000.00
1780437|20110705|00000077040000000000000048883|7704|48883|PE|10/01/2009 14:33:18|1|1000.00
1780437|20110705|00000077040000000000000048887|7704|48887|PE|08/11/2008 11:38:54|0|1001.00

Expected output ->

1780437|20110705|00000077040000000000000048883|7704|48883|PE|10/01/2009 14:33:18|1|1000.00|I
1780437|20110705|00000077040000000000000048887|7704|48887|PE|08/11/2008 11:38:54|0|1001.00|U
1780437|20110705|00000077040000000000000048888|7704|48888|PE|08/12/2008 11:38:54|0|1000.00|D

For this data I would suggest reading and saving the first file into a hash of hashes to save each record with a flag with value of 'D'. Then read through the second file to compare its records with the saved records and change the flag's value as needed.

#!/usr/bin/perl
use strict;
use warnings;

my ($file1, $file2, $file3) = qw(1.txt 2.txt 3.txt);
open my $fh1, '<', $file1 or die "Can't open $file1: $!";
open my $fh2, '<', $file2 or die "Can't open $file2: $!";
open my $fh3, '>', $file3 or die "Can't open $file3: $!";

my %save; #Hash of hashes to store records from file1 for comparison with file2
while (<$fh1>){
    chomp;
    my @rec = split /\|/;
    my $key = $rec[2];
    $save{$key}->{'data'} = $_; #Save current record in hash
    $save{$key}->{'flag'} = 'D';
}

while (<$fh2>){
    chomp;
    my @rec = split /\|/;
    my $key = $rec[2];
    
    if (not exists $save{$key}){
        $save{$key}->{'data'} = $_;
        $save{$key}->{'flag'} = 'I';
    }elsif ($_ ne $save{$key}->{'data'}){
        $save{$key}->{'data'} = $_;
        $save{$key}->{'flag'} = 'U';
    }else{
        delete $save{$key};
    }
}

foreach (sort keys %save){
    print $fh3 "$save{$_}->{'data'}|$save{$_}->{'flag'}\n";
}

Thanks David! This is working for me.

Also, I have realised one thing about the 'U' flag: to detect an update, I don't need to consider the first and second columns, since they are just a timestamp and an ID. So, for 'U', the comparison should start from the third column (the unique column).

So I am thinking of removing the first two columns from both input files as an initial step, before starting the comparison. Could you please suggest any other approach here?

In that case I would save the portion following the unique column in a separate element of the %save data called 'compare_this' and use it for comparing with the corresponding remainder of each record from file2.

#!/usr/bin/perl
use strict;
use warnings;

my ($file1, $file2, $file3) = qw(1.txt 2.txt 3.txt);
open my $fh1, '<', $file1 or die "Can't open $file1: $!";
open my $fh2, '<', $file2 or die "Can't open $file2: $!";
open my $fh3, '>', $file3 or die "Can't open $file3: $!";

my %save; #Hash of hashes to store records from file1 for comparison with file2
while (<$fh1>){
    chomp;
    my ($skip1, $skip2, $key, $remainder) = split(/\|/, $_, 4);
    $save{$key}->{'data'} = $_; #Save current record in hash
    $save{$key}->{'compare_this'} = $remainder; #Save last part of current record in hash
    $save{$key}->{'flag'} = 'D';
}

while (<$fh2>){
    chomp;
    my ($skip1, $skip2, $key, $remainder) = split(/\|/, $_, 4);
    
    if (not exists $save{$key}){
        $save{$key}->{'data'} = $_;
        $save{$key}->{'flag'} = 'I';
    }elsif ($remainder ne $save{$key}->{'compare_this'}){
        $save{$key}->{'data'} = $_;
        $save{$key}->{'flag'} = 'U';
    }else{
        delete $save{$key};
    }
}

foreach (sort keys %save){
    print $fh3 "$save{$_}->{'data'}|$save{$_}->{'flag'}\n";
}

Instead of the above solution, you could simply write a function to return the portion of the record that you want to compare.

#!/usr/bin/perl
use strict;
use warnings;

my ($file1, $file2, $file3) = qw(1.txt 2.txt 3.txt);
open my $fh1, '<', $file1 or die "Can't open $file1: $!";
open my $fh2, '<', $file2 or die "Can't open $file2: $!";
open my $fh3, '>', $file3 or die "Can't open $file3: $!";

my %save; #Hash of hashes to store records from file1 for comparison with file2
while (<$fh1>){
    chomp;
    my @rec = split /\|/;
    my $key = $rec[2];
    $save{$key}->{'data'} = $_; #Save current record in hash
    $save{$key}->{'flag'} = 'D';
}

while (<$fh2>){
    chomp;
    my @rec = split /\|/;
    my $key = $rec[2];
    
    if (not exists $save{$key}){
        $save{$key}->{'data'} = $_;
        $save{$key}->{'flag'} = 'I';
    }elsif (string_to_compare($_) ne string_to_compare($save{$key}->{'data'})){
        $save{$key}->{'data'} = $_;
        $save{$key}->{'flag'} = 'U';
    }else{
        delete $save{$key};
    }
}

foreach (sort keys %save){
    print $fh3 "$save{$_}->{'data'}|$save{$_}->{'flag'}\n";
}

sub string_to_compare{
    my $line = shift;
    my ($skip1, $skip2, $key, $remainder) = split /\|/, $line, 4;
    return $remainder;
}
Comments
Nice coding sequence & derivation
For effort, even though the OP is getting a free ride

Thank you, David! This is working perfectly fine for me. Thanks again!

You are welcome. Please don't forget to mark this thread 'solved'.

I'm reopening this thread because I need a few changes to optimize the comparison process.

Can someone suggest how to handle sorted input in this comparison process? Assume I am getting sorted input data from both input files. The comparison should then start from the first record of File 1 against the first record of File 2, or against the File 2 records whose key is less than that of File 1's first record. That would save comparison time and help optimise the process. It would not need to check every record of both files, since both input files are presorted.

Thank you very much in advance!

Edited 5 Years Ago by sandeepau: Optimization changes required
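
For presorted input, one option is a merge-style pass that reads both files in step instead of saving all of file1 in a hash. The following is only a sketch, and it rests on assumptions the thread does not confirm: both files are sorted on the third column in plain string order (which fits the fixed-width keys in the sample data), each key appears at most once per file, and there are no blank lines. The names merge_compare, next_record and key_and_rest are made up for this illustration. As in the earlier scripts, only the part after the key column is compared for the 'U' flag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a record into its key (third field) and the part after the key.
sub key_and_rest {
    my $line = shift;
    my (undef, undef, $key, $rest) = split /\|/, $line, 4;
    return ($key, defined $rest ? $rest : '');
}

# Read the next record from a filehandle; returns undef at end of file.
sub next_record {
    my $fh   = shift;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;
    return $line;
}

# Walk both sorted inputs in step and print only the records that differ.
# ASSUMPTION: both inputs are sorted by key in string (cmp) order.
sub merge_compare {
    my ($in1, $in2, $out) = @_;
    my $rec1 = next_record($in1);
    my $rec2 = next_record($in2);

    while (defined $rec1 or defined $rec2) {
        my ($key1, $rest1) = defined $rec1 ? key_and_rest($rec1) : ();
        my ($key2, $rest2) = defined $rec2 ? key_and_rest($rec2) : ();

        # Order of the two current keys; an exhausted file sorts last.
        my $order = !defined $rec1 ? 1
                  : !defined $rec2 ? -1
                  :                  $key1 cmp $key2;

        if ($order < 0) {          # key exists only in file1
            print $out "$rec1|D\n";
            $rec1 = next_record($in1);
        } elsif ($order > 0) {     # key exists only in file2
            print $out "$rec2|I\n";
            $rec2 = next_record($in2);
        } else {                   # same key in both files
            print $out "$rec2|U\n" if $rest1 ne $rest2;
            $rec1 = next_record($in1);
            $rec2 = next_record($in2);
        }
    }
    return;
}

# Run only when three file names are given on the command line.
if (@ARGV == 3) {
    my ($file1, $file2, $file3) = @ARGV;
    open my $fh1, '<', $file1 or die "Can't open $file1: $!";
    open my $fh2, '<', $file2 or die "Can't open $file2: $!";
    open my $fh3, '>', $file3 or die "Can't open $file3: $!";
    merge_compare($fh1, $fh2, $fh3);
    close $fh3 or die "Can't close $file3: $!";
}
```

If this were saved as, say, compare_sorted.pl (a hypothetical name), it could be run as: perl compare_sorted.pl 1.txt 2.txt 3.txt. Each file is read once and only one record per file is held in memory at a time, and the output comes out in key order without a separate sort. If the keys are really sorted numerically rather than as strings, the cmp would need to become <=>.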
