Compare 2 files and output result

Question

vivek.vivek 0 Light Poster

11 Years Ago

Hi,

I have two different files, as shown below:

FILE1.txt:

FILE F1 abc 2011.12 31-oct-2012 35
FILE F2 abc 2011.12 31-oct-2012 40
FILE F4 abc 2011.12 31-oct-2012 2

FILE2.txt:

FILE F1 abc 2011.12 31-oct-2012 25
FILE F2 abc 2011.12 31-jun-2013 40
FILE F3 abc 2011.12 31-jan-2014 4

In my second file, a line has been added, a line has been removed, and some column values in lines 1 and 2 have changed.

I want to compare both files and output the differences as below.

Old file: FILE1.txt
New file: FILE2.txt






F Name   Version   Old Count   New Count   Old Date      New Date
======   =======   =========   =========   ===========   ===========
F1       2011.12   35          25          31-oct-2012   31-oct-2012
F2       2011.12   40          40          31-oct-2012   31-jun-2013
F3       2011.12   --          04          NONE          31-jan-2014
F4       2011.12   02          --          31-oct-2012   NONE

Missing in New file
F4       2011.12   02          31-oct-2012

Missing in Old File
F3       2011.12   04          31-jan-2014

Please help.

perl


d5e5 109 Master Poster

11 Years Ago

Start by reading both files into a suitable data structure (see the data structure cookbook). Then you can read and test values from your data structure to print the details of your report.

#!/usr/bin/perl
use strict;
use warnings;

@ARGV = qw(FILE1.txt FILE2.txt);

my %HoH; #Hash of hash references
while (my $line = <>){
    chomp $line;
    my $oldnew;
    if ($ARGV eq 'FILE1.txt'){
        $oldnew = 'old';
    }
    else{
        $oldnew = 'new';
    }
    my @fields = split /\s+/, $line;
    my $key = $fields[1];
    $HoH{$key} = {} if not exists $HoH{$fields[1]};
    $HoH{$key}{'version'} = $fields[3];
    $HoH{$key}{$oldnew}{'date'} = $fields[4];
    $HoH{$key}{$oldnew}{'num'} = $fields[5];
}
#use Data::Dumper;
#print Dumper(\%HoH);

foreach my $k (sort keys %HoH){
    my $ver = $HoH{$k}{'version'};
    my $o_num = exists $HoH{$k}{'old'}{'num'} ? $HoH{$k}{'old'}{'num'}: 'NONE';
    my $n_num = exists $HoH{$k}{'new'}{'num'} ? $HoH{$k}{'new'}{'num'}: 'NONE';
    my $o_dt = exists $HoH{$k}{'old'}{'date'} ? $HoH{$k}{'old'}{'date'}: 'NONE';
    my $n_dt = exists $HoH{$k}{'new'}{'date'} ? $HoH{$k}{'new'}{'date'}: 'NONE';
    printf "%s\t%s\t%s\t%s\t%s\t%s\n", $k,$ver,$o_num,$n_num,$o_dt,$n_dt;
}
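
The question also asks for "Missing in New file" and "Missing in Old File" lists, which the report loop above does not print. A minimal sketch of one way to derive them from a hash shaped like %HoH follows. The helpers has_side and missing_features and the %sample data are hypothetical names introduced for illustration; they are not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a record has data for the given
# side ('old' or 'new').
sub has_side {
    my ($rec, $side) = @_;
    my $data = $rec->{$side};              # plain lookup, creates nothing
    return ( $data && %$data ) ? 1 : 0;    # an empty {} counts as absent
}

# Hypothetical helper: return two array refs, first the features
# missing in the new file, then the features missing in the old file.
sub missing_features {
    my ($hoh) = @_;
    my @keys = sort keys %$hoh;
    my @missing_in_new = grep { !has_side( $hoh->{$_}, 'new' ) } @keys;
    my @missing_in_old = grep { !has_side( $hoh->{$_}, 'old' ) } @keys;
    return ( \@missing_in_new, \@missing_in_old );
}

# Sample data shaped like %HoH after reading FILE1.txt and FILE2.txt.
my %sample = (
    F1 => { version => '2011.12',
            old => { date => '31-oct-2012', num => 35 },
            new => { date => '31-oct-2012', num => 25 } },
    F2 => { version => '2011.12',
            old => { date => '31-oct-2012', num => 40 },
            new => { date => '31-jun-2013', num => 40 } },
    F3 => { version => '2011.12',
            new => { date => '31-jan-2014', num => 4 } },
    F4 => { version => '2011.12',
            old => { date => '31-oct-2012', num => 2 } },
);

my ( $gone, $added ) = missing_features( \%sample );

print "Missing in New file\n";
printf "%s\t%s\t%s\t%s\n", $_, $sample{$_}{version},
    @{ $sample{$_}{old} }{qw(num date)} for @$gone;

print "\nMissing in Old File\n";
printf "%s\t%s\t%s\t%s\n", $_, $sample{$_}{version},
    @{ $sample{$_}{new} }{qw(num date)} for @$added;
```

The emptiness check in has_side is deliberate: the `exists $HoH{$k}{'old'}{'num'}` tests in the report loop autovivify an empty 'old' or 'new' hash, so a plain `exists $HoH{$k}{'old'}` would give the wrong answer if it ran after that loop.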

d5e5 109 Master Poster

11 Years Ago

If both files will always be in the same order you can always compare line 2 in one file with line 2 in the other, line 3 with line 3, etc.

You wrote that the lines always begin with the string "FILE" or "LINE".

I don't see any lines beginning with a "LINE" string. You know your data better than I do. You need to make some assumptions about the layout of the data that determine what columns to compare with each other.
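
If the two files really are always in the same order, the positional comparison can be sketched as follows. This is only a minimal sketch under that assumption; diff_by_position is a hypothetical helper, and the sample lines are taken from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two lists of lines pairwise and return
# [line_number, old_line, new_line] for every position that differs.
# A line that exists on only one side is reported against 'NONE'.
sub diff_by_position {
    my ($old, $new) = @_;
    my $last = @$old > @$new ? $#$old : $#$new;
    my @changes;
    for my $i ( 0 .. $last ) {
        my $o = $i <= $#$old ? $old->[$i] : 'NONE';
        my $n = $i <= $#$new ? $new->[$i] : 'NONE';
        push @changes, [ $i + 1, $o, $n ] if $o ne $n;
    }
    return @changes;
}

my @old = (
    'FILE F1 abc 2011.12 31-oct-2012 35',
    'FILE F2 abc 2011.12 31-oct-2012 40',
);
my @new = (
    'FILE F1 abc 2011.12 31-oct-2012 25',
    'FILE F2 abc 2011.12 31-oct-2012 40',
);

printf "line %d: %s -> %s\n", @$_ for diff_by_position( \@old, \@new );
```

This approach breaks as soon as a record is inserted or removed, because every later line then pairs with the wrong partner. That is why the scripts in this thread key the records on F2 instead.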


d5e5 109 Master Poster

11 Years Ago

Skip the lines that don't start with FILE or LINE by putting the following statement near the start of the loop that reads the files:

    next unless $line =~ m/^(?:FILE|LINE)/; #Skip if doesn't start with FILE or LINE

The data you posted above appears to have a lot of spaces or tabs at the start of each line. It's better to post data as file attachments so we can see what's really in the files. The following script seems to work OK for me.

#!/usr/bin/perl
use strict;
use warnings;
@ARGV = qw(FILE1.txt FILE2.txt);
my %HoH; #Hash of hash references
while (my $line = <>){
    $line =~ s/^\s+//;#Remove space characters and tabs (if any) from start of line
    chomp $line;
    #Skip if doesn't start with FILE or LINE; the group makes ^ apply to both words
    next unless $line =~ m/^(?:FILE|LINE)/;

    my $oldnew;
    if ($ARGV eq 'FILE1.txt'){
        $oldnew = 'old';
    }
    else{
        $oldnew = 'new';
    }
    my @fields = split /\s+/, $line;
    my $key = $fields[1];
    $HoH{$key} = {} if not exists $HoH{$fields[1]};
    $HoH{$key}{$oldnew}{'F3'} = $fields[2];
    $HoH{$key}{$oldnew}{'F4'} = $fields[3];
    $HoH{$key}{$oldnew}{'F5'} = $fields[4];
    $HoH{$key}{$oldnew}{'F6'} = $fields[5];
}

foreach my $k (sort keys %HoH){
    my $o_f3 = exists $HoH{$k}{'old'}{'F3'} ? $HoH{$k}{'old'}{'F3'}: 'NONE';
    my $n_f3 = exists $HoH{$k}{'new'}{'F3'} ? $HoH{$k}{'new'}{'F3'}: 'NONE';
    my $o_f4 = exists $HoH{$k}{'old'}{'F4'} ? $HoH{$k}{'old'}{'F4'}: 'NONE';
    my $n_f4 = exists $HoH{$k}{'new'}{'F4'} ? $HoH{$k}{'new'}{'F4'}: 'NONE';
    my $o_f5 = exists $HoH{$k}{'old'}{'F5'} ? $HoH{$k}{'old'}{'F5'}: 'NONE';
    my $n_f5 = exists $HoH{$k}{'new'}{'F5'} ? $HoH{$k}{'new'}{'F5'}: 'NONE';
    my $o_f6 = exists $HoH{$k}{'old'}{'F6'} ? $HoH{$k}{'old'}{'F6'}: 'NONE';
    my $n_f6 = exists $HoH{$k}{'new'}{'F6'} ? $HoH{$k}{'new'}{'F6'}: 'NONE';
    printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n", $k,$o_f3,$n_f3,$o_f4,$n_f4,$o_f5,$n_f5,$o_f6,$n_f6;
}
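
One detail worth checking in any skip pattern of this kind is that `|` binds more loosely than `^`, so an anchor written once applies only to the first alternative. A minimal sketch of the difference follows; is_record_line is a hypothetical helper, and the third sample line is made up to show the effect.

```perl
use strict;
use warnings;

# Hypothetical helper: true only when the line starts with FILE or LINE.
# The non-capturing group makes the ^ anchor cover both alternatives.
sub is_record_line {
    my ($line) = @_;
    return $line =~ m/^(?:FILE|LINE)/ ? 1 : 0;
}

for my $line ( 'FILE Fil1 abc', 'LINE Fil2 abc', 'ABCD KLINE DEFG' ) {
    # In this pattern only FILE is anchored; LINE may match anywhere.
    my $loose    = $line =~ m/^FILE|LINE/ ? 'match' : 'skip';
    my $anchored = is_record_line($line)  ? 'match' : 'skip';
    print "$line: unanchored=$loose anchored=$anchored\n";
}
```

The first two lines match both patterns. The third is accepted by the unanchored pattern because it contains LINE in the middle, and rejected by the anchored one. Writing m/^FILE|^LINE/ has the same effect as the grouped form.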


vivek.vivek 0 Light Poster · Answer 1 · 2012-05-09T08:02:34+00:00

Thank you.

Suppose each record has multiple lines and the requirement is to compare only a few fields (F2 to F6) of the first line of each record. What changes should I make?

FILE1

        F1   F2  F3    F4       F5        F6
        FILE Fil1 abc 2011.12 31-oct-2012 35 ENDOFLINE1
            ABCD DEFG HIJK KKLI KLKSJ
            ABCD DEFG 1 HIJK KKLI KLKSJ
            ABCD DEFG HIJK KKLI KLKSJ
        FILE Fil2 abc 2011.12 31-oct-2012 40 ENDOFLINE02
            GHAJ JAHFG YRIRB FJAFL
            GHAJ JAHFG 1 YRIRB FJAFL
            GHAJ J2 AHFG YRIRB FJAFL
        FILE Fil4 abc 2011.12 31-oct-2012 2 EOL
            RUFJF FHFFNF FFFKLF FLFLFLL
            RUFJF FHFFNF FFFKLF FLFLFLL
            RUFJF FHFFNF FFFKLF FLFLFLL

FILE2

         F1   F2  F3    F4       F5       F6
        FILE Fil1 abc 2011.12 31-jan-2013 20 ENDOFLINE
            AB2CD DE45FG HIJ567K KKLI KLKSJ
            ABCD DEFG 1 HIJK KKLI KLKSJ
            ABCD DEFG 4321343 KKLI KLKSJ
        FILE Fil2 abc 2011.12 31-oct-2013 40 ENDOF2
            GHAJ 12JAHFG YRIRB FJAFL
            GHAJ JAHFG 1 YRIRB FJAFL
            GHAJ J2 A2HFG YRIRB FJAFL
        FILE Fil3 abc 2011.12 31-oct-2012 22 EOL
            RUFJF F234FNF FFFKLF FLFLFLL
            RUFJF FHFFNF FFFKLF 74644
            RUFJF F456FNF FFFKLF FLFLFLL

vivek.vivek 0 Light Poster · Answer 2 · 2012-05-09T13:48:35+00:00

Assume that the lines to compare always begin with the string "FILE" or "LINE".

vivek.vivek 0 Light Poster · Answer 3 · 2012-05-10T05:57:26+00:00

The beginning of the line can be LINE or FILE. What I mean is that the script should search both files for lines matching ^LINE or ^FILE and compare those. That way, even if the lines are in a different order, the script can use F2 to match up the lines to compare.

vivek.vivek 0 Light Poster · Answer 4 · 2012-05-11T11:58:29+00:00

Thanks, that made it work after some slight changes (m/^FILE|^LINE/), and I combined your latest code with the previous one (attached).

What if Fil2 exists more than once in both files, and the output should display it that many times?
FILE1

FILE Fil2 abc 2011.12 31-oct-2012 35
    AB2CD DE45FG HIJ567K KKLI KLKSJ
    AB2CD DE45FG HIJ567K KKLI KLKSJ
FILE Fil2 abc 2011.12 31-oct-2012 40
    SDFLSFHKH FJSLJFSJ FJJF
    AFAF AFHAHF ASLFSAJF

File2:

FILE Fil2 abc 2011.12 31-oct-2013 33
    AB2CD12 DE45FG HIJ567K KKLI KLKSJ
    AB2CD DE45FG234 HIJ567K KKLI KLKSJ
FILE Fil2 abc 2011.12 31-oct-2014 20
    SDFLSFHKH34 FJSLJFSJ FJJF
    AFAF AFHAHF AS534LFSAJF
FILE Fil1 abc 2011.12 31-oct-2012 2
    AHFAHFH QWH QFHF
    ANFLAF ASFLAFJAF

OUTPUT:

FEATURE  OLD_VER  NEW_VER  OLD_COUNT  NEW_COUNT  OLD_DATE     NEW_DATE
Fil2     2011.12  2011.12  35         33         31-oct-2012  31-oct-2013
Fil2     2011.12  2011.12  40         20         31-oct-2012  31-oct-2014
Fil1     None     2011.12  None       2          None         31-oct-2012

The current script does not show the old and new versions (OLD_VER, NEW_VER).

vivek.vivek 0 Light Poster · Answer 5 · 2012-05-11T12:02:02+00:00

I was unable to attach the file, so I am copying the code below.

    use strict;
    use warnings;
    @ARGV = qw(FILE1.txt FILE2.txt);
    my %HoH; #Hash of hash references
    while (my $line = <>){
        $line =~ s/^\s+//; #Remove space characters and tabs (if any) from start of line
        chomp $line;
        next unless $line =~ m/^LINE|^FILE /; #Skip if doesn't start with FILE or LINE

        #FILE1.txt is the old file; anything else is treated as the new file
        my $oldnew;
        if ($ARGV eq 'FILE1.txt'){
            $oldnew = 'old';
        }
        else{
            $oldnew = 'new';
        }

        my @fields = split /\s+/, $line;
        my $key = $fields[1];
        $HoH{$key} = {} if not exists $HoH{$fields[1]};
        $HoH{$key}{'version'} = $fields[3];
        $HoH{$key}{$oldnew}{'date'} = $fields[4];
        $HoH{$key}{$oldnew}{'num'} = $fields[5];
    }

    #One format string shared by the header and the data rows so the columns line up
    my $fmt = "%-34s%10s%11s%13s%13s%13s\n";
    my @rule = ("==================================", "=========", "========", "============", "========", "============");
    printf "\n$fmt", @rule;
    printf $fmt, "FEATURE","VERSION","OLD_COUNT","NEW_COUNT","OLD_DATE","NEW_DATE";
    printf $fmt, @rule;
    foreach my $k (sort keys %HoH){
        my $ver = $HoH{$k}{'version'};
        my $o_num = exists $HoH{$k}{'old'}{'num'} ? $HoH{$k}{'old'}{'num'}: 'NONE';
        my $n_num = exists $HoH{$k}{'new'}{'num'} ? $HoH{$k}{'new'}{'num'}: 'NONE';
        my $o_dt = exists $HoH{$k}{'old'}{'date'} ? $HoH{$k}{'old'}{'date'}: 'NONE';
        my $n_dt = exists $HoH{$k}{'new'}{'date'} ? $HoH{$k}{'new'}{'date'}: 'NONE';
        printf $fmt, $k,$ver,$o_num,$n_num,$o_dt,$n_dt;
    }

d5e5 109 Master Poster · Answer 6 · 2012-05-12T19:03:30+00:00

Sorry, I don't know the answer. I wrote the script assuming F2 contained a unique key for each record. If F2 can have the same value for more than one record in the same file then it won't serve as a unique key for a hash. To write a script to do what you want, you would need to choose some other data structure that corresponds to assumptions you make about your data.
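
To illustrate why a repeated F2 breaks the hash approach, here is a minimal sketch using the two Fil2 lines from your FILE1. The helper index_by_f2 is a hypothetical name; it stores records the same way the earlier scripts do, with one slot per F2 value.

```perl
use strict;
use warnings;

# Hypothetical helper: index records by their second field (F2).
# A later record with the same F2 replaces the earlier one, because
# a hash can hold only one value per key.
sub index_by_f2 {
    my @lines = @_;
    my %by_key;
    for my $line (@lines) {
        my @fields = split ' ', $line;
        $by_key{ $fields[1] } = { date => $fields[4], num => $fields[5] };
    }
    return \%by_key;
}

my $idx = index_by_f2(
    'FILE Fil2 abc 2011.12 31-oct-2012 35',
    'FILE Fil2 abc 2011.12 31-oct-2012 40',
);

printf "%d record(s) kept, num=%s\n", scalar( keys %$idx ), $idx->{Fil2}{num};
```

This prints "1 record(s) kept, num=40": the record with count 35 is silently overwritten.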

vivek.vivek 0 Light Poster · Answer 7 · 2012-05-13T02:21:10+00:00


ok :(

d5e5 109 Master Poster · Answer 8 · 2012-05-13T20:48:24+00:00

Since each feature, or F2, can have more than one line, you can still build a hash with features as keys and, for each value, a reference to an array. The following example doesn't print exactly the layout you want, because that will take more work, but you can try running it.

#!/usr/bin/perl
use warnings;
use strict;

my %HoH; #Hash of hashes of array references

@ARGV = qw(FILE1.txt FILE2.txt);

while ( my $line = <> ) {
    #Remove space characters and tabs (if any) from start of line
    $line =~ s/^\s+//;
    chomp $line;
    next
      unless $line =~ m/^LINE|^FILE /;  #Skip if doesn't start with FILE or LINE
    my $oldnew;
    if ( $ARGV eq 'FILE1.txt' ) {
        $oldnew = 'old';
    }
    else {
        $oldnew = 'new';
    }
    my @fields = split /\s+/, $line;
    my $key = $fields[1];
    $HoH{$key}{$oldnew} = [] if not exists $HoH{$key}{$oldnew};
    #Dereference explicitly: push on a bare reference is an error in Perl 5.24 and later
    push @{ $HoH{$key}{$oldnew} }, [@fields];
}

my %unique_lines;
foreach my $k ( sort keys %HoH ) {
    my $ctr = 0;
    foreach my $aref (@{$HoH{$k}{'old'}}){
        my ($o_ver, $o_dt, $o_num) = @$aref[3..5];
        my ($n_ver, $n_dt, $n_num) = @{$HoH{$k}{'new'}[$ctr]}[3..5];
        my @plist = ($k,$o_ver,$n_ver,$o_num,$n_num,$o_dt,$n_dt);
        foreach (@plist){
            $_ = 'None' unless $_;
        }
        undef $unique_lines{join "\t", @plist};
        $ctr++;
    }
    $ctr = 0;
    foreach my $aref (@{$HoH{$k}{'new'}}){
        my ($o_ver, $o_dt, $o_num) = @{$HoH{$k}{'old'}[$ctr]}[3..5];
        my ($n_ver, $n_dt, $n_num) = @$aref[3..5];
        my @plist = ($k,$o_ver,$n_ver,$o_num,$n_num,$o_dt,$n_dt);
        foreach (@plist){
            $_ = 'None' unless $_;
        }
        undef $unique_lines{join "\t", @plist};
        $ctr++;
    }
}

foreach (sort keys %unique_lines){
    printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\n", split;
}
