Perl comparing two files and print output

Question

Shirin_2 0 Newbie Poster

8 Years Ago

Hi PerlGurus,

I am really new to Perl and have little hands-on experience with the concepts here. I am trying to learn hashes and arrays, and came up with a self exercise for practice.

Got two files

inFile01 as below:

CLAYCOUNTY;Wood;statecode=FL
CLAYCOUNTY;Wood;statecode=FL
SUWANNEECOUNTY;Wood;statecode=FL
SUWANNEECOUNTY;Wood;statecode=TX
SUWANNEECOUNTY;Wood;statecode=TX
SUWANNEECOUNTY;Wood;statecode=TX
NASSAUCOUNTY;Wood;statecode=UT

infile02 as below:

119736;Residential;CLAYCOUNTY
448094;Residential;CLAYCOUNTY
206893;Residential;CLAYCOUNTY
333743;Residential;CLAYCOUNTY
172534;Residential;CLAYCOUNTY
785275;Residential;CLAYCOUNTY
995932;Residential;CLAYCOUNTY
223488;Residential;CLAYCOUNTY
433512;Residential;CLAYCOUNTY
640802;Residential;SUWANNEECOUNTY
403866;Residential;SUWANNEECOUNTY
828788;Residential;SUWANNEECOUNTY
751490;Residential;SUWANNEECOUNTY
972562;Residential;SUWANNEECOUNTY
367541;Residential;SUWANNEECOUNTY
481360;Residential;SUWANNEECOUNTY
920232;Residential;NASSAUCOUNTY
727659;Residential;NASSAUCOUNTY
471817;Residential;NASSAUCOUNTY
983043;Residential;NASSAUCOUNTY
578286;Residential;NASSAUCOUNTY

Step 1: Parse inFile01 with the search key "Wood" and store the file columns in a hash. I want to use the 1st and 2nd columns, with the 2nd column being the key and the distinct values of the 1st column being the values.

Step 2: Take the distinct column 1 values from inFile01, find all the column 1 values of inFile02 that belong to them, and write them to an output file.
In inFile02 I want to use the 3rd and 1st columns, with the 3rd column being the key and the 1st column being the values.

e.g. For key 
CLAYCOUNTY values are 119736, 448094, 206893, 333743, 172534, 785275, 995932, 223488, 433512 
SUWANNEECOUNTY are 640802,403866,828788,751490,972562,367541,481360 
NASSAUCOUNTY are 920232,727659, 471817, 983043, 578286

I have not decided on the output file format yet, but please help me with this.

Thanks in advance.

perl


2teez 43 Posting Whiz

8 Years Ago

Hi Shirin_2,

I just saw the question you asked on this forum.
Welcome to the world of Perl, where impossible things are made possible and difficult things made easy :)

As regards your question: it is actually a straightforward hash parse, with no real need to compare the two files. The reason is that every value in the first column of the first file is also present in the second file. Of course, for the sake of practice we can still compare them, but to what end, unless we are going to include our findings in the final comparison table?

All that being said, there is more than one way to do this. One of the better ways is to use the module Text::CSV_XS instead of splitting each line of the files by hand.

Since you have two files, you can use either a single hash variable or two hash variables. Since hash keys are always unique, you can collect everything you want and then compare to print the desired output.

Here is one way of doing it....

use warnings;
use strict;
use Text::CSV_XS;
use Data::Dumper;
use Inline::Files;

# One hash reference holds the data from both files:
# $hash->{type}{county} = [ values ... ]
my $hash = {};

my $csv = Text::CSV_XS->new( { binary => 1, sep_char => ';' } );

# FILE1 rows are county;type;statecode
while ( my $row = $csv->getline(*FILE1) ) {
    push @{ $hash->{ $row->[1] }{ $row->[0] } }, $row->[2];
}

# FILE2 rows are code;type;county
$csv = Text::CSV_XS->new( { binary => 1, sep_char => ';' } );
while ( my $row = $csv->getline(*FILE2) ) {
    push @{ $hash->{ $row->[1] }{ $row->[2] } }, $row->[0];
}

print Dumper($hash);

__FILE1__
CLAYCOUNTY;Wood;statecode=FL
CLAYCOUNTY;Wood;statecode=FL
SUWANNEECOUNTY;Wood;statecode=FL
SUWANNEECOUNTY;Wood;statecode=TX
SUWANNEECOUNTY;Wood;statecode=TX
SUWANNEECOUNTY;Wood;statecode=TX
NASSAUCOUNTY;Wood;statecode=UT
__FILE2__
119736;Residential;CLAYCOUNTY
448094;Residential;CLAYCOUNTY
206893;Residential;CLAYCOUNTY
333743;Residential;CLAYCOUNTY
172534;Residential;CLAYCOUNTY
785275;Residential;CLAYCOUNTY
995932;Residential;CLAYCOUNTY
223488;Residential;CLAYCOUNTY
433512;Residential;CLAYCOUNTY
640802;Residential;SUWANNEECOUNTY
403866;Residential;SUWANNEECOUNTY
828788;Residential;SUWANNEECOUNTY
751490;Residential;SUWANNEECOUNTY
972562;Residential;SUWANNEECOUNTY
367541;Residential;SUWANNEECOUNTY
481360;Residential;SUWANNEECOUNTY
920232;Residential;NASSAUCOUNTY
727659;Residential;NASSAUCOUNTY
471817;Residential;NASSAUCOUNTY
983043;Residential;NASSAUCOUNTY
578286;Residential;NASSAUCOUNTY

and my raw output will be

$VAR1 = {
          'Wood' => {
                      'NASSAUCOUNTY' => [
                                          'statecode=UT'
                                        ],
                      'SUWANNEECOUNTY' => [
                                            'statecode=FL',
                                            'statecode=TX',
                                            'statecode=TX',
                                            'statecode=TX'
                                          ],
                      'CLAYCOUNTY' => [
                                        'statecode=FL',
                                        'statecode=FL'
                                      ]
                    },
          'Residential' => {
                             'CLAYCOUNTY' => [
                                               '119736',
                                               '448094',
                                               '206893',
                                               '333743',
                                               '172534',
                                               '785275',
                                               '995932',
                                               '223488',
                                               '433512'
                                             ],
                             'SUWANNEECOUNTY' => [
                                                   '640802',
                                                   '403866',
                                                   '828788',
                                                   '751490',
                                                   '972562',
                                                   '367541',
                                                   '481360'
                                                 ],
                             'NASSAUCOUNTY' => [
                                                 '920232',
                                                 '727659',
                                                 '471817',
                                                 '983043',
                                                 '578286'
                                               ]
                           }
        };

From the output, you can see Wood => countyName => [....], and likewise Residential => countyName => [......]. This structure can then be worked with to get the desired output.

In your own program you will have to open and read the two files yourself; I used the module Inline::Files for that purpose here. The comparison and the printing of the desired output are left to the OP as practice.
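
As a rough sketch of that comparison step (not necessarily what was intended above), the nested structure can be walked county by county. The data below is a cut-down copy of the Dumper output, and the name %codes_for is made up for illustration.

```perl
use strict;
use warnings;

# A cut-down copy of the structure that Dumper printed above.
my $hash = {
    Wood => {
        CLAYCOUNTY   => [ 'statecode=FL', 'statecode=FL' ],
        NASSAUCOUNTY => [ 'statecode=UT' ],
    },
    Residential => {
        CLAYCOUNTY     => [ '119736', '448094' ],
        SUWANNEECOUNTY => [ '640802', '403866' ],
        NASSAUCOUNTY   => [ '920232' ],
    },
};

# For every county listed under 'Wood', collect the codes stored for
# the same county under 'Residential'. %codes_for is a hypothetical name.
my %codes_for;
for my $county ( sort keys %{ $hash->{Wood} } ) {
    my $codes = $hash->{Residential}{$county} or next;    # not in file 2
    $codes_for{$county} = [@$codes];
}

print "$_: @{ $codes_for{$_} }\n" for sort keys %codes_for;
```

In this cut-down data SUWANNEECOUNTY is deliberately left out of the Wood branch, so only CLAYCOUNTY and NASSAUCOUNTY are printed.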

Hope this helps.

2teez 43 Posting Whiz

8 Years Ago

Hi Shirin_2,

"Can we achieve this without using these modules? I understand it will not be as straightforward as using the modules you mentioned. Please."

Yes, you can solve this in plain Perl using only core modules. In fact, it is a common saying in Perl that there is more than one way to do it, usually shortened to the acronym TIMTOWTDI.

For the sake of this I will show one...

We can write a function to parse our files for us, since both files follow the same pattern. We have to employ split and use a slightly clever data structure to do the work, like this:

use warnings;
use strict;
use Data::Dumper;
use autodie qw/open close/;

my $hash = {};

my $parser = parse_file( { filename => 'file1.txt', sep_char => ';' } );

for ( 0 .. $#$parser ) {
    push @{ $hash->{ $parser->[$_][1] }{ $parser->[$_][0] } } =>
      $parser->[$_][2];
}

$parser = parse_file( { filename => 'file2.txt', sep_char => ';' } );

for ( 0 .. $#$parser ) {
    push @{ $hash->{ $parser->[$_][1] }{ $parser->[$_][2] } } =>
      $parser->[$_][0];
}

print Dumper($hash);

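# Takes a hash reference with the keys filename and sep_char and
# returns a reference to an array of rows (one array reference per line).
sub parse_file {
    my $data     = shift;
    my $new_data = [];
    open my $fh, '<', $data->{filename};    # autodie reports failures
    while (<$fh>) {
        chomp;

        # \Q...\E keeps the separator literal inside the pattern
        push @$new_data, [ split /\Q$data->{sep_char}\E/ ];
    }
    close $fh;
    return $new_data;

}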

Our result will be the same as the previous one, where we used Text::CSV_XS and Inline::Files.

The function takes a hash reference and returns an array of arrays; we then use a for loop to pull out the values and store them just as we did with Text::CSV_XS.
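
Since step 1 of the question asks for distinct county names, here is a minimal sketch of how rows in that array-of-arrays shape could be reduced to a unique list. The rows are typed in by hand rather than read from a file, the 'Brick' row is invented to show the filter, and %seen and @distinct are illustrative names only.

```perl
use strict;
use warnings;

# Rows in the shape parse_file returns: one array reference per line.
# The DUVALCOUNTY/Brick row is made up to show the 'Wood' filter at work.
my $rows = [
    [ 'CLAYCOUNTY',     'Wood',  'statecode=FL' ],
    [ 'CLAYCOUNTY',     'Wood',  'statecode=FL' ],
    [ 'SUWANNEECOUNTY', 'Wood',  'statecode=TX' ],
    [ 'DUVALCOUNTY',    'Brick', 'statecode=FL' ],
    [ 'NASSAUCOUNTY',   'Wood',  'statecode=UT' ],
];

my ( %seen, @distinct );
for my $row (@$rows) {
    my ( $county, $type ) = @$row;
    next unless $type eq 'Wood';    # search key from step 1

    # %seen remembers counties already stored, so each appears once
    # and the original order is kept.
    push @distinct, $county unless $seen{$county}++;
}

print join( ',', @distinct ), "\n";
# prints CLAYCOUNTY,SUWANNEECOUNTY,NASSAUCOUNTY
```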

Hope this helps.....


Shirin_2 0 Newbie Poster · Answer 1 · 2017-02-09T07:33:35+00:00

Hi 2teez,
Thanks for looking at my question and introducing me to the CSV_XS and Inline modules. However, I don't have them installed in my setup.
Can we achieve this without using these modules? I understand it will not be as straightforward as using the modules you mentioned. Please.

Shirin_2 0 Newbie Poster · Answer 2 · 2017-02-10T14:19:13+00:00

Hi 2teez,
Thanks for your patience and for taking the time to look again. I tried your code and it works too... Please look into below piece of code which I have done.

#!/usr/bin/perl -w

use warnings;
use strict;
use Data::Dumper;

my (@county,%result);

my $inFile01 ="FILE01.dat";

             #CLAYCOUNTY;Wood;statecode=FL
             #CLAYCOUNTY;Wood;statecode=FL
             #SUWANNEECOUNTY;Wood;statecode=FL
             #SUWANNEECOUNTY;Wood;statecode=TX
             #SUWANNEECOUNTY;Wood;statecode=TX
             #SUWANNEECOUNTY;Wood;statecode=TX
             #NASSAUCOUNTY;Wood;statecode=UT

open(DATA01,'<',$inFile01)or die("Can't open input file\"$inFile01\":$!\n");
while (<DATA01>) {
    # Skipping if the line is empty or a comment
    next if ( $_ =~ /^\s*$/ );
    next if ( $_ =~ /^#\s*/ );

    my ($county,$srch,$stcode) = split(";",$_);
    chomp($srch,$county);

    if ($srch eq "Wood") { push (@county,$county) }
}
close(DATA01);

my $inFile02 ="FILE02.dat";

#119736;Residential;CLAYCOUNTY
#448094;Residential;CLAYCOUNTY
#206893;Residential;CLAYCOUNTY
#333743;Residential;CLAYCOUNTY
#172534;Residential;CLAYCOUNTY
#785275;Residential;CLAYCOUNTY
#995932;Residential;CLAYCOUNTY
#223488;Residential;CLAYCOUNTY
#433512;Residential;CLAYCOUNTY
#640802;Residential;SUWANNEECOUNTY
#403866;Residential;SUWANNEECOUNTY
#828788;Residential;SUWANNEECOUNTY
#751490;Residential;SUWANNEECOUNTY
#972562;Residential;SUWANNEECOUNTY
#367541;Residential;SUWANNEECOUNTY
#481360;Residential;SUWANNEECOUNTY
#920232;Residential;NASSAUCOUNTY
#727659;Residential;NASSAUCOUNTY
#471817;Residential;NASSAUCOUNTY
#983043;Residential;NASSAUCOUNTY
#578286;Residential;NASSAUCOUNTY

foreach my $cnty (@county) {
        my @countycode;
        open(DATA02,'<',$inFile02)or die("Can't open input file\"$inFile02\":$!\n");

        while (<DATA02>) {
            # Skipping if the line is empty or a comment
            next if ( $_ =~ /^\s*$/ );
            next if ( $_ =~ /^#\s*/ );

            my ($code,$attr,$countyy) = split (";",$_);
            chomp ($code,$attr,$countyy);

            if ($countyy eq $cnty) { push @countycode, $code; }

            $result{$cnty} = [@countycode]
        }
        close(DATA02);
}

print Dumper \%result;

foreach my $key (keys %result) {
        print "$key" . "\n";
        my $op = join "|", @{$result{$key}};
        print "$op" . "\n";
}

#output for my foreach loop is as below
NASSAUCOUNTY
920232|727659|471817|983043|578286
CLAYCOUNTY
119736|448094|206893|333743|172534|785275|995932|223488|433512
SUWANNEECOUNTY
640802|403866|828788|751490|972562|367541|481360

I am able to get the desired values. One last thing is printing the output to another file, which is required in the following form:
for every value stored in %result I need to print two lines to an output file (in no particular order). For 119736, for example, the two lines are as below (and similarly for all the others).
L|A|119736|119736|||||||||||||||||||||||
M|A|119736||||Wood|Wood|CONSTANT_STRING

complete file looks like

L|A|119736|119736|||||||||||||||||||||||
M|A|119736||||Wood|Wood|CONSTANT_STRING
L|A|448094|119736|||||||||||||||||||||||
M|A|448094||||Wood|Wood|CONSTANT_STRING
L|A|206893|206893|||||||||||||||||||||||
M|A|206893||||Wood|Wood|CONSTANT_STRING
L|A|333743|333743|||||||||||||||||||||||
M|A|333743||||Wood|Wood|CONSTANT_STRING
L|A|172534|172534|||||||||||||||||||||||
M|A|172534||||Wood|Wood|CONSTANT_STRING

.....
....
....

Thanks.
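
One possible way to produce those two-line records, offered only as a sketch: the templates are copied from the sample lines in the post, with CODE standing in for the value, and the run of empty fields should be checked against the real format. The helper record_lines and the cut-down %result are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Templates copied from the sample lines in the post; CODE marks where
# the value goes. The count of empty fields is taken as posted.
my $l_template = 'L|A|CODE|CODE|||||||||||||||||||||||';
my $m_template = 'M|A|CODE||||Wood|Wood|CONSTANT_STRING';

# Hypothetical helper: returns the L line and the M line for one code.
sub record_lines {
    my ($code) = @_;
    my @lines;
    for my $template ( $l_template, $m_template ) {
        ( my $line = $template ) =~ s/CODE/$code/g;
        push @lines, $line;
    }
    return @lines;
}

# Stand-in for %result from the script above (two codes only).
my %result = ( CLAYCOUNTY => [ '119736', '448094' ] );

# Printing to STDOUT keeps the sketch self-contained.
for my $county ( sort keys %result ) {
    for my $code ( @{ $result{$county} } ) {
        print "$_\n" for record_lines($code);
    }
}
```

To write to a file instead, a lexical handle opened with the three-argument form of open in '>' mode can be placed in the print statement, as in print {$out} "$_\n".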

2teez 43 Posting Whiz · Answer 3 · 2017-02-13T21:00:39+00:00

Hi,

Please look into below piece of code which I have done.

I am sorry it took me this long to attend to this. I had some other stuff on my plate. :)

I read through your code the day you posted it, but I couldn't reply because of the other things I mentioned.

My observations:

There is no need for -w in your code since you are already using use warnings;. They do the same thing, but their scope is different: -w is global, while use warnings; has file or block scope. Use either as the case may be, but use warnings; is favoured in most cases.
Since you already have the name of the file to read in a variable, why are you still keeping its contents in comments? And if you choose to do so, why not use a HEREDOC in Perl instead of line comments?
Always use the 3-argument form of open with a lexically scoped filehandle, like so: open my $DATA1, '<', $filename or die "can't open $filename: $!";
From line 56 of your last post: instead of putting the open function inside the for loop, which makes you repeat the very expensive operation of opening and re-reading the second file for EVERY entry in @county, why not put the for loop after the open function? That way the file is opened once, and for every line you read in the while loop, the for loop checks your array for a match. By so doing, you lift from your program the burden of opening the file again and again!
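
The points above could be combined roughly as follows. This is a sketch under a few assumptions: the sample data sits in a heredoc and is read through an in-memory filehandle so that the example runs on its own, and a lookup hash (%wanted, a name made up here) replaces the inner for loop over @county as a plainly different but equivalent check.

```perl
use strict;
use warnings;

# Counties picked up from the first file (duplicates are harmless here).
my @county = qw(CLAYCOUNTY CLAYCOUNTY NASSAUCOUNTY);
my %wanted = map { $_ => 1 } @county;

# Sample data in a heredoc instead of comment lines; a real script
# would pass the file name to open instead of a reference to a string.
my $data = <<'END_DATA';
119736;Residential;CLAYCOUNTY
448094;Residential;CLAYCOUNTY
640802;Residential;SUWANNEECOUNTY
920232;Residential;NASSAUCOUNTY
END_DATA

my %result;

# 3-argument open with a lexical filehandle; the data is read only once.
open my $fh, '<', \$data or die "Can't open in-memory data: $!\n";
while ( my $line = <$fh> ) {
    chomp $line;
    next if $line =~ /^\s*$/ or $line =~ /^#/;    # skip blanks and comments
    my ( $code, $attr, $countyy ) = split /;/, $line;
    push @{ $result{$countyy} }, $code if $wanted{$countyy};
}
close $fh;

print "$_: ", join( '|', @{ $result{$_} } ), "\n" for sort keys %result;
```

The hash lookup costs about the same no matter how many counties are wanted, whereas a for loop over @county inside the while loop would scan the whole array for each line.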

I take it your output is what you wanted? If not, you can still ask questions.
Lastly, nice work bro, keep it up. You are on your way to becoming a guru already! Cheers.

Shirin_2 0 Newbie Poster · Answer 4 · 2017-02-14T04:50:57+00:00

Thanks 2teez for all the generous comments.

why not use HEREDOC in perl? Definitely a must-learn, and I will implement it as you suggested.

I am now trying to convert the text file format to XML, but I need to get a Perl booster again for this concept :-)