Dear All,

I want to know whether there is any Perl script available to compare more than two files and print out the matching rows.
For example, if I have the following files:
file_1
TPT 0.0520852296
RP11 0.1062639955
AC01 1.4112745088
AC00 0.4992644913

file_2
LINC 0.1648703511
AC00 0.1632039268
CTD 4.3654577641
RP1 0.1357422856
AC01 1.456789
AC00 0.56789

file_3
RP1 0.0034001871
Z8385 0.0183523803
LINC 0.0099523132
AC01 1.4112745088
AC00 0.356788899

.......
I have many files like this, all with just two columns,
and I want my output to be:
AC01 1.4112745088 1.4112745088 1.456789
AC00 0.4992644913 0.356788899 0.56789

where the rows that match in all three of these files are printed as output.

Thank you all

It is quite a bit simpler than what I first proposed.

I didn't know that you are not comparing these files against a primary file, but among themselves. So, simply write a subroutine that reads these files, splits each line, and gets a key and its corresponding value. If the same key turns up again, make its value an array data structure.

Check that the total number of files on the CLI is more than 1.

Something along these lines:

use warnings;
use strict;

use Getopt::Long;
use Data::Dumper;

sub read_file(\$);    # prototype

die "Give more than one file on the CLI" unless scalar @ARGV > 1;

my $data = {};        #hash ref

read_file $_ for @ARGV;

print Dumper $data;

sub read_file(\$) {
    my $filename = ${ shift(@_) };

    open my $fh, '<', $filename
      or die "can't open file '$filename': $!";

    while (<$fh>) {
        next unless /\S/;    # skip blank lines
        my ( $key, $value ) = split /\s+/, $_;
        if ( exists $data->{$key} ) {

            # second occurrence of the key: wrap the stored scalar
            # in an array ref so the first value is kept
            if ( !ref $data->{$key} ) {
                $data->{$key} = [ $data->{$key} ];
            }
            push @{ $data->{$key} } => $value;
        }
        else {
            $data->{$key} = $value;
        }
    }
    close $fh;
}

Here, I used Data::Dumper to show the value stored for each key. Displaying the keys and values in the desired format is left as an exercise for the OP.

Result shown by the code above:

$VAR1 = {
          'RP1' => [
                     '0.1357422856',
                     '0.0034001871'
                   ],
          'CTD' => '4.3654577641',
          'Z8385' => '0.0183523803',
          'TPT' => '0.0520852296',
          'AC00' => [
                      '0.4992644913',
                      '0.1632039268',
                      '0.56789',
                      '0.356788899'
                    ],
          'LINC' => [
                      '0.1648703511',
                      '0.0099523132'
                    ],
          'RP11' => '0.1062639955',
          'AC01' => [
                      '1.4112745088',
                      '1.456789',
                      '1.4112745088'
                    ]
        };

Hope this helps

Hello Again,

I just added a print line to the above code to see what is happening inside the while loop, and ran it with three input files on the Perl command line.

    while (<$fh>) {
        my ( $key, $value ) = split /\s+/, $_;
        if ( exists $data->{$key} ) {
            if ( !ref $data->{$key} ) {
                $data->{$key} = [ $data->{$key} ];
            }
            push @{ $data->{$key} } => $value;
            print $value, "\n";
        }
        else {
            $data->{$key} = $value;
            #print $value, "\n";
        }
    }

But the output is just a single column, whereas I need all the corresponding information for all matching rows.
The above script now gives me the following output:

AC01
AC00

That is only the first column of the matching rows, but I need all the information from the matching rows, like this:

AC01 1.4112745088 1.4112745088 1.456789
AC00 0.4992644913 0.356788899 0.56789

Also, could you please explain a little more about the else branch and what it is doing? When I put a print in it, it gives me the same output as it did for the if branch.

Thank you so much! And very sorry for the really basic questions.

Edited 1 Year Ago by Anna123

Hi Anna123,

There is no need to be sorry :)
You can't be printing from within the loop, because there we are simply "arranging" our input to get the desired output.

That is the reason Data::Dumper was used to print out the final output in line 15. So, instead of that line 15, we have to print out our hash data one entry at a time, using a for loop or the map function, as one likes.

The line of action is: arrange first, then display later.
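
As a minimal sketch of the "display later" step: it assumes $data has the shape that read_file builds, that is, a plain scalar for a key seen once and an array reference for a key seen more than once. The hash below is just a few sample values from the input files, and format_rows is a helper name made up for illustration.

```perl
use strict;
use warnings;

# Sample data in the shape read_file builds (an assumption for this sketch):
# a plain scalar for a key seen once, an array ref for a repeated key.
my $data = {
    TPT  => '0.0520852296',
    AC01 => [ '1.4112745088', '1.456789', '1.4112745088' ],
    RP1  => [ '0.1357422856', '0.0034001871' ],
};

# format_rows (hypothetical helper): returns one "key v1 v2 ..." string
# for every key that has at least $min values, with keys in sorted order.
sub format_rows {
    my ( $hash, $min ) = @_;
    my @rows;
    for my $key ( sort keys %$hash ) {
        my $value = $hash->{$key};

        # normalise scalar and array-ref values to a plain list
        my @values = ref $value ? @$value : ($value);
        next if @values < $min;
        push @rows, join ' ', $key, @values;
    }
    return @rows;
}

# print only the keys that occurred more than once
print "$_\n" for format_rows( $data, 2 );
```

Note that counting values only tells you a key was repeated, not that it appeared in every file, so treat the threshold of 2 as a placeholder.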

Lastly, I have added comments to the while loop, so that you can actually see what is going on within it. I'm sorry I didn't do that at first.

Here is the while loop again!

    while (<$fh>) {

        # get the key and the value
        my ( $key, $value ) = split /\s+/, $_;

        # check to see if the key already exists
        # in the hash
        if ( exists $data->{$key} ) {

            # if the key exists, check whether its
            # value is already a reference; if not,
            # wrap the stored value in an array ref,
            # which is the reason for [ $data->{$key} ]
            if ( !ref $data->{$key} ) {
                $data->{$key} = [ $data->{$key} ];
            }

            # the value is now an array ref, so
            # simply push in the next value
            push @{ $data->{$key} } => $value;
        }
        else {
            # if the key doesn't exist yet in the
            # hash, this assignment creates it with
            # a plain scalar value
            $data->{$key} = $value;
        }
    }
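
One thing the loop above cannot tell you is whether a key occurred in every file (AC00, for instance, appears twice in file_2). Here is a rough sketch that also records which inputs each key was seen in and keeps only the keys found in all of them. The names collect and common_rows are made up for illustration, and in-memory strings stand in for the three sample files so the snippet runs on its own.

```perl
use strict;
use warnings;

# collect (hypothetical helper): takes [name, filehandle] pairs and returns
# two hash refs: key => [all values], and key => { input name => 1 }.
sub collect {
    my @inputs = @_;
    my ( %values, %seen_in );
    for my $input (@inputs) {
        my ( $name, $fh ) = @$input;
        while ( my $line = <$fh> ) {
            my ( $key, $value ) = split ' ', $line;
            next unless defined $value;    # skip blank or one-column lines
            push @{ $values{$key} }, $value;
            $seen_in{$key}{$name} = 1;
        }
    }
    return ( \%values, \%seen_in );
}

# common_rows (hypothetical helper): "key v1 v2 ..." for every key that
# was seen in all $total inputs, with keys in sorted order.
sub common_rows {
    my ( $values, $seen_in, $total ) = @_;
    my @rows;
    for my $key ( sort keys %$values ) {
        next unless scalar( keys %{ $seen_in->{$key} } ) == $total;
        push @rows, join ' ', $key, @{ $values->{$key} };
    }
    return @rows;
}

# In-memory stand-ins for the sample files from the question
my %sample = (
    file_1 => "TPT 0.0520852296\nRP11 0.1062639955\nAC01 1.4112745088\nAC00 0.4992644913\n",
    file_2 => "LINC 0.1648703511\nAC00 0.1632039268\nCTD 4.3654577641\nRP1 0.1357422856\nAC01 1.456789\nAC00 0.56789\n",
    file_3 => "RP1 0.0034001871\nZ8385 0.0183523803\nLINC 0.0099523132\nAC01 1.4112745088\nAC00 0.356788899\n",
);

my @inputs;
for my $name ( sort keys %sample ) {
    open my $fh, '<', \$sample{$name} or die "can't open $name: $!";
    push @inputs, [ $name, $fh ];
}

my ( $values, $seen_in ) = collect(@inputs);
print "$_\n" for common_rows( $values, $seen_in, scalar @inputs );
```

With real files you would open each name from @ARGV instead of the in-memory strings. This sketch keeps every value it finds, in file order, so the AC00 row carries both values from file_2; picking just one value per file would need an extra rule that the question does not spell out.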

Edited 1 Year Ago by 2teez
