Dear All,

I have two files, and I need to map file_1 to file_2 by matching the 1st column of file_1 against the 9th column (index 8) of file_2, then print the matching rows with their annotations. I can mostly do that with the following Perl script.
But the problem is that my Perl script does not seem to differentiate on the dot (.) in the IDs I use to map the columns.

My file_1 and file_2 look like the following:

file_1:

CUFF.2  chr1:14362-29806    24.2763 22.1124 26.4401 OK
CUFF.23 chr1:89294-173862   4.95251 3.44948 6.45555 OK

and

file_2:

chr1    Cufflinks   transcript  11869   14409   1   +   .   CUFF.2   transcript_id ENST00000456328.2     FPKM 0.0000000000   frac 0.000000   conf_lo 0.000000    conf_hi 0.059559     cov 0.000000    full_read_support no
chr1    Cufflinks   exon    11869   12227   1   +   .   CUFF.23  transcript_id ENST00000456328.2     exon_number 1   FPKM 0.0000000000   frac 0.000000   conf_lo 0.000000    conf_hi 0.059559     cov 0.000000

The Perl code I used:
`

    my %gen_id;
    # open the first file, split each line and save the first two fields in a hash
    open my $fh, '<', $ARGV[0] or die "can't open file: $!";
    while (<$fh>) {
        my ( $id, $value ) = split;
        $gen_id{$id} = $value;
    }
    close $fh or die "can't close file: $!";

    # open the second file, then step through it and find
    # matching values to display
    my %get_data;
    open $fh, '<', $ARGV[1] or die "can't open file: $!";
    while (<$fh>) {
        # if the field at index 8 is a known ID, keep the fields
        # at indexes 1 to 9, keyed by the field at index 0
        foreach my $file ( [split] ) {
            if ( exists $gen_id{ $file->[8] } ) {
                $get_data{ $file->[0] } = join "\t" => @$file[ 1, 2, 3, 4, 5, 6, 7, 8, 9 ];
            }
        }
    }
    print $_, "\t", $get_data{$_}, "\t", $/ for sort keys %get_data;
    close $fh or die "can't close file: $!";

`
This code gives me output that looks like this:

chr1    Cufflinks   exon    11869   12227   1   +   .   CUFF.23  transcript_id ENST00000456328.2     exon_number 1   FPKM 0.0000000000   frac 0.000000

whereas I am supposed to get these lines:

chr1    Cufflinks   exon    11869   12227   1   +   .   CUFF.23  transcript_id ENST00000456328.2     exon_number 1   FPKM 0.0000000000   frac 0.000000
chr1    Cufflinks   transcript  11869   14409   1   +   .   CUFF.2   transcript_id ENST00000456328.2     FPKM 0.0000000000   frac 0.000000   conf_lo 0.000000

That is, the Perl script seems to treat CUFF.2 and CUFF.23 as the same ID and removes one line as a duplicate. In reality they are not duplicates but two different names. It would be really great if someone could help me alter the code a little so that I get the output I want.

Thank you

Hello,

Why don't you check what you are getting as the keys and values of your hash?

You can print out your values to see if you are getting the expected string.

You might also look at Data::Dumper to print out your hash so that you can see what they contain.
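As a minimal sketch of that suggestion, the snippet below dumps a small hash with Data::Dumper. The hash contents are a hypothetical stand-in for what the first loop would build from the first two columns of file_1; in the real script you would dump %gen_id and %get_data right after filling them.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for the IDs and values read from file_1.
my %gen_id = (
    'CUFF.2'  => 'chr1:14362-29806',
    'CUFF.23' => 'chr1:89294-173862',
);

# Sort the keys so the dump is stable and easy to compare by eye.
$Data::Dumper::Sortkeys = 1;
print Dumper( \%gen_id );
```

If both IDs show up as separate keys in the dump, the dot is not what is causing the lost line.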

Okay, maybe you didn't get what I am saying clearly. With your code, you are overwriting the data in your hash variable %get_data, as you are doing here:

$get_data{ $file->[0]}=join "\t"=> @$file[1,2,3,4,5,6,7,8,9];

Don't forget that the first string on every line of file2 is chr1, and that is what you use as the key in %get_data, so every matching row ends up under the same key. With a plain assignment (=), each new match overwrites the previous value. So, to get what you wanted, you could use this:

push @{ $get_data{ $file->[0] } } => join "\t" => @$file[ 1, 2, 3, 4, 5, 6, 7, 8, 9 ];

instead.
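To make the difference concrete, here is a small sketch that stores the same two rows both ways. The rows are shortened, hypothetical versions of the file_2 lines, and the hash names %overwritten and %collected are made up for the illustration.

```perl
use strict;
use warnings;

# Two matching rows that share the same first field, as in file_2.
my @rows = (
    [ 'chr1', 'Cufflinks', 'transcript', 'CUFF.2' ],
    [ 'chr1', 'Cufflinks', 'exon',       'CUFF.23' ],
);

my ( %overwritten, %collected );
for my $row (@rows) {
    my $line = join "\t", @{$row}[ 1 .. $#$row ];
    $overwritten{ $row->[0] } = $line;           # second row replaces the first
    push @{ $collected{ $row->[0] } }, $line;    # both rows are kept
}

print "assignment keeps: $overwritten{chr1}\n";
print "push keeps: $_\n" for @{ $collected{chr1} };
```

Note that once the values are array references, the final print in the script has to dereference them, for example by looping over @{ $get_data{$key} } for each key.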

There are several other improvements you could make to your script, though. One of them is DRY, which stands for Don't Repeat Yourself. Since you are opening a file more than once, put that in a subroutine and reuse the code to get what you want from the different files.
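One possible shape for such a subroutine is sketched below. The name each_row and the callback interface are assumptions made for this example, not something from the original script, and the sample data is read from an in-memory string so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file, split each non-empty line on
# whitespace, and hand the fields to a callback.
sub each_row {
    my ( $path, $callback ) = @_;
    open my $fh, '<', $path or die "can't open '$path': $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /\S/;
        $callback->( split ' ', $line );
    }
    close $fh or die "can't close '$path': $!";
    return;
}

# Passing a scalar reference makes open() read from the string,
# which stands in for file_1 here.
my $file_1 = "CUFF.2 chr1:14362-29806 24.2763\nCUFF.23 chr1:89294-173862 4.95251\n";

my %gen_id;
each_row( \$file_1, sub { my ( $id, $value ) = @_; $gen_id{$id} = $value } );
print join( ', ', sort keys %gen_id ), "\n";
```

The same helper can then be called a second time with $ARGV[1] and a different callback, so the open/close/die boilerplate is written only once.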

Secondly, check the number of files passed to your script, and print a usage message if it is wrong.
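A rough sketch of such a check follows. The helper name usage_error is hypothetical; the loop only exercises it on sample argument lists so the snippet can be run without any files.

```perl
use strict;
use warnings;

# Hypothetical helper: return a usage message if the argument count
# is wrong, or undef if exactly two file names were given.
sub usage_error {
    my @args = @_;
    return if @args == 2;
    return "Usage: $0 file_1 file_2\n";
}

# In the real script this would be a single line near the top:
#   if ( my $error = usage_error(@ARGV) ) { die $error }
for my $args ( [], ['file_1'], [ 'file_1', 'file_2' ] ) {
    my $error = usage_error(@$args);
    print scalar(@$args), " argument(s): ",
        ( defined $error ? "rejected" : "accepted" ), "\n";
}
```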

Lastly, with the range of fields you are joining (indexes 1 to 9), your output will not be what you showed in your original post. Moreover, you could just print the matching lines directly instead of using a second hash variable.
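Printing directly could look roughly like the sketch below. It is only an illustration under assumptions: the two files are replaced by shortened in-memory strings, the CUFF.99 row is invented to show a non-matching line, and the $printed counter exists only to make the result easy to check.

```perl
use strict;
use warnings;

# Shortened, hypothetical stand-ins for file_1 and file_2.
my $file_1 = <<'END';
CUFF.2  chr1:14362-29806    24.2763
CUFF.23 chr1:89294-173862   4.95251
END
my $file_2 = <<'END';
chr1 Cufflinks transcript 11869 14409 1 + . CUFF.2 transcript_id ENST00000456328.2
chr1 Cufflinks exon 11869 12227 1 + . CUFF.23 transcript_id ENST00000456328.2
chr1 Cufflinks exon 12613 12721 1 + . CUFF.99 transcript_id ENST00000456328.2
END

# Collect the IDs from the first column of file_1.
my %gen_id;
open my $in, '<', \$file_1 or die "can't open in-memory file_1: $!";
while (<$in>) {
    my ($id) = split;
    $gen_id{$id} = 1 if defined $id;
}
close $in or die "can't close in-memory file_1: $!";

# Print every file_2 row whose field at index 8 is a known ID.
# No second hash is involved, so nothing can be overwritten.
my $printed = 0;
open $in, '<', \$file_2 or die "can't open in-memory file_2: $!";
while (<$in>) {
    my @fields = split;
    next unless @fields > 8 && exists $gen_id{ $fields[8] };
    print join( "\t", @fields ), "\n";
    $printed++;
}
close $in or die "can't close in-memory file_2: $!";
```

This keeps the rows in the order they appear in file_2 and prints whole lines, so there is no need to list the field indexes by hand.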

Hope this helps.
