Row comparison in Perl

Question

genetist 0 Newbie Poster

11 Years Ago

Hi to all PERL programmers,
I have data like this with 6 columns

LINES   XY1 XY2 XY3 XY4 XY5

P1  Z/Z T/T -/- T/T T/T
P2  A/A A/A G/G Z/Z T/T
1   G/G T/T G/G T/T G/G
2   T/T A/A C/C C/C T/T
3   T/T G/G T/T G/G T/T
4   A/A C/C A/A A/A A/A
5   A/A A/A T/T T/T A/A

1. First I want to find how many columns (from XY1 to XY5) are different for P1 and P2. "eq" means that P1 and P2 contain the same letters (alleles); if either P1 or P2 contains Z/Z or -/-, I also consider the column eq.
2. I will compare the column values of line 1 with P2 for all columns (from XY1 to XY5), horizontally, and continue with the remaining lines 2 to 5. If they match I would like to give 1, else 0.
3. I will sum these values for each of lines 1 to 5 across columns XY1 to XY5, but I will include in the sum only the columns that are different for P1 and P2.
4. I will calculate the percentage of matching with P2 for each of lines 1 to 5 by dividing the sum by the number of markers that are different between P1 and P2.
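
The first step (classifying each column as eq or nq for P1 and P2) can be sketched as below. This is only an illustration under the assumption that Z/Z and -/- are the only calls treated as "unknown"; `marker_status` and `%unknown` are hypothetical names, not part of any module.

```perl
use strict;
use warnings;

# Assumption: these two calls are "unknown" and count as equal to anything.
my %unknown = map { $_ => 1 } ('Z/Z', '-/-');

# Hypothetical helper: 'eq' if either call is unknown or both calls
# are identical, otherwise 'nq'.
sub marker_status {
    my ($p1, $p2) = @_;
    return 'eq' if $unknown{$p1} || $unknown{$p2};
    return $p1 eq $p2 ? 'eq' : 'nq';
}

# Parent rows copied from the sample data above.
my @p1 = qw(Z/Z T/T -/- T/T T/T);
my @p2 = qw(A/A A/A G/G Z/Z T/T);

my @status    = map { marker_status($p1[$_], $p2[$_]) } 0 .. $#p1;
my $different = grep { $_ eq 'nq' } @status;   # count in scalar context

print "@status\n";                         # eq nq eq eq eq
print "different markers: $different\n";   # different markers: 1
```

For the sample data only XY2 is different, so the number of different markers is 1.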

I am expecting output like this:

LINES   XY1 XY2 XY3 XY4 XY5 SUM %

P1      eq  nq  eq  eq  eq
P2                          1
1       0   0   1   0   0   0   0
2       0   1   0   0   1   1   100
3       0   0   0   0   1   0   0
4       1   0   0   0   0   0   0
5       1   1   0   0   0   1   100

I have data like this in more than 5000 rows. At present I am doing this in Excel 2010 with various formulas, but it takes a lot of my energy.
I would like to do this in PERL. I am a newbie in PERL; so far I have succeeded in reading the file and printing it to the screen.
I really need help solving this in PERL with code. Any help would be appreciated.

perl


2teez 43 Posting Whiz · Answer 1 · 2013-09-22T23:46:26+00:00

Hi genetist,
Several things in your explanation of what you want are not clear. However, going by the title of your post and my limited understanding of what you want, I came up with the following, which I believe will help you a great deal.

use warnings;
use strict;

my %data;

=pod
 Since I don't understand what you
 want from the P1 and P2 comparison,
 I have left it out.
 Once what you want is clear enough,
 we can factor that in later.
=cut

# get the header first
my $header = <DATA>;

# then take the P1 off
# since I don't understand how
# you want to use it
<DATA>;

while (<DATA>) {
    next unless /\S/;    # skip blank lines
    my @val = split ' ', $_;    # split on whitespace, ignoring leading blanks
    push @{ $data{ $val[0] } }, @val[ 1 .. $#val ];
}

# get the values for P2;

my @p2 = @{ delete $data{P2} };

=pod
The following displays the heading and the
row comparison of P2 with the other rows, as
specified by the original poster, except for SUM and %.
I don't know how the OP intends to
generate the SUM and percentage ( % ),
so until that is known, they are omitted from the following.
=cut

print $header;

# sort numerically so that, e.g., line 10 comes after line 9
for my $key ( sort { $a <=> $b } keys %data ) {
    print $key, ' ';
    my @values = @{ $data{$key} };
    for ( 0 .. $#values ) {
        print $values[$_] eq $p2[$_] ? '1 ' : '0 ';
    }
    print $/;
}

__DATA__
LINES   XY1 XY2 XY3 XY4 XY5
P1  Z/Z T/T -/- T/T T/T
P2  A/A A/A G/G Z/Z T/T
1   G/G T/T G/G T/T G/G
2   T/T A/A C/C C/C T/T
3   T/T G/G T/T G/G T/T
4   A/A C/C A/A A/A A/A
5   A/A A/A T/T T/T A/A

produces .....

LINES   XY1 XY2 XY3 XY4 XY5
1 0 0 1 0 0 
2 0 1 0 0 1 
3 0 0 0 0 1 
4 1 0 0 0 0 
5 1 1 0 0 0
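
For completeness, here is one possible reading of the SUM and % columns that reproduces the expected output in the question. It is a sketch under stated assumptions, not a confirmed specification: a column counts as different only when neither P1 nor P2 is Z/Z or -/- and the two calls differ, SUM counts matches with P2 in those columns only, and % is SUM divided by the number of different columns. The names `@input`, `%unknown`, `@informative` and `score_line` are illustrative.

```perl
use strict;
use warnings;

# Sample rows copied from the question (header omitted).
my @input = (
    'P1  Z/Z T/T -/- T/T T/T',
    'P2  A/A A/A G/G Z/Z T/T',
    '1   G/G T/T G/G T/T G/G',
    '2   T/T A/A C/C C/C T/T',
    '3   T/T G/G T/T G/G T/T',
    '4   A/A C/C A/A A/A A/A',
    '5   A/A A/A T/T T/T A/A',
);

my %calls;   # line name => array ref of genotype calls
my @order;   # non-parent line names, in input order
for my $row (@input) {
    my ($name, @vals) = split ' ', $row;
    $calls{$name} = \@vals;
    push @order, $name unless $name =~ /^P[12]$/;
}

my @p1 = @{ $calls{P1} };
my @p2 = @{ $calls{P2} };

# Assumption: these calls are "unknown" and never make a column different.
my %unknown = map { $_ => 1 } ('Z/Z', '-/-');

# Indices of the columns where P1 and P2 really differ.
my @informative = grep {
    !$unknown{ $p1[$_] } && !$unknown{ $p2[$_] } && $p1[$_] ne $p2[$_]
} 0 .. $#p2;

# Returns the 1/0 flags for every column, then SUM and % computed
# over the informative columns only.
sub score_line {
    my ($vals) = @_;
    my @match = map { $vals->[$_] eq $p2[$_] ? 1 : 0 } 0 .. $#p2;
    my $sum = 0;
    $sum += $match[$_] for @informative;
    my $pct = @informative ? 100 * $sum / @informative : 0;
    return (@match, $sum, $pct);
}

for my $name (@order) {
    print join(' ', $name, score_line( $calls{$name} )), "\n";
}
```

Keeping the line names in `@order` preserves the input order, so no sorting of hash keys is needed. With the real file, the rows would be read from a filehandle instead of the hard-coded `@input` list.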

Lastly, the language is Perl, not PERL. Perl is not an acronym, though some have been formed for it.