
Dear Experts,
Thanks for your time. I am stuck on the following problem. I have a file with five columns named Chr, POS, QUAL, GT1 and GT2. Column 4 (GT1) contains only the value 01. I would like to compare columns GT1 and GT2 (columns 4 and 5) row by row: a row with 01 in GT2 that is followed by a row with 00 should be printed as a start point, and the first row with 01 in GT2 after those 00 rows should be printed as the end point. For example, in the sample below, row (2) is a start point because the next row has 00 in GT2, and row (4) is the matching end point. The other cases are shown in the expected output. In a few cases, such as rows (10) and (13), a 01 row is both preceded and followed by 00 rows. Such a position should be written first as an end point and then again as the start point on the next output line.

Chr     POS     QUAL    GT1     GT2
1       2556    96      01      01
1       1685    125     01      01
1       1770    80      01      00
1       1785    90      01      01
1       1810    95      01      01
1       1825    77      01      00
1       1835    80      01      00
1       1845    120     01      00
1       1875    125     01      00
1       1888    80      01      01
1       1910    95      01      00
1       1914    110     01      00
1       1935    65      01      01
1       1985    78      01      00
1       2030    100     01      01
1       2050    90      01      01

Expected Output

Start   End
1685    1785
1810    1888
1888    1935
1935    2030
Last Post by 2teez

Hello yksrmc,

This can easily be done by going through your data one line at a time. Since GT1 is always 01 and only the last column (GT2) changes, there is no need to compare GT2 with the column before it.
Just read the GT2 value and check it against the condition you specified: when a 00 is seen, raise a flag, and when the next 01 is seen, lower the flag again, like a flip-flop switch.

The code below solves the problem as you described it.

use warnings;
use strict;

use constant {
    START_IT => '01',
    END_IT   => '00',
};

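<DATA>;    # read and discard the header line

my $flag = 0;    # non-zero while inside a run of 00 rows
my $avant_pt;    # POS of the previous row

printf "%s\t%s\n", "START", "END";

while (<DATA>) {
    my ( $pos, $gt2 ) = (split)[ 1, 4 ];
    next unless defined $gt2;    # skip blank or short lines
    if ( $gt2 eq END_IT && ++$flag == 1 ) {
        # first 00 of a run: the previous row is the start point
        print $avant_pt, "\t";
    }
    elsif ( $gt2 eq START_IT && $flag != 0 ) {
        # first 01 after a run of 00: this row is the end point
        print $pos, "\n";
        $flag = 0;
    }
    $avant_pt = $pos;
}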

__DATA__
Chr     POS   QUAL    GT1      GT2
1       2556      96      01    01
1       1685      125     01    01
1       1770      80      01    00
1       1785      90      01    01
1        1810     95      01    01
1        1825     77      01    00
1        1835     80      01    00
1        1845     120     01    00
1        1875     125     01    00
1        1888      80     01    01
1        1910      95     01    00
1        1914     110     01    00
1        1935      65     01    01
1        1985      78     01    00
1        2030     100     01    01
1        2050      90      01    01

This gives the following output:

START   END
1685    1785
1810    1888
1888    1935
1935    2030
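
If you would rather collect the intervals than print them as they are found, the same flip-flop idea can be wrapped in a subroutine. The following is only a sketch under a few assumptions of mine: the name find_intervals is hypothetical, the header and any blank lines are skipped by checking that POS is numeric, a file that begins with a 00 row uses that row itself as the start, and an interval that is still open at the end of the input is returned with an undefined end (printed as NA here).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the code above): returns a list of
# [start, end] pairs instead of printing them directly.
sub find_intervals {
    my @lines = @_;
    my @intervals;
    my $prev_pos;    # POS of the previous data row
    my $start;       # start of the currently open interval, if any

    for my $line (@lines) {
        my @f = split ' ', $line;

        # skip the header and blank or short lines
        next if @f < 5 || $f[1] !~ /^\d+$/;
        my ( $pos, $gt2 ) = @f[ 1, 4 ];

        if ( $gt2 eq '00' && !defined $start ) {
            # first 00 of a run: the previous row is the start point
            # (assumption: use this row if there is no previous row)
            $start = defined $prev_pos ? $prev_pos : $pos;
        }
        elsif ( $gt2 eq '01' && defined $start ) {
            # first 01 after a run of 00: this row is the end point
            push @intervals, [ $start, $pos ];
            undef $start;
        }
        $prev_pos = $pos;
    }

    # assumption: report an unfinished interval with an undefined end
    push @intervals, [ $start, undef ] if defined $start;
    return @intervals;
}

# The sample data from the question, as a list of lines.
my @rows = split /\n/, <<'END_ROWS';
Chr     POS     QUAL    GT1     GT2
1       2556    96      01      01
1       1685    125     01      01
1       1770    80      01      00
1       1785    90      01      01
1       1810    95      01      01
1       1825    77      01      00
1       1835    80      01      00
1       1845    120     01      00
1       1875    125     01      00
1       1888    80      01      01
1       1910    95      01      00
1       1914    110     01      00
1       1935    65      01      01
1       1985    78      01      00
1       2030    100     01      01
1       2050    90      01      01
END_ROWS

print "START\tEND\n";
for my $pair ( find_intervals(@rows) ) {
    my ( $s, $e ) = @$pair;
    print $s, "\t", ( defined $e ? $e : 'NA' ), "\n";
}
```

To read from a real file, open it and pass the lines to the subroutine, for example find_intervals(<$fh>) for an already opened file handle $fh. Keeping the result as a list of pairs makes it easier to sort, filter or write the intervals in another format later.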

I can only hope that this helps you.
