Hi Perl experts,

I'm new to Perl. Thanks for taking the time to look at my question. I have extracted the shared positions between two files and now have a single file like the following. The file is tab-delimited.

Chr1    position  QUAL  GT_1  GT_2

chr1    478493    595   G/T   G/C
chr1    879243    700   A/T   A/A
chr2    889922    1300  C/C   C/C
chr2    1926372   300   T/A   T/A
chr3    237474    500   G/C   C/C
chr3    575757    700   A/T   A/A
chr3    6666874   746   T/T   T/T

and so on

I would like to write every row where column 4 does not match column 5 to a new file, printing the complete row. I would also like to count how many times A/T was converted to A/A.

Expected output:

chr     position  QUAL  GT_1  GT_2
chr1    478493    595   G/T   G/C
chr1    879243    700   A/T   A/A
chr3    237474    500   G/C   C/C
chr3    575757    700   A/T   A/A

Total number of A/T converted to A/A = 2
Total number of G/C converted to C/C = 1

Are you looking for something like this?

#!/usr/bin/perl

use warnings;
use strict;
use autodie qw/open close/;

print "Enter input filename->";
chomp(my $ifilename = <STDIN>);

open(my $IFILE, "<", $ifilename);

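while ( <$IFILE> )
{
    next unless /\S/;    # skip blank lines
    s/\s+$//;            # strip the newline and any trailing whitespace
    my @data = split(/\t+/, $_);
    next unless @data >= 5;    # skip lines that do not have all five columns
    print "@data\n" unless ( $data[3] eq $data[4] );    # print this to another file
}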

close($IFILE);

__END__
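
The comment in the loop above says to print the rows to another file, which is what the original question asked for. A minimal sketch of that step, assuming whitespace-separated columns with GT_1 and GT_2 in the fourth and fifth positions and a header on the first complete line. The sub name write_mismatches and the file names are made up for illustration:

```perl
use strict;
use warnings;
use autodie qw/open close/;

# Hypothetical helper: copy the header and every row whose 4th and 5th
# columns differ from $in_name to $out_name. Returns the number of data
# rows written (the header is not counted).
sub write_mismatches {
    my ( $in_name, $out_name ) = @_;

    open( my $in,  '<', $in_name );
    open( my $out, '>', $out_name );

    my $seen_header = 0;
    my $written     = 0;
    while ( my $line = <$in> ) {
        my @data = split ' ', $line;    # any whitespace; also drops the newline
        next unless @data >= 5;         # skip blank or short lines

        # Assumption: the first complete line is the header row.
        if ( !$seen_header++ ) {
            print {$out} join( "\t", @data ), "\n";
            next;
        }

        next if $data[3] eq $data[4];
        print {$out} join( "\t", @data ), "\n";
        $written++;
    }

    close($in);
    close($out);
    return $written;
}

# Example call; both file names are placeholders:
# my $rows = write_mismatches( 'genotypes.txt', 'mismatches.txt' );
```

Splitting on ' ' rather than on tabs means a stray trailing space after the last genotype does not turn 'T/A' into 'T/A ' and cause a false mismatch.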

Hi,

You can do this:

#!/usr/bin/perl
use warnings;
use strict;

my %count_variable_converted;

chomp( my $title = <DATA> );

print $title,$/;
while (<DATA>) {
    next if /^$/;
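    s/\s+$//;    # strip the newline and any trailing whitespace
    my @data = split /\s+/, $_, 5;    # \s+ treats a run of spaces or tabs as one separator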

    $count_variable_converted{'A/T'}{'A/A'}++
      if ( $data[3] eq 'A/T' and $data[4] eq 'A/A' );   

    $count_variable_converted{'G/C'}{'C/C'}++
      if ( $data[3] eq 'G/C' and $data[4] eq 'C/C' );

    if ( $data[3] ne $data[4] ) {
        print +( join "\t" => @data ), $/;
    }
}

for my $gt_1 ( sort keys %count_variable_converted ) {
    for my $gt_2 ( sort keys %{ $count_variable_converted{$gt_1} } ) {
        # the format already ends with \n, so no extra $/ argument is needed
        printf "Total Number %s converted to %s is %d\n", $gt_1,
          $gt_2, $count_variable_converted{$gt_1}->{$gt_2};
    }
}

__DATA__
Chr1 position  QUAL    GT_1     GT_2

chr1    478493  595 G/T G/C
chr1    879243  700 A/T A/A
chr2    889922  1300    C/C C/C
chr2    1926372 300 T/A T/A 
chr3    237474  500 G/C C/C
chr3    575757  700 A/T A/A
chr3    6666874 746 T/T T/T

OUTPUT

Chr1 position  QUAL    GT_1     GT_2
chr1    478493  595    G/T      G/C
chr1    879243  700    A/T      A/A
chr3    237474  500    G/C      C/C
chr3    575757  700    A/T      A/A
Total Number A/T converted to A/A is 2
Total Number G/C converted to C/C is 1

Of course, you would need the function open if you are reading from a file. Hope this helps.

Thanks @2teez and Gerard for taking the time to reply with code suggestions. I tried these scripts, but they print all lines. I tried the following code, which uses the open function to read the file.

#!/usr/bin/perl
use warnings;
use strict;
my %count_variable_converted;
my $INFILE='./Sample_file.txt';
open (INPUT,"<$INFILE");
while (<INPUT>) {
 next if /^$/;
    s/\s$//g;
    my @data = split /\s+?/, $_, 5;
   $count_variable_converted{'A/T'}{'A/A'}++
    if ( $data[3] eq 'A/T' and $data[4] eq 'A/A' );

    $count_variable_converted{'G/C'}{'C/C'}++
      if ( $data[3] eq 'G/C' and $data[4] eq 'C/C' );
    if ( $data[3] ne $data[4] ) {

        print +( join "\t" => @data ), $/;
    }
}

for my $gt_1 ( keys %count_variable_converted ) {

   for my $gt_2 ( keys %{ $count_variable_converted{$gt_1} } ) {

        print sprintf "Total Number %s converted to %s is %d\n" => $gt_1,
          $gt_2, $count_variable_converted{$gt_1}->{$gt_2}, $/;
}
}

close INPUT;

The output of this code looks like this. When data[3] and data[4] are equal, it shows two errors. It also did not count.

Use of uninitialized value in string ne at ./Count.pl line 16, <INPUT> line 1.
CHR     POS     GT_1    GT_2
Use of uninitialized value in string ne at ./Count.pl line 16, <INPUT> line 2.
chr1    12354   A/T     A/A
Use of uninitialized value in string ne at ./Count.pl line 16, <INPUT> line 3.
chr1    43554   G/C     C/C
Use of uninitialized value $data[4] in string eq at ./Count.pl line 14, <INPUT> line 4.
Use of uninitialized value in string ne at ./Count.pl line 16, <INPUT> line 4.
chr1    76767   G/C     G/C
Use of uninitialized value in string ne at ./Count.pl line 16, <INPUT> line 5.
chr2    42525   A/T     A/A
Use of uninitialized value in string ne at ./Count.pl line 16, <INPUT> line 5.
chr2    76447   G/C     C/C
Use of uninitialized value $data[4] in string eq at ./Mutation_Count.pl line 14, <INPUT> line 6.
Use of uninitialized value in string ne at ./Count.pl line 16, <INPUT> line 6.
chr2    84332   G/C     G/C
Use of uninitialized value in string ne at ./Count.pl line 16, <INPUT> line 7.
chr3    34364   A/T     A/A
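
The rows printed between the warnings have four columns (CHR, POS, GT_1, GT_2) rather than five. With that layout $data[3] holds GT_2 and $data[4] is undefined, so the ne test is true on every line, which would match both the warnings and the "prints all lines" symptom. One way to guard against a changing column layout is to look the columns up by header name. A minimal sketch, assuming the header row contains the labels GT_1 and GT_2; the sub name find_mismatches is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of a genotype table (header first)
# and returns the data rows whose GT_1 and GT_2 values differ.
sub find_mismatches {
    my @lines = @_;

    # Drop leading blank lines, then read the header.
    shift @lines while @lines and $lines[0] !~ /\S/;
    my @header = split ' ', ( shift(@lines) // '' );

    # Map each column label to its position, e.g. GT_1 => 2.
    my %col;
    @col{@header} = 0 .. $#header;
    die "header must contain GT_1 and GT_2\n"
      unless exists $col{GT_1} and exists $col{GT_2};

    my @rows;
    for my $line (@lines) {
        my @data = split ' ', $line;
        next unless @data == @header;    # skip blank or malformed lines
        push @rows, \@data if $data[ $col{GT_1} ] ne $data[ $col{GT_2} ];
    }
    return @rows;
}

# The four-column layout seen in the output above:
my @sample = (
    "CHR\tPOS\tGT_1\tGT_2\n",
    "chr1\t12354\tA/T\tA/A\n",
    "chr1\t76767\tG/C\tG/C\n",
);
print join( "\t", @$_ ), "\n" for find_mismatches(@sample);
```

The same sub works unchanged on the five-column layout from the original question, because the positions come from the header rather than from fixed indexes.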

Hi,
I think the issue is with your file simple_file.txt. The code I gave should work perfectly.
However, I have attached a sample of the file I used to this mail. See also the modified version of the code below.

#!/usr/bin/perl
use warnings;
use strict;

my %count_variable_converted;

my $filename = 'simple_file.txt';

open my $fh, '<', $filename or die "can't open file: $!";

chomp( my $title = <$fh> );

print $title, $/;
while (<$fh>) {
    next if /^$/;

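    # Capture the five whitespace-separated fields; skip lines without them.
    my @data = m/^\s*   # start of the line, optional leading whitespace
           (\S+)  # $1: chromosome
           \s+    # spaces or tabs
           (\S+)  # $2: position
           \s+    # spaces or tabs
           (\S+)  # $3: QUAL
           \s+    # spaces or tabs
           (\S+)  # $4: GT_1
           \s+    # spaces or tabs
           (\S+)  # $5: GT_2
           \s*$   # optional trailing whitespace, then end of line
         /x
      or next;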

    $count_variable_converted{'A/T'}{'A/A'}++
      if ( $data[3] eq 'A/T' and $data[4] eq 'A/A' );

    $count_variable_converted{'G/C'}{'C/C'}++
      if ( $data[3] eq 'G/C' and $data[4] eq 'C/C' );

    if ( $data[3] ne $data[4] ) {
        print +( join "\t" => @data ), $/;
    }
}

close $fh or die "can't close file: $!";

for my $gt_1 ( sort keys %count_variable_converted ) {
    for my $gt_2 ( sort keys %{ $count_variable_converted{$gt_1} } ) {
        # the format already ends with \n, so no extra $/ argument is needed
        printf "Total Number %s converted to %s is %d\n", $gt_1,
          $gt_2, $count_variable_converted{$gt_1}->{$gt_2};
    }
}

This version uses a regexp to pick out the values needed from each line. Please let us know if it works for you.

I went through your code and there are a number of old or wrong ways of doing Perl.

  1. Always check the return value of the open function, and also of the close function, or use autodie as shown by gerard4143.
  2. Use the three-argument form of the open function.
  3. Use lexically scoped filehandles with the open function.
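
Applied to the open (INPUT,"<$INFILE") call from the earlier script, those three points might look like this. A minimal sketch; the sub name read_lines is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper showing the three points: three-argument open,
# a lexical filehandle, and checked return values for open and close.
sub read_lines {
    my ($filename) = @_;

    open( my $fh, '<', $filename )
      or die "can't open '$filename': $!";

    chomp( my @lines = <$fh> );

    close($fh)
      or die "can't close '$filename': $!";

    return @lines;
}
```

With the bareword, unchecked form, a misspelled file name fails silently and the while loop simply reads nothing. Here the script stops with the reason from $! instead.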
Thanks 2teez! Yes, you are right, your code works perfectly. However, I noticed what looks like a bug. It also prints lines that satisfy only one of the two conditions, either G/C or C/C, rather than only lines that satisfy both. It even prints the reverse combination, such as GT_1 C/C and GT_2 G/C.

For example, if our condition is 'G/C' and 'C/C', it should print only those lines that have G/C in column GT_1 and C/C in column GT_2.

It is printing the following combinations too.

GT_1   GT_2
 C/C    G/C
 T/C    C/C
 C/C    T/C
 C/C    C/C

Thanks, the counting works perfectly. It counts only those lines that satisfy both conditions. I would just like to know why the printing includes all lines that satisfy only one condition. I hope you get my point. Thanks for all your replies. Before I close this question, it would be great if this printing issue could be sorted out too.

Thanks, I have found the reason why all those lines are printed: for printing we used the condition data[3] ne data[4], so it prints every line where the two columns differ, in any combination. I'll mark the question as solved in a couple of days. Thanks 2teez and Gerard for your time and prompt replies.

Hi,

However It has one some bugs which I noticed.

No, it hasn't. The script I wrote addresses the original condition you gave:

I like to extract list in new file when column 4 not match with coloumn 5

So, as long as column 4 is NOT equal to column 5, the whole line is printed, and since 'C/C' is not equal to 'G/C', that line is printed as well.
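
If only particular conversions should be printed, rather than every mismatch, the print test can check both columns against a table of wanted pairs. A minimal sketch, assuming the two conversions named in the question are the ones of interest; the names %wanted and is_wanted are hypothetical:

```perl
use strict;
use warnings;

# Conversions of interest, keyed as $wanted{GT_1}{GT_2}. These two pairs
# come from the question; extend the table as needed.
my %wanted = (
    'A/T' => { 'A/A' => 1 },
    'G/C' => { 'C/C' => 1 },
);

# Hypothetical helper: true only when both columns match a wanted pair,
# so C/C -> G/C (the reverse direction) is rejected.
sub is_wanted {
    my ( $gt_1, $gt_2 ) = @_;
    return ( exists( $wanted{$gt_1} ) && exists( $wanted{$gt_1}{$gt_2} ) )
      ? 1
      : 0;
}

for my $pair ( [ 'G/C', 'C/C' ], [ 'C/C', 'G/C' ], [ 'T/C', 'C/C' ] ) {
    my ( $gt_1, $gt_2 ) = @$pair;
    printf "%s %s: %s\n", $gt_1, $gt_2,
      is_wanted( $gt_1, $gt_2 ) ? 'print' : 'skip';
}
```

In the read loop this would replace the ne test, for example print ... if is_wanted( $data[3], $data[4] ). Checking exists on the outer key first keeps the lookup from creating empty entries in %wanted.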

Thanks 2teez. That's why I mentioned at the end that I had found the reason those lines were also being printed. I got your point. I have marked the question as solved.

Hi yksrmc,

I think it mostly has to do with your input data. Below is another solution, using the map function and a hash reference. It takes fewer lines than the earlier ones, and it also gives the correct output for the new condition you gave.

#!/usr/bin/perl
use warnings;
use strict;
use constant TITLE => "Chr1 position  QUAL    GT_1     GT_2
";

<DATA> for 1 .. 2;    ## ignore the first two lines

my $counter = {};

print TITLE;
print join "\t" => map {
    $_->[4] =~ s/^\s+|\s+$//g;
    $counter->{'A/T'}{'A/A'}++ if $_->[3] eq 'A/T' and $_->[4] eq 'A/A';
    $counter->{'G/C'}{'C/C'}++ if $_->[3] eq 'G/C' and $_->[4] eq 'C/C';

    # ( 'C/C' or 'A/A' ) evaluates to 'C/C' alone, so compare against each value
    @$_, $/ if $_->[3] ne $_->[4] and $_->[3] ne 'C/C' and $_->[3] ne 'A/A';
}[ ( split /\s+/, $_, 5 )[ 0 .. 4 ] ] while <DATA>;

print map {
    sprintf "The Total Number of %s converted to %s is %d\n",
      $_, %{ $counter->{$_} }
} keys %$counter;

__DATA__
Chr1 position  QUAL    GT_1     GT_2

chr1    478493  595 G/T G/C
chr1    879243  700 A/T A/A
chr2    889922  1300    C/C C/C
chr2    1926372 300 T/A T/A 
chr3    237474  500 G/C C/C
chr3    575757  700 A/T A/A
chr3    6666874 746 T/T T/T
chr3    6666874 746 C/C G/C

Thanks Gerard and 2teez for your time. I tried all the suggested code, but it prints all the lines, including ones where column 3 is AA and column 4 is AA. When I try it on real data, it prints everything whether the condition matches or not. Could you help me sort this out? I have attached a data file. I don't understand why it prints everything when we have given a condition. Could you check what's wrong with the code for the following data?

I have attached a small sample of the real file. In this sample the condition could be, for example, A/G to G/G, because other cases may be very rare. The first column is a serial number.

Could we use a foreach or for loop for this problem to count each condition?

Thanks for your time.
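
On the question of using a loop to count: one option is to tally every GT_1/GT_2 pair in a nested hash as the rows are read, then walk the hash with for at the end, so no conversion has to be hard-coded. A minimal sketch, assuming GT_1 and GT_2 are the last two whitespace-separated fields of each row (which holds with or without a leading serial-number column) and that genotypes look like A/G. The sub name count_conversions is hypothetical, and the rows below reuse values from the question with a made-up serial number in front:

```perl
use strict;
use warnings;

# Hypothetical helper: counts every differing GT_1 -> GT_2 pair.
# Assumes GT_1 and GT_2 are the last two fields of each row.
sub count_conversions {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my @data = split ' ', $line;
        next unless @data >= 2;    # skip blank lines
        my ( $gt_1, $gt_2 ) = @data[ -2, -1 ];

        # Assumed genotype format; this also skips the header row.
        next unless $gt_1 =~ m{^[ACGT]/[ACGT]$}
          and $gt_2 =~ m{^[ACGT]/[ACGT]$};

        $count{$gt_1}{$gt_2}++ if $gt_1 ne $gt_2;
    }
    return \%count;
}

my $count = count_conversions(
    "No Chr1 position QUAL GT_1 GT_2\n",
    "1 chr1 478493 595 G/T G/C\n",
    "2 chr1 879243 700 A/T A/A\n",
    "3 chr2 889922 1300 C/C C/C\n",
    "4 chr3 237474 500 G/C C/C\n",
    "5 chr3 575757 700 A/T A/A\n",
);

for my $gt_1 ( sort keys %$count ) {
    for my $gt_2 ( sort keys %{ $count->{$gt_1} } ) {
        printf "Total number of %s converted to %s = %d\n",
          $gt_1, $gt_2, $count->{$gt_1}{$gt_2};
    }
}
```

Sorting the keys only makes the report order stable from run to run; the counts themselves do not depend on it.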

Hi yksrmc,

The Perl scripts given so far do the job; you only have to change a few things to adapt them to what you need.

However, I did modify the script to work with the new file attached to your last mail.
Please note that you will have to change the conditions as the case may be. I only added the new condition you gave. Please also see my output attached to this mail.

Below is the script used:

#!/usr/bin/perl
use warnings;
use strict;
use Carp qw(croak);

croak "Usage: perl_script.pl <file_to_check.txt>"
  unless defined $ARGV[0];

my $filename = $ARGV[0];
my $counter  = {};

open my $fh, '<', $filename or croak "can't open $filename: $!";

while (<$fh>) {
    my ( $number1, $chr_name, $number2, $value, $gt_1, $gt_2 ) = split;

    $counter->{'A/T'}{'A/A'}++ if $gt_1 eq 'A/T' and $gt_2 eq 'A/A';
    $counter->{'G/C'}{'C/C'}++ if $gt_1 eq 'G/C' and $gt_2 eq 'C/C';
    $counter->{'A/G'}{'G/G'}++ if $gt_1 eq 'A/G' and $gt_2 eq 'G/G';

    # ( 'C/C' or 'A/A' ) evaluates to 'C/C' alone, so compare against each value
    print $_, $/ if $gt_1 ne $gt_2 and $gt_1 ne 'C/C' and $gt_1 ne 'A/A';
}
close $fh or croak "can't close file: $!";

print map {
    sprintf "The Total Number of %s converted to %s is %d\n",
      $_, %{ $counter->{$_} }
} keys %$counter;

The conditions I used are as follows:
1. Print the current line from the file if and only if the variable $gt_1 is not equal to $gt_2 AND the variable $gt_1 is equal to neither 'C/C' nor 'A/A'.
2. Count the line if the variable $gt_1 is equal to 'A/G' and $gt_2 is equal to 'G/G'; the same logic is used for the other counts.
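
Condition 1 needs some care in Perl: an expression such as 'C/C' or 'A/A' evaluates to its first true operand, 'C/C', so comparing against it tests only one of the two values. A minimal sketch of a check that covers both, using a lookup hash; the names %homozygous and should_print are hypothetical:

```perl
use strict;
use warnings;

# Values of GT_1 that should never be printed (taken from condition 1).
my %homozygous = map { $_ => 1 } ( 'C/C', 'A/A' );

# Hypothetical helper implementing condition 1: the two genotypes differ
# and GT_1 is neither 'C/C' nor 'A/A'.
sub should_print {
    my ( $gt_1, $gt_2 ) = @_;
    return ( $gt_1 ne $gt_2 && !$homozygous{$gt_1} ) ? 1 : 0;
}

print should_print( 'A/G', 'G/G' ), "\n";    # 1
print should_print( 'A/A', 'A/G' ), "\n";    # 0
print should_print( 'C/C', 'G/C' ), "\n";    # 0
```

Adding another excluded genotype then means adding one entry to the list instead of another ne clause.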

Hope this helps

Yes, finally I got the output I was really looking for. Thanks 2teez for your time. :)
