I have a large data set (12,000 rows x 14 columns); the first 4 rows are shown below:

    x1  y1  0.02    NAN NAN NAN NAN NAN NAN 0.004   NAN NAN NAN NAN
    x2  y2  NAN 0.003   NAN 10  NAN 0.03    NAN 0.004   NAN NAN NAN NAN
    x3  y3  NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN
    x4  y4  NAN 0.004   NAN NAN NAN NAN 10  NAN NAN 30  NAN 0.004

I need to remove every row that has "NAN" in all of columns 3-14 and then output the rest of the data set. I wrote the following code:

            #!usr/bin/perl

            use warnings;
            use strict;
            use diagnostics;

            open(IN, "<", "file1.txt") or die "Can't open file for reading:$!";

            open(OUT, ">", "file2.txt") or die "Can't open file for writing:$!";

            my $header = <IN>;
            print OUT $header;

            my $at_line = 0;

            my $col3;
            my $col4;
            my $col5;
            my $col6;
            my $col7;
            my $col8;
            my $col9;
            my $col10;
            my $col11;
            my $col13;
            my $col14;
            my $col15;

            while (<IN>){
            chomp;
            my @sections = split(/\t/);

            $col3 = $sections[2];
            $col4 = $sections[3];;
            $col5 = $sections[4];
            $col6 = $sections[5];
            $col7 = $sections[6];
            $col8 = $sections[7];
            $col9 = $sections[8];
            $col10 = $sections[9];
            $col11 = $sections[10];
            $col13 = $sections[11];
            $col14 = $sections[12];
            $col15 = $sections[13];

            if ($col3 eq "NAN" && $col4 eq "NAN" && $col5 eq "NAN" && $col6 eq "NAN" && $col7 eq "NAN" && $col8 eq "NAN" && $col9 eq "NAN" && $col10 eq "NAN" && $col11 eq "NAN" && $col12 eq "NAN" && $col13 eq "NAN" && $col14 eq "NAN" && $col5 eq "NAN"){                                                   
            $at_line = $.;
            }   
            else {
            print OUT "$_\n";
            }
            }

            close(IN);
            close(OUT);

Running this code gave the following error:

    Use of uninitialized value $col3 in string eq at filter.pl line 46, <IN> line 2 (#1)

How can I make this program work? Thanks.

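Hi Perly,
In your code, the variable $col12 is never declared or assigned: you declare $col13, $col14 and $col15 instead, and the final test in the if condition repeats $col5. Check it. Moreover, you don't have to do it this way; you can try this: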

    use strict;
    use warnings;

    while (<DATA>) {
        # Look only at columns 3-14 (indexes 2..13) of the current line.
        for my $value ( ( split /\s+/, $_ )[ 2 .. 13 ] ) {
            # Print the line once, at the first value that is not NAN.
            print $_ and last if $value ne 'NAN';
        }
    }

    __DATA__
    x1  y1  0.02    NAN NAN NAN NAN NAN NAN 0.004   NAN NAN NAN NAN
    x2  y2  NAN 0.003   NAN 10  NAN 0.03    NAN 0.004   NAN NAN NAN NAN
    x3  y3  NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN
    aqx4  y4  NAN 0.004   NAN NAN NAN NAN 10  NAN NAN 30  NAN 0.004

The above script produces this:

    x1  y1  0.02    NAN NAN NAN NAN NAN NAN 0.004   NAN NAN NAN NAN
    x2  y2  NAN 0.003   NAN 10  NAN 0.03    NAN 0.004   NAN NAN NAN NAN
    aqx4  y4  NAN 0.004   NAN NAN NAN NAN 10  NAN NAN 30  NAN 0.004

I hope this helps you.
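
For the files named in the question, the same test could be written with lexical filehandles roughly as follows. This is only a sketch: the sub names keep_row and filter_rows are made up for illustration, the demo reads from an in-memory string rather than file1.txt, and treating a missing column like "NAN" is an assumption, not something stated in the thread.

```perl
use strict;
use warnings;

# True when at least one of columns 3-14 holds a value other than "NAN".
# Treating a missing column like "NAN" is an assumption made for this sketch.
sub keep_row {
    my ($line) = @_;
    my @fields = split ' ', $line;    # split on any run of whitespace
    return scalar grep { defined($_) && $_ ne 'NAN' } @fields[ 2 .. 13 ];
}

# Copy the header line through, then copy only the rows worth keeping.
# Returns the number of data rows written.
sub filter_rows {
    my ( $in, $out ) = @_;
    my $header = <$in>;
    return 0 unless defined $header;
    print {$out} $header;

    my $kept = 0;
    while ( my $line = <$in> ) {
        next unless keep_row($line);
        print {$out} $line;
        $kept++;
    }
    return $kept;
}

# Demo on in-memory handles; for real use, open "file1.txt" for reading
# and "file2.txt" for writing instead of \$input and \$output.
my $input = join '', map { join( "\t", @$_ ) . "\n" } (
    [ qw(id name c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14) ],
    [ 'x1', 'y1', '0.02', ('NAN') x 11 ],
    [ 'x3', 'y3', ('NAN') x 12 ],
);
my $output = '';

open my $in,  '<', \$input  or die "Can't open input for reading: $!";
open my $out, '>', \$output or die "Can't open output for writing: $!";
my $kept = filter_rows( $in, $out );
close $in;
close $out;

print $output;    # the header line followed by the x1 row
```

Using grep over an array slice replaces the twelve separate $colN variables, so there is no variable name left to mistype.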

Thanks 2teez. I really appreciate your help. The above code works on a small data set, but gives the following warning on the large data set:
"Use of uninitialized value $value in string ne at program.pl"

Hi,
Maybe not every row in your data set has the same number of columns. Could you attach the file that contains your data set, or something that looks similar? If we can see the data, a more accurate solution can be given.

There are other solutions one could give, but this one is based on the sample you published. For example, one can silence the recurring warning, but I believe it is better to address the reason for the warning.

Please post more of the data set and let us check it.
Thanks.
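
One way to look for the cause rather than silence the warning is to list the rows whose column count is off. The sketch below assumes every well-formed row has exactly 14 whitespace-separated fields; the sub name find_bad_rows and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Assumption: a well-formed row has exactly 14 whitespace-separated fields.
my $EXPECTED = 14;

# Return a list of [line_number, field_count] pairs for malformed rows.
sub find_bad_rows {
    my ($fh) = @_;
    my @bad;
    my $line_no = 0;
    while ( my $line = <$fh> ) {
        $line_no++;
        my @fields = split ' ', $line;
        push @bad, [ $line_no, scalar @fields ] if @fields != $EXPECTED;
    }
    return @bad;
}

# Demo data: the first row is complete, the second has only 11 fields.
my $data = join '',
    join( ' ', 'x1', 'y1', '0.02', ('NAN') x 11 ) . "\n",
    join( ' ', 'x2', 'y2', ('NAN') x 9 ) . "\n";

open my $fh, '<', \$data or die "Can't open in-memory data: $!";
my @bad = find_bad_rows($fh);
close $fh;

printf "line %d has %d fields, expected %d\n", @$_, $EXPECTED for @bad;
```

A row reported here is one where the slice [ 2 .. 13 ] would return undefined values, which is what triggers the "uninitialized value" warning.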

Hi 2teez,

The problem is now solved and the script works. It came from my re-saving the input data set: I had inadvertently moved some columns into the wrong places. Once again, thank you so much!!!

Just one question: can you explain the following line, please?

    print $_ and last if $value ne 'NAN'
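
One reading of that line, offered as a sketch rather than as 2teez's own explanation: the trailing if is a statement modifier that covers everything before it, and the low-precedence "and" joins two actions. So the line means "if $value is not NAN, print the current line and, provided print succeeded, leave the inner for loop". The last is what stops a line with several non-NAN values from being printed more than once. Spelled out with an ordinary if block (the @lines and @printed arrays are stand-ins introduced here so the result can be inspected):

```perl
use strict;
use warnings;

# Two sample rows: the first has two non-NAN values, the second has none.
my @lines = (
    "x1 y1 0.02 NAN NAN NAN NAN NAN NAN 0.004 NAN NAN NAN NAN\n",
    "x3 y3 NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN\n",
);

my @printed;
for (@lines) {    # $_ is the whole line, as in while (<DATA>)
    for my $value ( ( split ' ', $_ )[ 2 .. 13 ] ) {
        # Long form of: print $_ and last if $value ne 'NAN';
        if ( $value ne 'NAN' ) {
            push @printed, $_;    # stands in for: print $_
            last;                 # stop scanning this line's columns
        }
    }
}

print @printed;    # only the x1 line, and only once
```

Because the inner loop names its own variable ($value), $_ still holds the full line read by the outer loop, which is why print $_ outputs the row and not the single column value.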
