Hello all,

I have a tab-delimited file as follows:

#file1
#version1.1
#columns with the information as follows

state1    class1     report_version_1.1    9428    4567   .    .   call=1;times=5;toss=head->tail;sno=A1B1;effect=positive
state1    class1     report_version_1.1    3862    4877   .    .   call=1;times=5;toss=head->tail;sno=A1B2;effect=negative
state1    class1     report_version_1.1    2376    4567   .    .   call=1;times=5;toss=head->tail;sno=;effect=positive
state2    class1     report_version_1.1    4378    2345   .    .   call=1;times=5;toss=tail->tail;sno=A1B3;effect=positive,negative, both
state2    class1     report_version_1.1    1289    4835   .    .   call=1;times=5;toss=head->tail;sno=;effect=positive

Note: There are no column headers in the file, just the three comment lines at the top.

I am trying to parse the 8th column, which is a string of key=value pairs separated by semicolons. I need to remove all entries that (part 1) have no sno value (e.g. row 3, where sno=; means there is no record) and (part 2) have the same toss result on both sides, i.e. toss=tail->tail is not needed since both are tail. Finally, I want a txt file containing the filtered entries.
I am trying to tackle it in parts, so here is what I have come up with so far for part 1: sno…

#!usr/bin/perl
use warnings;

#inputfile
my $input_file = "/Users/Documents/myfolder/file1.txt";
die "Cannot open $input_file \n" unless (open(IN, $input_file));


#Open output file and write the results
die "result1.txt" unless(open( OUT,"> result1.txt"));

#In the while loop, put the columns that have to be printed in the new file.
while (<IN>) {


    my ($a, $b, $c , $d, $e, $f, $g, $h, $i) = (split /\s+/)[ 1, 2, 3, 4, 5, 6, 7, 8, 9]; 


   #if sno has no name or value then filter the file    

    if ( $i =~ /^[sno]/ =~ /^[sno]/ )
{
       print $OUT " $a \ $b\ $c \ $d \ $e \ $f \ $g \ $h \ $i \n";
    }
}

exit;

Any other better ways to solve this for both sno and toss parts together?

First, always use strict; for anything more than a one-liner.

Second, indexing of arrays starts at 0, so you will refer to the eighth element in the result of your split as $whatever[7]. (Note that $whatever[9] is undefined.)
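To make the zero-based indexing concrete, here is a minimal sketch. It assumes one of the rows from the question pasted into a string (the variable names are only for illustration) and shows where the eighth column ends up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One data row from the question (whitespace-separated, 8 columns).
my $line = 'state1 class1 report_version_1.1 9428 4567 . . call=1;times=5;toss=head->tail;sno=A1B1;effect=positive';

my @cols = split /\s+/, $line;

# Arrays are zero-based: column 1 is $cols[0], column 8 is $cols[7].
my $count  = scalar @cols;   # number of columns found
my $first  = $cols[0];
my $eighth = $cols[7];
my $beyond = $cols[9];       # past the end of the array, so undef

print "columns: $count\n";
print "first:   $first\n";
print "eighth:  $eighth\n";
print "index 9 is ", (defined $beyond ? 'defined' : 'undefined'), "\n";
```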

Also note that the package global variables $a and $b have a special meaning in Perl (sort uses them to hold the two values being compared), so avoid declaring lexical variables named $a and $b. Doing so can break any sort block in the same scope.
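A small sketch of why these names are reserved in practice: sort fills $a and $b for you inside the comparison block. The list of numbers below is just the fourth column of the sample rows, used for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fourth-column values from the sample rows.
my @positions = (9428, 3862, 2376, 4378, 1289);

# sort places the two elements being compared in the package
# variables $a and $b; they are not declared with "my" here.
my @sorted = sort { $a <=> $b } @positions;

print "@sorted\n";
```

If the script had declared my $a earlier in the same scope, the lexical would hide the package variable inside the sort block, which is why other names are the safer choice for your columns.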

Since the decision whether or not to print a line depends only on the contents of that semicolon-delimited string, you don't need to assign all the columns to individual variables. Just print the whole line after testing the one element that determines whether you want to print it. The following works for me. See the attached file1.txt and result1.txt.

#!/usr/bin/perl
use strict;
use warnings;

#inputfile
my $input_file = 'file1.txt';
my $output_file = 'result1.txt';

open my $fh_in, '<', $input_file or die "Failed to open $input_file: $!";
open my $fh_out, '>', $output_file or die "Failed to open $output_file: $!";

while (<$fh_in>){
    chomp;
    if (ok($_) == 1){
        print $fh_out $_, "\n";
    }
}

sub ok{
    my $rec = shift;
    return 1 if $rec =~ m/^#/; #Comments ok to print
    my @cols = split /\s+/, $rec;
    my $test_this = $cols[7];
    return 0 if $test_this =~ m/sno=;/; #Don't print
    return 0 if $test_this =~ m/toss=head->head/ or $test_this =~ m/toss=tail->tail/;
    return 1; #Must have passed tests
}
Attachments

file1.txt:
#file1
#version1.1
#columns with the information as follows
state1    class1     report_version_1.1    9428    4567   .    .   call=1;times=5;toss=head->tail;sno=A1B1;effect=positive
state1    class1     report_version_1.1    3862    4877   .    .   call=1;times=5;toss=head->tail;sno=A1B2;effect=negative
state1    class1     report_version_1.1    2376    4567   .    .   call=1;times=5;toss=head->tail;sno=;effect=positive
state2    class1     report_version_1.1    4378    2345   .    .   call=1;times=5;toss=tail->tail;sno=A1B3;effect=positive,negative, both
state2    class1     report_version_1.1    1289    4835   .    .   call=1;times=5;toss=head->tail;sno=;effect=positive

result1.txt:
#file1
#version1.1
#columns with the information as follows
state1    class1     report_version_1.1    9428    4567   .    .   call=1;times=5;toss=head->tail;sno=A1B1;effect=positive
state1    class1     report_version_1.1    3862    4877   .    .   call=1;times=5;toss=head->tail;sno=A1B2;effect=negative
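
As an alternative to matching raw substrings, the eighth column could also be split into key=value pairs and tested by name. This is only a sketch, not part of the script above: parse_attrs and keep_record are hypothetical helper names, and it assumes every pair has the form key=value separated by semicolons, as in the sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split "k1=v1;k2=v2;..." into a hash; an empty value (sno=;) becomes ''.
sub parse_attrs {
    my ($field) = @_;
    my %attr;
    for my $pair (split /;/, $field) {
        my ($key, $value) = split /=/, $pair, 2;
        $attr{$key} = defined $value ? $value : '';
    }
    return %attr;
}

# Keep a record only if sno is non-empty and the two toss sides differ.
sub keep_record {
    my ($field) = @_;
    my %attr = parse_attrs($field);
    return 0 unless defined $attr{sno} && length $attr{sno};
    my ($from, $to) = split /->/, (defined $attr{toss} ? $attr{toss} : ''), 2;
    return 0 if defined $from && defined $to && $from eq $to;
    return 1;
}

for my $field (
    'call=1;times=5;toss=head->tail;sno=A1B1;effect=positive',
    'call=1;times=5;toss=head->tail;sno=;effect=positive',
    'call=1;times=5;toss=tail->tail;sno=A1B3;effect=positive',
) {
    printf "%-4s %s\n", (keep_record($field) ? 'keep' : 'drop'), $field;
}
```

Working with named keys means the test does not depend on the order of the pairs inside the column.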

perllearner007:
To add to what d5e5 already said: you don't need exit to end a Perl script. Also, in your die statement, always include $! so that the reason is reported in case the open function fails. Please also consider writing clear Perl 5 code.

d5e5:
Nice one using split and a subroutine, but I would instead use a regex, which does the job quite easily! It is, of course, just another way of getting the job done.

#!/usr/bin/perl
use warnings;
use strict;

#inputfile
my $input_file = 'file1.txt';
open my $fh2, '>', "result1.txt" or die "can't open file:$!";
open my $fh,  '<', $input_file   or die "can't open file:$!";
while (<$fh>) {
    chomp;
    next if /sno=;/;               # skip records with an empty sno
    next if /toss=(\w+)->\1\b/;    # skip records whose two toss sides are identical
    print $fh2 $_, "\n";
}
close $fh  or die "can't close file:$!";
close $fh2 or die "can't close file:$!";
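
For anyone unsure how the toss test works: \1 is a backreference, meaning it matches whatever text the first capture group just matched, so the pattern only succeeds when both sides of -> are the same word. Here is a minimal sketch on three sample strings; the variable names and the trailing \b are my own choices for the illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# \1 repeats whatever (\w+) captured, so this matches toss=X->X only.
my $same_toss = qr/toss=(\w+)->\1\b/;
my $empty_sno = qr/sno=;/;

my @fields = (
    'call=1;times=5;toss=head->tail;sno=A1B1;effect=positive',
    'call=1;times=5;toss=head->tail;sno=;effect=positive',
    'call=1;times=5;toss=tail->tail;sno=A1B3;effect=positive',
);

# Keep only the strings that match neither filter.
my @kept = grep { $_ !~ $empty_sno && $_ !~ $same_toss } @fields;

print "$_\n" for @kept;
```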

Edited 4 Years Ago by 2teez

Comments
Good alternative!

Thank you both for your input. I was a bit confused about indexing starting from 0, because at times when I parse starting from 0 the script extracts the column after the one I want. I wasn't sure whether that was the case here, so I started with columns 1..8. I tried running both scripts. d5e5's script runs and I get a results file, but it is the same as the input, meaning nothing is getting filtered. 2teez, your script gives me a blank output file. Any ideas?

As additional info: I checked the contents of my input file using cat file1.txt and I can see the rows and columns in my terminal, so I am assuming the problem has nothing to do with the file not being read properly.

Attachments
#file1
#version1.1
#columns with the information as follows
state1    class1     report_version_1.1    9428    4567   .    .   call=1;times=5;toss=head->tail;sno=A1B1;effect=positive
state1    class1     report_version_1.1    3862    4877   .    .   call=1;times=5;toss=head->tail;sno=A1B2;effect=negative
state1    class1     report_version_1.1    2376    4567   .    .   call=1;times=5;toss=head->tail;sno=;effect=positive
"state2    class1     report_version_1.1    4378    2345   .    .   call=1;times=5;toss=tail->tail;sno=A1B3;effect=positive,negative, both"
state2    class1     report_version_1.1    1289    4835   .    .   call=1;times=5;toss=head->tail;sno=;effect=positive

perllearner007:

I don't know why you get what you described above. I ran both my script and d5e5's for all the cases of the raw data provided, and the results were what you were looking for.

The only time I got the behaviour you described with d5e5's script was when I used it with the data you originally posted at the top of this thread. The reason is that each of those lines has a space at the beginning, so split /\s+/ returns an empty first field and every column shifts by one.
To fix that, all you need to do is put the line below after line 13 (the chomp; line) in d5e5's script:

s{^\s+}{};

And that works fine.
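
To see the effect for yourself, here is a minimal sketch with a leading space added to one of the sample rows. It also shows split ' ' (a single space in quotes), which is a special case in Perl that skips leading whitespace and could be used instead of the substitution:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same row as in the question, but with one leading space.
my $line = ' state1 class1 report_version_1.1 2376 4567 . . call=1;times=5;toss=head->tail;sno=;effect=positive';

# A regex split keeps an empty leading field, so every index shifts by one.
my @by_regex = split /\s+/, $line;

# split ' ' is special-cased: leading whitespace is skipped first.
my @by_space = split ' ', $line;

print "split /\\s+/ : [7] = '$by_regex[7]'\n";
print "split ' '   : [7] = '$by_space[7]'\n";
```

With the regex split, element 7 is the "." column, so neither the sno test nor the toss test can match and every line gets printed.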

Please note that your Perl script and file1.txt must either be in the same directory, or you must specify the path to file1.txt in your Perl script.
If your Perl script and file1.txt are in the same directory, all you need to do from your CLI is:

$ perl perl_script.pl

That runs your script and produces the result1.txt you want, for both my script and d5e5's.
