Hi,

I am new to Perl scripting and need your help with one issue.
I have a piece of my Perl script that looks something like this:

if ($rec{user} =~ /Major
                  |Minor
                  |Low
                  |High
                  /oxi )
{
  print "Reject" . $_ . "\n";
  next;
}

The list in the if condition is going to grow to 150 names, so I want to put them in a file and read them from there instead. Also, $rec{user} can be something like "post//. new major thrwad" and it still has to be rejected. Some of the 150 names can also start with $, e.g. $Major.

What changes do I need to make to my code for this to work?

Thanks in advance for the help.


How is your file formatted?

Yep, you can either read the file into an array and check against that, or check straight from the file.
Use an array if you care about speed.

I don't see any problem with $Major in this check.

;)

Could you post your file? I need to look through it.

The file format is up to me. I can create it in whatever format is most convenient for programming. Currently it looks something like this:
> vi list.txt
Major
Minor
$Low
High
High /3.21 www.heal.com
and so on ...

My input will be a string that can contain all kinds of special characters. If "Major" or any other word from list.txt is part of this input string, the input record needs to have the word "Reject" added at the start.

I just need a copy of your file or something more concrete to work on. Abstract information is not always the best thing to go on.
I still don't quite follow you, my friend.

I am sorry, I cannot post the actual file as it would be against company policy, but I can help you understand the problem better.
The list.txt I posted earlier contains my filter words. The inputs I need to match these filter words against look like this:
"GET / HTTP/1.0"
"Windows NT 5.2; en-US; rv:1.9.1.6; Major 3.0)"

The first input does not contain any filter words, so it should pass. The second input contains the filter word "Major", so the record should be prefixed with the word "Reject", which makes it "RejectWindows NT 5.2; en-US; rv:1.9.1.6; Major 3.0)"

I hope this helps.

The array logic I was trying is:

$file = "list.txt";
open FILE, $file or die "Can't open file: $!\n";
@list = <FILE>;
close FILE;
foreach $recrd (@list)
{
chomp($recrd);
if ( $rec{user} =~ m/$recrd/ )
{
print "Reject" . $_ . "\n";
next;
}
}


But this fails with the error "Quantifier follows nothing in regex". I cannot reverse the pattern matching logic to if ($recrd =~ m/$rec{user}/) because $recrd is the substring that I need to search for in $rec{user}.

I still don't get it either. Where does $rec{user} get assigned? Where does it come from? Also, the print $_ is not going to work because your foreach assigns each element to $recrd, not to $_. If $rec{user} is not set you will get the message you are receiving, BTW.

Well, I think mitchems just said it all.

Look, no one needs the actual data. Just make up some dummy data, OK?

As programming junkies, we are all interested in challenges.
Give us some dummy data to play with, OK?
;)

Since you want to use text from a file to construct a regex pattern, you need to remember to escape the characters in this text that would otherwise have special significance to regex. You can use the quotemeta() function to do this.

#!/usr/bin/perl
use strict;
use warnings;

#You can open a file and read from it.
#Here I read from my <DATA> handle for convenience.
#The point is, you want your regex patterns stored
#in an array.
my @arr = <DATA>;
foreach (@arr){
    s/^\s+//;#Remove leading whitespace
    s/\s+$//;#Remove trailing whitespace
    $_ = quotemeta($_); #Escape characters such as $, ., etc,
}
my $pattern = '(' . join('|', @arr) . ')';

#Now let's say you read two input records from <STDIN> (not shown here)
#and assign them to $input1 and $input2
my $input1 = "GET / HTTP/1.0";
my $input2 = "Windows NT 5.2; en-US; rv:1.9.1.6; Major 3.0)";

#Testing $input1
if ($input1 =~ m/$pattern/){
    print "Reject$input1\n"; #Matches
}else{
    print "$input1\n"; #No match
}

#Testing $input2
if ($input2 =~ m/$pattern/){
    print "Reject$input2\n"; #Matches
}else{
    print "$input2\n"; #No match
}

__DATA__
Major
Minor
$Low
High
High /3.21 www.heal.com
$Major
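
The same idea can be sketched for patterns that live in a real file rather than under __DATA__. This is only an illustration under a few assumptions: the temporary file stands in for list.txt, build_filter is a made-up helper name, and the /i flag is added because the original question used it and expected lowercase "major" to be rejected. Blank lines are skipped so the alternation never gets an empty branch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for list.txt so the sketch is self-contained.
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
print {$tmp_fh} "Major\nMinor\n\$Low\nHigh\n";
close $tmp_fh or die "Can't close $file: $!\n";

# Build one compiled, case-insensitive regex from the words in a file.
# (build_filter is an illustrative name, not something from the thread.)
sub build_filter {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!\n";
    my @words;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;        # trim both ends (also drops the newline)
        next unless length $line;       # skip blank lines
        push @words, quotemeta $line;   # treat the word as literal text
    }
    close $fh;
    die "No filter words found in $path\n" unless @words;
    my $alternation = join '|', @words;
    return qr/$alternation/i;           # compiled once, reused for every record
}

my $filter = build_filter($file);

for my $input ("GET / HTTP/1.0", "post//. new major thrwad", 'cost is $low today') {
    print(($input =~ $filter ? "Reject" : ""), "$input\n");
}
```

Compiling the pattern once with qr// means the 150 words are not re-parsed for every input record, which matters when the script sits in a pipeline.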

Thanks d5e5!

The logic works fine except in one particular scenario, where user is blank or consists only of spaces. Even if I comment out the two whitespace-removing lines in the code, these records get "Reject" added, which is not desired.

So, instead I tried something like this -

open FILE, $file or die "Can't open file: $!\n";
@list = <FILE>;
close FILE;
foreach $recrd (@list)
{
chomp($recrd); # to remove new line char at end of each filter line
if ( $rec{user} =~ m/\Q$recrd\E/ ) # using \Q..\E removes REGEX error
        {
        print "REJECT$delim" . "FILTEROUT known bots$delim" . $_ . "\n";
        $logging{$rec{logFileName}}->{FILTER_OUT_USER}++;
        last; # to break out of for loop if match found
        }
}
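
Why \Q..\E makes the error go away can be shown in isolation. The sketch below is an illustration, not the poster's data: it assumes one filter line starts with a quantifier such as "+" (the thread never shows the offending line) and reuses the "$Low" style entry from list.txt. The input string is made up as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input record containing both filter texts literally.
my $input = 'Mozilla/5.0 (compatible; +http://bot.example/info) $Low';

# Hypothetical filter lines: one starts with a quantifier, one with "$".
my @filters = ('+http://bot.example', '$Low');

for my $word (@filters) {
    # Raw interpolation: the text is compiled as a regex, metacharacters and all.
    my $raw = eval { $input =~ m/$word/ ? "match" : "no match" };
    if (!defined $raw) {
        $raw = "error: $@";   # the regex failed to compile at run time
        chomp $raw;
    }

    # \Q...\E escapes metacharacters, so the text is matched literally.
    my $quoted = $input =~ m/\Q$word\E/ ? "match" : "no match";

    print "$word\n  raw:    $raw\n  quoted: $quoted\n";
}
```

A leading "+" has nothing to quantify, so the raw pattern dies with the "Quantifier follows nothing" message. The raw "$Low" pattern compiles, but "$" acts as an end-of-string anchor, so it silently fails to match the literal text.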

This works fine too, but I get two records in the output for each rejected record: one with Reject added at the start and one unchanged input record. I am not able to figure out where the two records are coming from. Is using $_ causing it?
My input comes through an Ab Initio (ETL tool) graph, so the Perl script reads one record at a time and processes it. $rec contains many fields. The piece of the script above validates the user name.


I don't know the source of the problem with your script and can't say if the problem resides in $_ because I don't see where any value is assigned to $_ or $delim. What I can do is try to fix my script so it won't reject blank input.

What I suspect happened when you tested my script: your file of regex patterns may have included a blank record (just an extra carriage-return/line-feed character would do it) which resulted in an element containing a string made up of whitespace characters or nothing at all being included in the array I use to build the $pattern. I can fix that by removing any zero-length elements from the array before building the pattern. Please try the following:

#!/usr/bin/perl
use strict;
use warnings;

#You can open a file and read from it.
#Here I read from my <DATA> handle for convenience.
#The point is, you want your regex patterns stored
#in an array.
my @arr = <DATA>;
foreach (@arr){
    s/^\s+//;#Remove leading whitespace
    s/\s+$//;#Remove trailing whitespace
    $_ = quotemeta($_); #Escape characters such as $, ., etc,
}

@arr = grep(length($_) > 0, @arr); #Weed out all zero-length elements

my $pattern = '(' . join('|', @arr) . ')';

#Now let's say you read two input records from <STDIN> (not shown here)
#and assign them to $input1 and $input2
my $input1 = "GET / HTTP/1.0";
my $input2 = "Windows NT 5.2; en-US; rv:1.9.1.6; Major 3.0)";
my $input3 = "\t   \n"; #Input can be all blanks or various kinds of whitespace

#Testing blank input
#The following matches only if you include a blank line in your $pattern
if ($input3 =~ m/$pattern/){
    print "Reject$input3\n"; #Matches
}else{
    print "$input3\n"; #No match
}

__DATA__
Major
Minor
$Low
High
High /3.21 www.heal.com
$Major
   
Blank lines above and below this one added for testing
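
The effect of a blank entry can also be seen with a smaller experiment. This is a sketch with made-up word lists, and rejects is a hypothetical helper: an empty string in the list produces an empty branch in the alternation, and an empty branch matches at any position of any input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if any of the literal words occurs in $input, else 0.
# (rejects is an illustrative helper, not part of the script above.)
sub rejects {
    my ($input, @words) = @_;
    my $pattern = join '|', map { quotemeta($_) } @words;
    # The (?:...) group keeps the pattern from ever being a bare empty regex.
    return $input =~ m/(?:$pattern)/ ? 1 : 0;
}

my @with_blank = ('Major', '', 'Minor');            # '' is what a trimmed blank line becomes
my @cleaned    = grep { length($_) > 0 } @with_blank;

for my $input ("GET / HTTP/1.0", "\t   \n") {
    printf "with blank entry: %d, cleaned: %d\n",
        rejects($input, @with_blank), rejects($input, @cleaned);
}
```

With the blank entry present, the pattern is effectively Major||Minor, so every record is rejected, including blank ones. After the grep, neither sample input matches.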

Thanks a lot d5e5!

Adding the blank value handling worked well. As for the two records in the output: since I am reading input in a pipeline from the ETL tool, adding a "next;" statement in the if block resolved it.
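
One plausible reading of the duplicate-record problem is that "last" only left the inner foreach, so the code after it still printed the record in the normal way. The sketch below shows that control flow with a labeled loop. Everything in it is assumed for illustration: the RECORD label, the sample records and the @output array are not from the real script, which reads its records from the ETL pipeline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @filters = ('Major', '$Low');                  # hypothetical filter words
my @records = ("GET / HTTP/1.0",                  # hypothetical input records
               "Windows NT 5.2; en-US; rv:1.9.1.6; Major 3.0)");
my @output;

RECORD:
for my $record (@records) {
    for my $word (@filters) {
        if ($record =~ m/\Q$word\E/) {
            push @output, "Reject$record";
            next RECORD;      # move on to the next record, skipping the normal path
        }
    }
    push @output, $record;    # normal path: reached only when no filter matched
}

print "$_\n" for @output;
```

With a single combined pattern instead of the inner loop, a plain "next;" inside the if block has the same effect, because it then applies to the record loop directly.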

Thanks again! Really appreciate your help on this.
