Hi,

I am new to Perl scripting and need your help with one issue.
I have a piece of code in my Perl script that looks something like this:

if ($rec{user} =~ /Major
                  |Minor
                  |Low
                  |High
                  /oxi )
{
  print "Reject" . $_ . "\n";
  next;
}

Now the list in the if condition is going to grow to 150 names, so I want to put them in a file and read them from there instead. Also, $rec{user} can be something like "post//. new major thrwad" and it still has to be rejected. Some of the 150 names can also start with $, e.g. $Major

What modifications will I have to make to my code for this to work?

Thanks in advance for the help.


How is your file formatted?

Yes, you can read the file into an array and check against that, or check straight from the file.
Use an array if you care about speed.

I don't see any problem with $Major in this check.

;)

Post your file, OK? I need to look through it.

The file format is up to me. I can create it in whatever format is most suitable for ease of programming. Currently I have something like this -
> vi list.txt
Major
Minor
$Low
High
High /3.21 www.heal.com
and so on ...

My input will be a string containing all sorts of special characters, and if "Major" or any other word from list.txt is part of this input string, the input record needs to have the word "Reject" prepended to it.

I just need a copy of your file or something more concrete to work on. Abstract information is rarely the best thing to go on.
I still don't follow you, good friend.

I am sorry, I cannot really post the file as it would be against company policy. I can help you understand the problem better.
The list.txt I posted earlier contains my filter words. The inputs against which I need to match these filter words are -
"GET / HTTP/1.0"
"Windows NT 5.2; en-US; rv:1.9.1.6; Major 3.0)"

The 1st input does not contain any filter words, so it should pass. The 2nd input contains the filter word "Major", so the record should be prefixed with the word "Reject", which makes it "RejectWindows NT 5.2; en-US; rv:1.9.1.6; Major 3.0)"

I hope this helps.

The array logic that I was trying is -

$file = "list.txt";
open FILE, $file or die "Can't open file: $!\n";
@list = <FILE>;
close FILE;
foreach $recrd (@list)
{
chomp($recrd);
if ( $rec{user} =~ m/$recrd/ )
{
print "Reject" . $_ . "\n";
next;
}
}


But this fails with the error "Quantifier follows nothing in regex". I cannot reverse the pattern match to if ($recrd =~ m/$rec{user}/) because $recrd is the substring that I need to search for in $rec{user}

I still don't get it either. Where does $rec{user} get assigned? Where does it come from? Also, the print $_ is not going to work because your foreach assigns each element to $recrd, not $_. If $rec{user} is not set you will get the message you are receiving BTW.

Well, I think mitchems just said it all.

Look, no one needs the actual data. Just make up some dummy data, OK?

We are all interested in challenges as programming junkies.
Make us some dummy data to play with, OK?
;)

Since you want to use text from a file to construct a regex pattern, you need to remember to escape the characters in this text that would otherwise have special significance to regex. You can use the quotemeta() function to do this.

#!/usr/bin/perl
use strict;
use warnings;

#You can open a file and read from it.
#Here I read from my <DATA> handle for convenience.
#The point is, you want your regex patterns stored
#in an array.
my @arr = <DATA>;
foreach (@arr){
    s/^\s+//;#Remove leading whitespace
    s/\s+$//;#Remove trailing whitespace
    $_ = quotemeta($_); #Escape characters such as $, ., etc,
}
my $pattern = '(' . join('|', @arr) . ')';

#Now let's say you read two input records from <STDIN> (not shown here)
#and assign them to $input1 and $input2
my $input1 = "GET / HTTP/1.0";
my $input2 = "Windows NT 5.2; en-US; rv:1.9.1.6; Major 3.0)";

#Testing $input1
if ($input1 =~ m/$pattern/){
    print "Reject$input1\n"; #Matches
}else{
    print "$input1\n"; #No match
}

#Testing $input2
if ($input2 =~ m/$pattern/){
    print "Reject$input2\n"; #Matches
}else{
    print "$input2\n"; #No match
}

__DATA__
Major
Minor
$Low
High
High /3.21 www.heal.com
$Major
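
The original if-test used the /i flag, and the sample value "post//. new major thrwad" only matches "Major" when case is ignored. As a minimal sketch of the same quotemeta() idea with the list read from a file handle and the pattern compiled once with qr//i: build_filter is a made-up helper name, and the in-memory list below stands in for list.txt (the file name is taken from the thread, the open call for it is an assumption shown only in a comment).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# build_filter is a hypothetical helper name, not from the thread.
# It reads one filter word per line from a file handle and returns
# a compiled, case-insensitive regex, or nothing if the list is empty.
sub build_filter {
    my ($fh) = @_;
    my @words;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;          # trim leading/trailing whitespace
        next unless length $line;         # skip blank lines
        push @words, quotemeta($line);    # escape $, ., / and so on
    }
    return unless @words;
    my $alternatives = join '|', @words;
    return qr/$alternatives/i;            # /i as in the original if-test
}

# For a real run, open the list file instead (assumed call):
#   open my $fh, '<', 'list.txt' or die "Can't open list.txt: $!\n";
my $list = "Major\nMinor\n\$Low\nHigh\n";
open my $fh, '<', \$list or die "Can't open in-memory list: $!\n";
my $filter = build_filter($fh);
close $fh;

for my $input ("GET / HTTP/1.0", "post//. new major thrwad") {
    if (defined $filter && $input =~ $filter) {
        print "Reject$input\n";           # a filter word was found
    } else {
        print "$input\n";                 # no filter word, pass through
    }
}
```

Compiling the pattern once keeps the per-record check to a single match, which may matter with 150 filter words and many input records.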

Thanks d5e5!

The logic works fine except in one particular scenario, where user can be blank or all spaces. Even if I comment out the two whitespace-removing lines in the code, these records get "Reject" prepended, which is not desired.

So, instead I tried something like this -

open FILE, $file or die "Can't open file: $!\n";
@list = <FILE>;
close FILE;
foreach $recrd (@list)
{
chomp($recrd); # to remove new line char at end of each filter line
if ( $rec{user} =~ m/\Q$recrd\E/ ) # using \Q..\E removes REGEX error
        {
        print "REJECT$delim" . "FILTEROUT known bots$delim" . $_ . "\n";
        $logging{$rec{logFileName}}->{FILTER_OUT_USER}++;
        last; # to break out of for loop if match found
        }
}

This works fine too, but I get 2 records in the output for each rejected record: one with Reject prepended and one with the usual input record. I am not able to figure out where the 2 records are coming from. Is using $_ causing it?
My input comes through an Ab Initio graph (an ETL tool), so the Perl script reads one record at a time and processes it. $rec contains many fields. The piece of the script above validates the user name.

I don't know the source of the problem with your script and can't say if the problem resides in $_ because I don't see where any value is assigned to $_ or $delim. What I can do is try to fix my script so it won't reject blank input.

What I suspect happened when you tested my script: your file of regex patterns may have included a blank record (just an extra carriage-return/line-feed character would do it) which resulted in an element containing a string made up of whitespace characters or nothing at all being included in the array I use to build the $pattern. I can fix that by removing any zero-length elements from the array before building the pattern. Please try the following:

#!/usr/bin/perl
use strict;
use warnings;

#You can open a file and read from it.
#Here I read from my <DATA> handle for convenience.
#The point is, you want your regex patterns stored
#in an array.
my @arr = <DATA>;
foreach (@arr){
    s/^\s+//;#Remove leading whitespace
    s/\s+$//;#Remove trailing whitespace
    $_ = quotemeta($_); #Escape characters such as $, ., etc,
}

@arr = grep(length($_) > 0, @arr); #Weed out all zero-length elements

my $pattern = '(' . join('|', @arr) . ')';

#Now let's say you read two input records from <STDIN> (not shown here)
#and assign them to $input1 and $input2
my $input1 = "GET / HTTP/1.0";
my $input2 = "Windows NT 5.2; en-US; rv:1.9.1.6; Major 3.0)";
my $input3 = "\t   \n"; #Input can be all blanks or various kinds of whitespace

#Testing blank input
#The following matches only if you include a blank line in your $pattern
if ($input3 =~ m/$pattern/){
    print "Reject$input3\n"; #Matches
}else{
    print "$input3\n"; #No match
}

__DATA__
Major
Minor
$Low
High
High /3.21 www.heal.com
$Major
   
Blank lines above and below this one added for testing
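
To see why a zero-length element makes every input match, here is a small sketch. matches_any is a made-up helper that builds the alternation the same way as the script above; the word lists are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# matches_any is a hypothetical helper: it joins the given words into
# one alternation, the same way the scripts in this thread do, and
# reports whether the input matches it.
sub matches_any {
    my ($input, @words) = @_;
    my $pattern = '(' . join('|', map { quotemeta($_) } @words) . ')';
    return $input =~ m/$pattern/ ? 1 : 0;
}

my $blank = "\t   \n";

# An empty element produces the pattern (Major|Minor|),
# whose last branch matches the empty string in any input.
print matches_any($blank, 'Major', 'Minor', ''), "\n";   # prints 1

# Dropping zero-length elements first restores the expected result.
my @clean = grep { length($_) } ('Major', 'Minor', '');
print matches_any($blank, @clean), "\n";                 # prints 0
```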

Thanks a lot d5e5!

Adding the blank-value handling worked well. As for the 2 records in the output: since I am reading input in a pipeline from the ETL tool, adding a "next;" statement in the if block resolved it.

Thanks again! Really appreciate your help on this.
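
For anyone who hits the same duplicate-output problem: one possible cause is that "last" only leaves the inner loop over the filter words, so the script then falls through to its normal output for the same record. A rough sketch of one way to avoid that with a labeled "next" follows. filter_records and the sample data are assumptions for illustration; the real script reads its records from the Ab Initio pipeline, which is not shown in this thread, and the /i flag is carried over from the first post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# filter_records is a hypothetical helper, not the thread's real script.
# Each input record yields exactly one output line.
sub filter_records {
    my ($records, $filters) = @_;
    my @out;
    RECORD: for my $record (@$records) {
        for my $word (@$filters) {
            next unless length $word;       # ignore blank filter lines
            if ($record =~ m/\Q$word\E/i) {
                push @out, "Reject$record";
                next RECORD;                # skip the normal output below
            }
        }
        push @out, $record;                 # reached only when nothing matched
    }
    return @out;
}

my @filters = ('Major', 'Minor', '$Low', 'High');
my @records = ('GET / HTTP/1.0',
               'Windows NT 5.2; en-US; rv:1.9.1.6; Major 3.0)');
print "$_\n" for filter_records(\@records, \@filters);
```

With the sample data, the first record is printed unchanged and the second is printed once, with Reject in front.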
