I have a script that parses and validates every field in an input weblog file.
The first field is ip_address, which currently holds a value like 12.45.24.245.
A new requirement means ip_address can also be a dummy value, something like $10.00 or $23.123.34. or $12.233.

How should I change my regular expression so that it handles both kinds of value?

Code:
#! /usr/bin/perl -w

use strict;

while (<DATA>) {

   $_ =~ m|^
            (\d+\.\d+\.\d+\.\d+)?        # capture  clientip
            \s                                          # followed by space
            ([\w-]+)\s                                  # capture '-' or their membership id
            \[(\d{1,2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})   # then the date
            \s\+\d{4}\]\s"                              # the '  +0100] "' ready for the method on the next line
            (\w{3,4})\s                                 # ermm, the  method
            (\/.*?)\s                                   # The request
            (\w{4}\/\d\.\d)"\s                          # the protocol
            (\d{3})\s([\d-]+?)\s"                       # status & content length
            (.+?)"\s"                                   # referer
            (.*?)"\s"                                   # useragent will need post processing
            (.+?)"                                      # All cookie  string, will need post processing
          |x;

   my $cookies = cookieStringCleaner($11);

   my ($persistant, $session);
   foreach my $loopvar (@$cookies) {

       if ($loopvar =~ /^eBizDAn/i) {
           $persistant = $loopvar;
       }
       elsif ($loopvar =~ /^eBizCo/i) {
           $session = $loopvar;
       }
   }

   print "\n\n\nLINE: $.\nIP: $1\nMEMBER: $2\nDATE: $3\nMETHOD: $4"
       . "\nREQUEST: $5\nPROTOCOL: $6\nSTATUS: $7\nCONLEN: $8\nREFERER:$9\nAGENT: $10"
       . "\nCOOKIE Persist: $persistant\nCOOKIE Session: $session";

}

#
# SUBROUTINES
#
sub cookieStringCleaner {    # no () prototype: the sub takes one argument

   my $cookieString = shift;

   # clean up the data a bit, remove spaces and '-'
   # the '-' is an error by (other language)  random num generator.
   # taking it out will make lookups easier as they will just be a  number

   $cookieString =~ tr/ //d; 
   $cookieString =~ tr/-//d; 

   my @cookies = split(/;/, $cookieString);

   return \@cookies;
}

I tried replacing (\d+\.\d+\.\d+\.\d+)? with something like (\$.*|\d+\.\d+\.\d+\.\d+)? but for $10.00 this captures two extra fields, so it returns the value "$10.00 - -".

Please suggest. Thanks in advance.


Can you provide a few lines of the log file to show the options and what you wish to get out of it?

replace

(\d+\.\d+\.\d+\.\d+)?

with

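(\$[\d.]+|(?:\d+\.){3}\d+)?

and that should work for you! The outer group has to stay a capturing group, otherwise the IP no longer lands in $1 and every later capture shifts down by one. The new pattern also contains a |, so if you keep the m|...|x delimiters from your code you need to switch to different ones, such as m{...}x.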
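
For anyone who wants to try this before touching the full parser, here is a minimal sketch that matches only the first field and the space after it. The sample lines are made up (no real log lines were posted), and first_field, $bounded and $greedy are names invented for this illustration. It compares a bounded pattern with the greedy \$.* attempt from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bounded first-field pattern: a '$' followed by digits and dots, or a
# dotted quad. The outer group captures, so the value is available in $1.
# qr{...} is used because '|' cannot appear unescaped inside m|...|.
my $bounded = qr{^(\$[\d.]+|(?:\d+\.){3}\d+)?\s}x;

# The attempt from the question, for comparison: '.*' is greedy, so it
# runs on to the last whitespace that still lets the pattern match.
my $greedy = qr{^(\$.*|\d+\.\d+\.\d+\.\d+)?\s}x;

# Hypothetical helper: return the captured first field, or undef when
# the line does not match or the optional group captured nothing.
sub first_field {
    my ($re, $line) = @_;
    return $line =~ $re ? $1 : undef;
}

# Hypothetical sample lines; the real log layout is an assumption.
my @samples = (
    '12.45.24.245 - [01/Jan/2010:10:00:00 +0100]',
    '$10.00 - [01/Jan/2010:10:00:00 +0100]',
    '$23.123.34. - [01/Jan/2010:10:00:00 +0100]',
);

for my $line (@samples) {
    my $short = first_field($bounded, $line);
    my $long  = first_field($greedy,  $line);
    print "bounded: $short\n";
    print "greedy:  $long\n\n";
}
```

With the bounded character class the capture stops at the end of the field, whereas the greedy version keeps going into the following fields. If the dummy values can contain characters other than digits and dots, \$\S+ might be a closer fit, but that depends on the actual data.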
