i'm brand new to perl and trying to write a script to read a logfile for our weblogic server and write certain entries into a database table. the log file is a log4j weblogic log. here is a sample:

####<Jun 7, 2005 2:46:38 PM EDT> <Info> <Enterprise> <ga003sds> <tms1> <ExecuteThread: '12' for queue: 'weblogic.kernel.Default'> <setup> <> <000000> <Calculating Rates for all Routes (Time = 1299 ms)>
####<Jun 7, 2005 2:46:38 PM EDT> <Info> <Enterprise> <ga003sds> <tms1> <ExecuteThread: '12' for queue: 'weblogic.kernel.Default'> <setup> <> <000000> <Time to get routes: 4105>
####<Jun 7, 2005 2:46:40 PM EDT> <Info> <Enterprise> <ga003sds> <tms1> <Thread-17> <anonymous> <> <000000> <HUDCache: Retrieve new value (Time = 12636 ms)>
####<Jun 7, 2005 2:47:09 PM EDT> <Warning> <EJB> <ga003sds> <tms1> <ExecuteThread: '13' for queue: 'weblogic.kernel.Default'> <anonymous> <> <BEA-010096> <The Message-Driven EJB: EcommerceOrderManager is unable to connect to the JMS destination: integration.eCommerce.orderCreation.request. Connection failed after 1,088 attempts. The MDB will attempt to reconnect every 10 seconds. This log message will repeat every 600 seconds until the condition clears.>
####<Jun 7, 2005 2:47:09 PM EDT> <Warning> <EJB> <ga003sds> <tms1> <ExecuteThread: '13' for queue: 'weblogic.kernel.Default'> <anonymous> <> <BEA-010061> <The Message-Driven EJB: EcommerceOrderManager is unable to connect to the JMS destination: integration.eCommerce.orderCreation.request. The Error was:
[EJB:011010]The JMS destination with the JNDI name: integration.eCommerce.orderCreation.request could not be found. Please ensure that the JNDI name in the weblogic-ejb-jar.xml is correct, and the JMS destination has been deployed.>
####<Jun 7, 2005 2:47:10 PM EDT> <Info> <Enterprise> <ga003sds> <tms1> <Thread-17> <anonymous> <> <000000> <HUDCache: Retrieve new value (Time = 11907 ms)>
####<Jun 7, 2005 2:47:40 PM EDT> <Info> <Enterprise> <ga003sds> <tms1> <Thread-17> <anonymous> <> <000000> <HUDCache: Retrieve new value (Time = 11983 ms)>
####<Jun 7, 2005 2:48:10 PM EDT> <Info> <Enterprise> <ga003sds> <tms1> <Thread-17> <anonymous> <> <000000> <HUDCache: Retrieve new value (Time = 11961 ms)>
####<Jun 7, 2005 2:48:39 PM EDT> <Info> <Enterprise> <ga003sds> <tms1> <Thread-17> <anonymous> <> <000000> <HUDCache: Retrieve new value (Time = 11850 ms)>
####<Jun 7, 2005 2:49:10 PM EDT> <Info> <Enterprise> <ga003sds> <tms1> <Thread-17> <anonymous> <> <000000> <HUDCache: Retrieve new value (Time = 11949 ms)>
####<Jun 7, 2005 2:49:40 PM EDT> <Info> <Enterprise> <ga003sds> <tms1> <Thread-17> <anonymous> <> <000000> <HUDCache: Retrieve new value (Time = 12519 ms)>
####<Jun 7, 2005 4:35:14 PM EDT> <Error> <Enterprise> <ga003sds> <tms1> <Thread-12> <<anonymous>> <> <000000> <EUC948732945 - Tue Jun 07 16:35:14 EDT 2005 - unknown - - Broken pipe Broken pipe
at Method)
at weblogic.servlet.internal.ChunkUtils.writeChunkTransfer(
at weblogic.servlet.internal.ChunkUtils.writeChunks(
at weblogic.servlet.internal.ChunkOutput.flush(
at weblogic.servlet.internal.ChunkOutput.checkForFlush(
at weblogic.servlet.internal.ChunkOutput.print(
at weblogic.servlet.internal.ChunkOutputWrapper.print(
at weblogic.servlet.jsp.JspWriterImpl.print(
at jsp_servlet._routing._nextflightout.__template._writeText(
at jsp_servlet._routing._nextflightout.__template._jspService(
at weblogic.servlet.jsp.JspBase.service(
at weblogic.servlet.internal.ServletStubImpl$
at weblogic.servlet.internal.ServletStubImpl.invokeServlet(
at weblogic.servlet.internal.ServletStubImpl.invokeServlet(
at weblogic.servlet.internal.RequestDispatcherImpl.include

The entries in the log have a standard format: ####<date><log_level><type><server_hostname><servername><category><service><unknown><message_id><message_text>

The problem is that the message_text can consist of multiple lines of stack trace when the log_level = "Error". These happen to be the main records I'm interested in for now, but there are also some single-line warnings we want to capture.

I had started to write the following (please forgive, it's my first time using perl and the code isn't much :o ), but got hung up on the multi-line stuff.

#!/usr/bin/perl -w
# get_errors.plx
# get errors from log file

use strict;

while (<>) {
    my ($date, $log_level, $type, $serverhostname, $servername,
#            $category, $service, $unknown, $message_id, $message_text) = split(/> </, $_);
             $category, $service, $unknown, $message_id, $message_text) =
        m/^####<([ ,:A-Za-z0-9]+?)> <(.*?)> <(.*?)> <(.*?)> <(.*?)> <(.*?)> <(.*?)> <(.*?)> <(.*?)> <(.*?)>$/;
    print join "|", $date,$log_level, $type, $serverhostname, $servername,
            $category, $service, $unknown, $message_id, $message_text;
    print "\n";
}

I'm assuming the while(<>) will only read one line at a time? the 'm' that i tried to put in the expression errors out, so i'm not even sure if my expression to capture the lines is correct...

How can I properly read this file to capture a log entry at a time, whether it's single or multiple lines? any assistance is greatly, greatly appreciated!! :mrgreen:

I have a nice page for you to look at regarding "regular expressions", or "regex". Regex is nice, but can be a bit confusing. Line noise, as it's been called, can do magic with a correctly set up expression or expressions. I see that you tried "m", which is the regular expression operator for matching. In Perl, m is the default, so you could have done the same with just // instead of m//, but it's always better for readability to use the m. Here is a great page to learn and understand regex:

I hope this helps some. If you are still having mad troubles with it, I'll be glad to take a look at your code and offer what help I can. Also, I don't see you opening a file to read the input from. The while (<>) { reads from any files named on the command line, and from STDIN if there are none.

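open(FH, "<", "/home/mydir/somefile.txt") or die "Cannot open file: $!";
while (<FH>) {
     # each pass through the loop has the next line of the file in $_
}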


The above is the best way IMO to go about this. It helps readability: you are opening the file using the filehandle FH. Then, in the while loop, you are reading from FH (<FH>) line by line until it reaches the end of the file (EOF).

Something else to consider is using the split function to pull out each piece of information that sits between < and > in your file. For example, when I parse an HTML page with Perl I do something like this:

my @tags = split(/</, $_);
foreach my $tag (@tags) {
     if (lc($tag) eq "b>") {
          print "Found Bold Tag\n";
     }
}

You could use similar code to read your file, and then check which "tag" you are on. It doesn't look like what you want to do is going to be an easy task, but let me know if I can help any further.
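To make the split idea concrete for this log format, here is a rough sketch for one complete, single-line entry. It assumes the fields are always separated by "> <"; parse_fields is a name made up for this example, and an entry like the <<anonymous>> one would simply keep its extra brackets in that field.

```perl
use strict;
use warnings;

# Hypothetical helper: split one complete, single-line entry into its fields.
sub parse_fields {
    my ($entry) = @_;
    chomp $entry;
    $entry =~ s/^####<//;   # drop the leading marker and the first "<"
    $entry =~ s/>$//;       # drop the final ">"
    # A limit of 10 keeps any "> <" inside the message text in the last field.
    return split /> </, $entry, 10;
}

# One line taken from the log sample in the question.
my $sample = "####<Jun 7, 2005 2:46:40 PM EDT> <Info> <Enterprise> <ga003sds> <tms1> "
           . "<Thread-17> <anonymous> <> <000000> <HUDCache: Retrieve new value (Time = 12636 ms)>\n";

my @fields = parse_fields($sample);
print join("|", @fields), "\n";
```

This only works once a whole entry is in hand, so the multi-line entries still have to be joined together first.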

hi comatose,

thanks for the link :) . i've got some experience with regular expressions, though the syntax in perl is a little different than what i'm used to. here is the basic regular expression i came up with to match one line:

^[#]{4}<.*?> <.*?> <.*?> <.*?> <.*?> <.*?> [<]{1,2}.*?[>]{1,2} <.*?> [<]{1,2}.*?[>]{1,2} [<]{0,1}.*?[>]{0,1}$

but, the problem i'm having is that i don't just want the one line if the last field has a java stack trace. that is, can my regular expression continue to pick up information from lines that follow as in:

^[#]{4}<.*?> <.*?> <.*?> <.*?> <.*?> <.*?> [<]{1,2}.*?[>]{1,2} <.*?> [<]{1,2}.*?[>]{1,2} [<]{0,1}.*?[>]$

? Here, i specified that the end of the line must be '>'. I'm wondering if it's going to fail that condition if it's not all on one line.
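A quick way to check that is to run the expression against one complete line and against the first physical line of a multi-line entry. The sketch below does that with two lines from the sample (the variable names are made up for the example); when the file is read one line at a time, the second case never sees a closing '>' and so does not match.

```perl
use strict;
use warnings;

# The expression from the post, ending in a required ">".
my $re = qr/^[#]{4}<.*?> <.*?> <.*?> <.*?> <.*?> <.*?> [<]{1,2}.*?[>]{1,2} <.*?> [<]{1,2}.*?[>]{1,2} [<]{0,1}.*?[>]$/;

# A complete single-line entry from the sample.
my $complete = "####<Jun 7, 2005 2:46:38 PM EDT> <Info> <Enterprise> <ga003sds> <tms1> "
             . "<ExecuteThread: '12' for queue: 'weblogic.kernel.Default'> <setup> <> <000000> "
             . "<Time to get routes: 4105>";

# The first physical line of the BEA-010061 entry, which continues on the next line.
my $first_line_only = "####<Jun 7, 2005 2:47:09 PM EDT> <Warning> <EJB> <ga003sds> <tms1> "
                    . "<ExecuteThread: '13' for queue: 'weblogic.kernel.Default'> <anonymous> <> <BEA-010061> "
                    . "<The Message-Driven EJB: EcommerceOrderManager is unable to connect to the JMS destination: "
                    . "integration.eCommerce.orderCreation.request. The Error was:";

my $complete_matches = ($complete        =~ $re) ? 1 : 0;
my $partial_matches  = ($first_line_only =~ $re) ? 1 : 0;

print "complete line matches: $complete_matches\n";
print "first line of a multi-line entry matches: $partial_matches\n";
```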

beyond that, i would like to get at least a few lines of the stack trace (or at least 255 characters) and remove any newlines from that section. ultimately, i want to read a log file and write out a file where each line is the pipe-delimited fields (and have the last field from the log file translated to a single line if it's multiple).
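The newline removal and the 255-character cap could look something like the sketch below. flatten_message is a made-up helper name, and replacing each line break with a single space is just one choice; the sample text is a shortened piece of the stack trace.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse a multi-line message to one line and cap its length.
sub flatten_message {
    my ($text, $max) = @_;
    $max = 255 unless defined $max;
    $text =~ s/\s*[\r\n]+\s*/ /g;   # each line break (plus surrounding whitespace) becomes one space
    $text =~ s/^\s+|\s+$//g;        # trim leading and trailing whitespace
    return substr($text, 0, $max);  # keep at most $max characters
}

my $trace = "Broken pipe\nat weblogic.servlet.internal.ChunkUtils.writeChunks(\n"
          . "at weblogic.servlet.internal.ChunkOutput.flush(\n";

print flatten_message($trace), "\n";       # whole message on one line
print flatten_message($trace, 20), "\n";   # first 20 characters only
```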

as for not specifying a file, i'm actually running the perl script with a file directed in as standard input for now: get_errors.plx < testfile.log

i'm going to be running this perl script on multiple log files, so i was going to have another script with a while loop to call it in this manner on each file. i will probably change it to take the file name as a parameter. it would increase readability, as you said.

I'm still playing around with it, but would welcome any assistance and highly appreciate what you've offered so far.

There is a great site for learning about multi-line regexes with Perl. There are a few methods that can be used: the older, deprecated method is to set $*, but the newer methods, as of Perl 5, use the /m and/or /s modifiers.
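As a small illustration of the m and s modifiers, the sketch below runs three matches against a made-up two-line string. With no modifier, "." stops at the newline; with /s, "." crosses it; with /m, "^" and "$" also match at line boundaries inside the string.

```perl
use strict;
use warnings;

# A made-up two-line string standing in for a multi-line log entry.
my $text = "first line\nsecond line>";

# Without modifiers, "." does not match a newline, so this fails.
my $plain = ($text =~ /^first.*>$/) ? 1 : 0;

# With /s, "." also matches newlines, so ".*" can span both lines.
my $dotall = ($text =~ /^first.*>$/s) ? 1 : 0;

# With /m, "^" and "$" also match at the start and end of each line.
my $multiline = ($text =~ /^second line>$/m) ? 1 : 0;

print "plain=$plain dotall=$dotall multiline=$multiline\n";
```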

Let me know what you come up with, so that other people with similar problems can find the resolution here.

thanks again for your assistance. i'll certainly share whatever my findings are.

Here's a bit of code that prints the multi-line error message from that log sample:

#!/usr/bin/perl -w
use strict;
my $log="log";
my $partial="";
my $state=1;
if ( $ARGV[0] ) { $log=$ARGV[0] }
open ( FILE,$log ) or die "Failed to open $log:$!\n";
while ( <FILE> )
  {
  if ( $state ) # watching two states, this one is waiting on new log line
    { check_for_complete_line($_) }
  else # 2nd state is waiting on another piece of previous partial line
    { $state=1; $partial .= $_ ; check_for_complete_line($partial); }
  }
close FILE;
sub check_for_complete_line
  {  # enters with $state=1 for new line, $state=0 for partial line previous
  my $line = shift;
  if ($line =~ /^####<(.*)> <(.*)> <(.*)> <(.*)> <(.*)> <(.*)> <(.*)> <(.*)> <(.*)> <.*>$/ms)
    {
    if ($2 eq 'Error')
      { # 10th field is pulled out with s/// below ($10 would also work) - probably would be better to use split here
      print "$1|$2|$3|$4|$5|$6|$7|$8|$9";
      $line =~ s/^####<.*> <.*> <.*> <.*> <.*> <.*> <.*> <.*> <.*> <(.*)>$/$1/ms;
      print "|$line\n";
      }
    else {} # handle info/warning
    }
  else  # incomplete log line, save the part we have, indicate state change
    { $state=0; $partial = $line }
  }

Nice Work There.... Even Commented Some of the code :)

Thx =) Hmm my first comment in the subroutine is wrong though :(
It always enters the subroutine in state=1. The comment should have been

# always enters in $state=1 ie: whether it was a fresh line or it had a partial, before it gets to the subroutine, a partial is joined with the current line and it's considered a fresh line.
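For comparison, a different way to group the lines (not the approach in the script above) is to treat every line that begins with #### as the start of a new entry and append anything else to the entry being built. The sketch below assumes that no continuation line ever begins with ####; read_entries and the shortened in-memory sample are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of complete entries from a filehandle,
# joining continuation lines onto the entry they belong to.
sub read_entries {
    my ($fh) = @_;
    my @entries;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^####/ or !@entries) {
            push @entries, $line;        # a new entry starts here
        } else {
            $entries[-1] .= " $line";    # continuation of the previous entry
        }
    }
    return @entries;
}

# In-memory sample so the sketch runs without a log file.
my $sample = "####<t1> <Info> <short message>\n"
           . "####<t2> <Error> <Broken pipe\n"
           . "at weblogic.servlet.internal.ChunkOutput.flush(\n"
           . "####<t3> <Info> <another message>\n";

open(my $fh, '<', \$sample) or die "Cannot open in-memory sample: $!";
my @entries = read_entries($fh);
close($fh);

print "$_\n" for @entries;
```

One consequence of grouping this way is that an entry whose stack trace is cut off before the closing '>' is still returned, since nothing waits for the final '>' to arrive.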


awesome, kordaff!! that helps a lot and is much appreciated! :mrgreen:

Anything I can do to help =)
