I have a .txt log file and most of it is crap. But there are lines that show when a user logs in and at what time they logged in. Below is a portion of the log file. For example, "user1" is one user logging in and "user2" is another user logging in. So far I have created a Python app that counts how many times each user logged in and when they logged in. I have also counted how many users logged in for the day and found the top three users.

However, I have not been able to figure out how to count how many users logged in during a three-hour time frame, say from 12:00 to 15:00 and from 15:00 to 18:00. I tried some stuff, but it really didn't work.

Example of what the .txt log file looks like:

<IP SNIPPED> - user1 [01/Feb/2008:04:32:12 -0500] "GET /controller?method=getUser HTTP/1.0" 200 305
<IP SNIPPED> - - [01/Feb/2008:04:57:38 -0500] "HEAD /images/DCI.gif HTTP/1.1" 200 -
<IP SNIPPED> - - [01/Feb/2008:04:57:38 -0500] "HEAD /eagent.jnlp HTTP/1.1" 200 -
<IP SNIPPED>- - [01/Feb/2008:04:57:38 -0500] "HEAD /jh.jnlp HTTP/1.1" 200 -
<IP SNIPPED> - - [01/Feb/2008:04:57:38 -0500] "HEAD /smack.jar HTTP/1.1" 200 -
<IP SNIPPED> - - [01/Feb/2008:04:57:38 -0500] "HEAD /jh.jar HTTP/1.1" 200 -
<IP SNIPPED> - - [01/Feb/2008:04:57:39 -0500] "HEAD /images/DCI.gif HTTP/1.1" 200 -
<IP SNIPPED>- - noone [01/Feb/2008:04:57:40 -0500] "GET /controller?method=getNode&name=S14000068 HTTP/1.0" 200 499
<IP SNIPPED> - - [01/Feb/2008:04:57:40 -0500] "GET /help/helpset.hs HTTP/1.1" 200 547
<IP SNIPPED> - - [01/Feb/2008:04:57:43 -0500] "GET /help/map.jhm HTTP/1.1" 200 59650
<IP SNIPPED> - user2 [01/Feb/2008:00:19:16 -0500] "GET /controller?method=getUser HTTP/1.0" 200 307

Here is what I have done so far.

import re
import time
     
fn = 'localhost_access_log.2008-02-01.txt'

#Using regular expressions to get user name from text file
pattLog = re.compile(r'([a-zA-Z0-9]+) \[(.+)\]')
fileList = open(fn).readlines()
logdict = {}
for item in fileList:
    m = pattLog.search(item)
    if m:
        logdict.setdefault(m.group(1), []).append(m.group(2))

# Count how many times the user logged in and times they logged in      
for key in logdict:
    n = len(logdict[key])
    print 'User %s logged in %d time%s:\n%s\n' % \
          (key, n, ['','s'][n > 1 or 0], '\n'.join(logdict[key]))

#Find the top three most logged in users
freqList = [[len(logdict[key]), key] for key in logdict]
freqList.sort(reverse=True)
       
# freqList is sorted in descending order, so the top three are the first three entries
print 'The three most frequent users that logged in are: %s.' % (freqList[:3])

#Count how many users logged in today (one dictionary key per user)
count = len(logdict)

#print it out to the screen 
print '%s users logged in today' % (count)

#This is where I try to find how many users logged in during a certain time frame
d1 = '01/Feb/2008:04:57:40 -0500'
d2 = '01/Feb/2008:15:57:40 -0500'

     

def time_comp(upper, lower, d):
    # upper and lower format %H:%M:%S
    tu = time.strptime(upper, '%H:%M:%S')
    tl = time.strptime(lower, '%H:%M:%S')

    # parse d
    # example string: '01/Feb/2008:04:57:40 -0500'
    tm = time.strptime(d.split()[0].split(':',1)[1], '%H:%M:%S')
    # return the comparison itself so the result is always True or False
    return tl <= tm <= tu
     
print time_comp('16:00:00', '10:00:00', d1)
print time_comp('16:00:00', '10:00:00', d2)
     
if time_comp('16:00:00', '10:00:00', d1):
    print 'User logged in during the target time.'
else:
    print 'Out of range'
     
if time_comp('16:00:00', '10:00:00', d2):
    print 'User logged in during the target time.'
else:
    print 'Out of range'

Print tl, tu, and tm. That should give you some ideas. You might want to use datetime instead, or convert everything to seconds.

import datetime
today=datetime.datetime.now()
midnight = datetime.datetime(2008, 2, 25, 0, 0, 0)
if today > midnight:
   print "today is greater", today, "-->", midnight
else:
   print "midnight is greater"
print "difference =", today - midnight
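
Here is a minimal sketch of the "convert everything to seconds" idea, assuming the timestamp layout shown in the sample log. The helper names `seconds_since_midnight` and `in_window` are made up for illustration. The window is treated as half-open (start included, end excluded) so that adjacent windows such as 12:00 to 15:00 and 15:00 to 18:00 do not overlap. The print calls are written so they also run under Python 3.

```python
def seconds_since_midnight(stamp):
    """Turn '01/Feb/2008:04:57:40 -0500' into seconds since midnight."""
    # Drop the UTC offset, then drop the date part before the first ':'.
    clock = stamp.split()[0].split(':', 1)[1]
    hours, minutes, seconds = [int(part) for part in clock.split(':')]
    return hours * 3600 + minutes * 60 + seconds


def in_window(stamp, start_hour, end_hour):
    """True if stamp falls in [start_hour, end_hour), in the log's local time."""
    return start_hour * 3600 <= seconds_since_midnight(stamp) < end_hour * 3600


d1 = '01/Feb/2008:04:57:40 -0500'
d2 = '01/Feb/2008:15:57:40 -0500'
print(in_window(d1, 15, 18))  # False: 04:57:40 is before 15:00
print(in_window(d2, 15, 18))  # True: 15:57:40 is inside 15:00-18:00
```

Comparing plain integers avoids any surprises from comparing time structures, and it makes the boundary behaviour explicit.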

I had originally tried datetime, but it didn't work. Hmm... If I can't figure it out by the end of today, I'll just forget about it. Thanks for the advice.

This wants to be OOP code, I think. The problem you're having is extracting the date meaningfully from the records. The solution is to create a class for the log entries and let the class worry about how to do it.

Here's a crude example:

# logreader

import datetime

class Record(object):

    def __init__(self, string):

        try:
            self.IP, _, self.user, rest = string.split(" ",3)

            self.date, rest = rest.split(']')
            self.date = self.date[1:]

            _, self.comment, rest = rest.split('"')
            self.comment = '"' + self.comment + '"'

            self.thing1, self.thing2 = rest.strip().split(" ")
            
        except ValueError:
            # a split() gave the wrong number of pieces to unpack
            print "Line badly formatted!"
            self.IP = None

    def __str__(self):
        return " ".join([self.IP,"-",self.user,"["+self.date+"]",
                         self.comment,self.thing1,self.thing2])

    def get_date(self):
        return datetime.datetime.strptime(self.date, "%d/%b/%Y:%H:%M:%S -0500")
        

# snarf data
f = open("samplelog.txt")
records = []
for line in f:
    r = Record(line)
    if r.IP:
        records.append(r)
f.close()

# A three hour timeframe: hours 3, 4 and 5 (end is exclusive)

start = 3
end = 6

users = []
for rec in records:
    if start <= rec.get_date().hour < end and rec.user != '-':
        users.append(rec)

print "%d users logged on in the window" % len(users)
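
To cover every three-hour window at once, one option is to bucket each login by `hour // 3` and keep a set of user names per bucket, so that a user is counted once per window no matter how many requests they made. This is only a sketch under assumptions: it uses a regular expression rather than the `Record` class so that it stays self-contained, the sample lines are placeholders shaped like the log excerpt (the `0.0.0.0` address and the 05:10:00 entry are invented), and `users_per_window` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# User name, then the hour taken from "[dd/Mon/yyyy:HH:MM:SS zone]".
LOGIN = re.compile(r'(\S+) \[\d{2}/\w{3}/\d{4}:(\d{2}):\d{2}:\d{2} [^\]]*\]')


def users_per_window(lines, width=3):
    """Map window start hour -> set of users seen in [start, start + width)."""
    windows = defaultdict(set)
    for line in lines:
        m = LOGIN.search(line)
        if m and m.group(1) != '-':       # '-' means no user on this request
            start = int(m.group(2)) // width * width
            windows[start].add(m.group(1))
    return windows


# Placeholder lines in the same shape as the log excerpt.
sample = [
    '0.0.0.0 - user1 [01/Feb/2008:04:32:12 -0500] "GET /controller?method=getUser HTTP/1.0" 200 305',
    '0.0.0.0 - - [01/Feb/2008:04:57:38 -0500] "HEAD /jh.jar HTTP/1.1" 200 -',
    '0.0.0.0 - user1 [01/Feb/2008:05:10:00 -0500] "GET /controller?method=getUser HTTP/1.0" 200 305',
    '0.0.0.0 - user2 [01/Feb/2008:00:19:16 -0500] "GET /controller?method=getUser HTTP/1.0" 200 307',
]

for start, users in sorted(users_per_window(sample).items()):
    print('%02d:00-%02d:00: %d user(s)' % (start, start + 3, len(users)))
```

With the default width, the buckets start at 0, 3, 6, and so on, which lines up with windows such as 12:00 to 15:00 and 15:00 to 18:00.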

Jeff

Ah! Now I see! I feel like such an idiot after seeing this. Thanks a lot, Jeff. I have one question, though: could you explain the try block a little more clearly? I am having trouble understanding it.

Wait a second ... I just wrote a post about how try: except: works and then realized that you probably weren't asking that question. :lol:

Take two:

Inside the try: block, what I'm doing is a crude parsing of the line into fields. My first attempt at it was simply

self.IP, _, self.user, self.date, self.comment, self.thing1, self.thing2 = string.split(" ")

(The _ means "unpack it and throw it away". This field is the '-' character that appears in every line but apparently doesn't mean anything. You could add a field if that's an incorrect assumption.)

That didn't work, because the date and comment fields have spaces embedded in them! So next, I decided to parse like this:

* take apart everything up to the [datestamp] field
* take the date
* take the comment
* take the two fields after the comment (which are ...?)

Since the date is surrounded by [ ], I use those to delimit it. Ditto with " " for the comment field.
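
A short sketch of why the one-line unpacking fails and how the delimiters help. The sample line is the first line of the log excerpt with a placeholder IP; the variable names are only for illustration.

```python
line = ('0.0.0.0 - user1 [01/Feb/2008:04:32:12 -0500] '
        '"GET /controller?method=getUser HTTP/1.0" 200 305')

# Splitting on every space yields 10 pieces, not the 7 fields wanted,
# because the date and the request each contain spaces.
pieces = line.split(" ")
print(len(pieces))

try:
    ip, _, user, date, comment, thing1, thing2 = pieces
except ValueError as err:
    # Unpacking 10 values into 7 names raises ValueError.
    print("unpack failed: %s" % err)

# Using the delimiters instead: [ ] around the date, " " around the request.
date = line[line.index('[') + 1:line.index(']')]
comment = line.split('"')[1]
print(date)
print(comment)
```

The ValueError raised here is the same kind of error the try block in the class catches when a line does not have the expected shape.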

The line

self.IP, _, self.user, rest = string.split(" ",3)

does this:

>>> ID, _, user, rest = data[0].split(" ",3)
>>> ID
'172.16.9.206'
>>> user
'mwoelk'
>>> rest
'[01/Feb/2008:04:32:12 -0500] "GET /controller?method=getUser HTTP/1.0" 200 305\n'
>>>

That is, it takes apart the line up to the date.

The reason this works is that all of the lines have exactly the same format. If there were any variation at all, the parsing would fail. That's why I called the algorithm 'crude.'
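
If the format does vary (for instance a missing space after the IP, or a '-' in place of the last field), a regular expression can be more forgiving than chained splits. The following is only a sketch: the pattern and the `parse_line` name are assumptions based on the handful of sample lines in this thread, not a complete grammar for the log format, and the test line uses a placeholder IP.

```python
import re
from datetime import datetime

LINE = re.compile(
    r'(?P<user>\S+) '                  # last token before the date: user or '-'
    r'\[(?P<date>[^\] ]+) [^\]]*\] '   # [date zone]; the zone is ignored here
    r'"(?P<request>[^"]*)" '           # quoted request
    r'(?P<status>\d{3}) (?P<size>\S+)' # the two trailing fields
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LINE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    # The UTC offset is dropped, so this is the server's local time.
    fields['date'] = datetime.strptime(fields['date'], '%d/%b/%Y:%H:%M:%S')
    return fields


rec = parse_line('0.0.0.0- - noone [01/Feb/2008:04:57:40 -0500] '
                 '"GET /help/map.jhm HTTP/1.1" 200 59650')
print('%s logged in at hour %d' % (rec['user'], rec['date'].hour))
print(parse_line('garbage'))
```

Returning None for lines that do not match plays the same role as setting `self.IP = None` in the class: the caller can skip bad lines without a try block.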

Jeff

(hoping that this was what you wanted rather than an explanation of 'try...except')

Hope it helps,
Jeff
