I'm an information security professional teaching myself Python. For my first project I wanted to build something I'd actually find useful, so I wrote a small log file parser and, after some trial and error, it works :) However, I've reached an impasse, and I'm not yet familiar enough with output formatting to get past it, so I'm asking for advice.
What I have so far prompts the user for the paths to two files: the log file to scan and a 'signature file'. The signature file is a plain text file in which each line holds a regular expression for a potentially malicious log entry, followed by a commented-out description of what that expression represents. The program compiles the entire signature file, ignoring the commented portions, into one large regular expression (basically a '|'.join(lines)). It then tests that combined expression against each line of the log file, increments a count on every match, prints each matching line to the screen, and finally prints the total count.
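To make the current behaviour concrete, here is a simplified sketch of that logic, not my exact code. It assumes the description is introduced by a '#' character (so a regex containing a literal '#' would be cut short), and the function names and sample data are made up for illustration.

```python
import re


def build_combined_pattern(signature_lines, comment_marker="#"):
    """Drop the trailing comment from each line and join the regexes with '|'."""
    regexes = []
    for line in signature_lines:
        # Keep only the text before the first comment marker (assumed to be '#').
        regex = line.split(comment_marker, 1)[0].strip()
        if regex:
            regexes.append(regex)
    return re.compile("|".join(regexes))


def find_matching_lines(log_lines, combined_pattern):
    """Return every log line that matches any of the joined signatures."""
    return [line.rstrip("\n") for line in log_lines if combined_pattern.search(line)]


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two input files.
    signature_lines = [
        r"\.\./ # Directory traversal",
        r"union\s+select # SQL injection",
    ]
    log_lines = [
        "GET /index.html",
        "GET /../../etc/passwd",
        "GET /?id=1 union select 1",
    ]
    combined = build_combined_pattern(signature_lines)
    matches = find_matching_lines(log_lines, combined)
    for line in matches:
        print(line)
    print("Total matches:", len(matches))
```

Because everything is folded into one pattern, a match only tells me that some signature hit, not which one.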
I thought that was pretty cool for a first program, but I'd like to take it a step further, because having the program spew out raw log lines is rather ugly. I'd like it to still do the regular expression comparison, but instead of printing each line, print one column showing the commented description and a second column showing how many times that particular expression was found. I've been playing around with things like str.find and str.rjust, but I'm kind of lost.
I'm not sure how to count the matches for each individual regular expression rather than the total. I also can't figure out how to ignore the comment during the search but then print that same ignored text in a column when its expression matches. For example, say I'm comparing a web server log against a signature file containing regular expressions for known attack signatures. I'm trying to get output that looks something like this:
--Signatures Found--      --Number of Matches--
Malicious Signature #1    999
Malicious Signature #2    999
Malicious Signature #3    999
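One direction I've been considering, as a rough sketch only: compile each regular expression separately, keep it paired with its description, and pad the description with str.ljust when printing. This again assumes '#' introduces the description, and all names and sample data here are hypothetical.

```python
import re


def parse_signatures(signature_lines, comment_marker="#"):
    """Split each line into (description, compiled regex) at the first marker."""
    signatures = []
    for line in signature_lines:
        regex, _, description = line.partition(comment_marker)
        regex = regex.strip()
        if regex:
            # Fall back to the regex text itself when no description is given.
            signatures.append((description.strip() or regex, re.compile(regex)))
    return signatures


def count_matches(log_lines, signatures):
    """Count, per description, how many log lines each signature matches."""
    counts = {description: 0 for description, _ in signatures}
    for line in log_lines:
        for description, pattern in signatures:
            if pattern.search(line):
                counts[description] += 1
    return counts


def format_report(counts):
    """Build a two-column report listing only signatures that were found."""
    left, right = "--Signatures Found--", "--Number of Matches--"
    width = max([len(left)] + [len(description) for description in counts])
    rows = [left.ljust(width) + "  " + right]
    for description, count in counts.items():
        if count:
            rows.append(description.ljust(width) + "  " + str(count))
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two input files.
    signature_lines = [
        r"\.\./ # Directory traversal",
        r"union\s+select # SQL injection",
    ]
    log_lines = ["GET /../../etc/passwd", "GET /?id=1 union select 1"]
    print(format_report(count_matches(log_lines, parse_signatures(signature_lines))))
```

This scans every log line once per signature, which is slower than a single joined pattern, and two signatures sharing the same description would have their counts merged.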