Heya Daniweb,

I've been working on Regular Expressions, thanks to cghtkh who told me about them. I've used resources from:
NewThinkTank.com
Python Regular Expression Documentation
and Daniweb.com!

I figured I'd share my code, ask a few questions, and get some feedback on what I did, if I can. Feedback helps me improve my programming skills. Hopefully I won't be a pest to anyone.

import re #Imports the Regular Expression Library

f = open('randomcharacters.txt') #Opens "randomcharacters.txt" from same directory as the script

lineToSearch = ""
for line in f:
    lineToSearch += line #Goes through all the lines in the data one by one
pingFinderPattern = re.compile('time=([0-9.]+) ms')
Pattern1 = re.search('time=([0-9.]+) ms', line)
Pattern1 = re.findall('time=([0-9.]+) ms',lineToSearch)
for i in Pattern1: #Question: Is the i in this mean anything special such as Iterations? I couldn't get an answer for that.
    print(i)

This code took ~4000 pings and boiled them down to a list of ping times, which I then pasted into Google Docs to generate these graphs.
Google Docs Graphs Link
I'm interested in hearing feedback on this. I removed a few outliers in the ~400 ms range for Google; StackOverflow, however, had 5+ high ping times per spike, so I kept those.

Question:

for i in Pattern1: #Question: Is the i in this mean anything special such as Iterations? I couldn't get an answer for that.
    print(i)

I'm looking to use matplotlib to plot the data as an all-in-one tool so I don't have to copy and paste. How I think this could work is to get the data into a CSV-like form and just pass the variable to the plot.

Example:

import re
import matplotlib.pyplot as plt

f = open('randomcharacters.txt')
lineToSearch = ""
for line in f:
    lineToSearch += line
pingFinderPattern = re.compile('time=([0-9.]+) ms')
Pattern1 = re.search('time=([0-9.]+) ms', line)
Pattern1 = re.findall('time=([0-9.]+) ms',lineToSearch)
for i in Pattern1:
    print(i)
    plt.plot(i)
    plt.ylabel('Ping Time')
    plt.show

However, I believe this returns one value per line, and I'm not sure how to go about this. I've spent some time searching around and watching tutorial videos.

Thanks for your time, everyone. I truly appreciate each and every response and all criticism. I hope one day to be a skilled programmer who can give wise advice to others learning. :)

Respectfully,
BombShock.

Edited 6 Years Ago by Bombshock: n/a

import re #Imports the Regular Expression Library

f = open('randomcharacters.txt') #Opens "randomcharacters.txt" from same directory as the script

lineToSearch = ""
for line in f:
    lineToSearch += line #Goes through all the lines in the data one by one
pingFinderPattern = re.compile('time=([0-9.]+) ms')
Pattern1 = re.search('time=([0-9.]+) ms', line)
Pattern1 = re.findall('time=([0-9.]+) ms',lineToSearch)
for i in Pattern1: #Question: Is the i in this mean anything special such as Iterations? I couldn't get an answer for that.
    print(i)

There are a lot of issues.

  • Naming: You say Pattern1 = re.search(...). Pattern1 is not a pattern. A pattern is the description of the regular expression, for example: 'time=([0-9.]+) ms'. And while we are looking at names, the usual way to name variables is to use lower_case_names or lowerCamelCaseNames. Upper-case names are usually reserved for classes. I would have written match1 = ... It is important that variables and functions have names that make it easy to understand the code. Misusing a name as you did makes the code much more confusing, and programmers have enough complexity to deal with without having to handle badly named variables.
  • Compiling regexes and using them: On line 8 you say pingFinderPattern = re.compile('time=([0-9.]+) ms') (I would have named it pingFinderRe since it is not a raw pattern, but a compiled regular expression). It is good to compile the re, but you never use it! Lines 9 and 10 should look like match1 = pingFinderRe.search(line) (or pingFinderRe.findall(...)).
  • Looping: I think you are trying to look at each line in some file and find the lines that are a 'ping' response. Your code does not do that efficiently. Instead, it builds (expensively) a single string that holds all the lines of the file (that happens at line 7 of your code), then it searches only the last line of the file (and throws away the result), then it runs findall on the string that holds all the lines. The last two lines print, one per line, all the substrings captured by the hard-coded pattern at line 10. I believe this is not what you want to do.
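
To make the second point concrete, here is a small sketch of compiling once and then calling the compiled object's own methods. The two sample lines are invented for illustration (I don't have your data file); the variable names are just my suggestions from above.

```python
import re

# Compile once, then reuse the compiled object everywhere below
pingFinderRe = re.compile(r'time=([0-9.]+) ms')

# Made-up sample lines standing in for lines of the real file
lineHit = "64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=23.4 ms"
lineMiss = "Request timeout for icmp_seq 2"

match1 = pingFinderRe.search(lineHit)   # a match object
match2 = pingFinderRe.search(lineMiss)  # None, because nothing matched

print(match1.group(1))  # the text captured by the parentheses
print(match2)

# findall returns the captured groups of every match in the string
allTimes = pingFinderRe.findall(lineHit + "\n" + lineMiss)
print(allTimes)
```

Note that search gives you a match object (or None), while findall gives you a plain list of the captured strings, so which one you want depends on what you do next.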

Here is a sketch of what you might do instead:

import re # use regex package
pingPatternRe = re.compile(...) # compile the pattern I want to recognize, e.g. 'time=([0-9.]+) ms'
hits = []  # store the hits as I find them
with open('randomcharacters.txt') as f: # open the file so as to ensure it gets closed
    for line in f: # loop through every line in the file
        match = pingPatternRe.search(line) # look for a match to the pingPattern
        if match: # if pingPattern is not there, match is None, otherwise...
            hits.append(line.strip()) # store the matching line without the newline
# Unless error, we've seen every line and stored good match lines in the hits list
for h in hits: # loop over the successful matches
    print(h) # and print them

I, on the other hand, might replace lines 5 through 11 with this less obvious single line, explained below: print("".join([x for x in f if pingPatternRe.search(x)]))

  • [x for x in f...] makes a list of lines in f ...
  • [ ... if pingPatternRe.search(x)] ... where the search succeeds
  • "".join(a_list) merges the items of a_list (each must be a string), using the empty string between the items. Note that we kept the newlines when we built the list, so the newlines are still there (see subtle issue below)
  • print(...) prints the lines using their own newlines

(subtle issue): The line endings in the file might not be the same as the line endings that your console expects, so relying on them when printing is not robust. (It is, however, very likely to work for any student project.)
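
One way to sidestep the line-ending question is to strip whatever ending each line carries and let print supply the console's own. A minimal sketch: the sample data is made up, and io.StringIO stands in for the opened file (both are my assumptions, not part of the code above).

```python
import io
import re

pingPatternRe = re.compile(r'time=([0-9.]+) ms')

# io.StringIO plays the role of the open file; the content is invented
# and deliberately uses Windows-style "\r\n" line endings
f = io.StringIO("time=10.1 ms\r\nno reply\r\ntime=11.2 ms\r\n")

# Keep only matching lines, dropping any trailing "\r" or "\n" characters
hits = [x.rstrip("\r\n") for x in f if pingPatternRe.search(x)]

# Join with "\n" and let print write it out in the console's convention
print("\n".join(hits))
```

In Python 3, a file opened in the default text mode normally has its line endings translated to "\n" on reading, so this mostly matters when the file is opened with other newline settings or in binary mode.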

Edited 6 Years Ago by griswolf: n/a

Comments
Excellent critique.

subtle issue:
It is better not to build the list if it is not saved for later use; join is happy with a generator:

import re # use regex package
pingPatternRe = re.compile('=([0-9.]+) ms') # compile the pattern I want to recognize
with open('randomcharacters.txt') as f: # open the file so as to ensure it gets closed
    print("\n".join(pingtime for x in f for pingtime in pingPatternRe.findall(x)))
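
To tie this back to the matplotlib question at the top of the thread, here is a rough sketch of the same generator idea feeding a single plot call instead of one plot per value. The helper names iter_ping_times and plot_ping_times and the sample lines are made up for illustration, and I have not run it against the real data file. matplotlib is imported inside the plotting function so the parsing part runs without it.

```python
import re

pingPatternRe = re.compile(r'time=([0-9.]+) ms')

def iter_ping_times(lines):
    """Yield every ping time found in an iterable of lines, as a float."""
    for line in lines:
        for value in pingPatternRe.findall(line):
            yield float(value)

def plot_ping_times(times):
    """Plot all the ping times in one figure (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt
    plt.plot(times)              # one call with the whole list
    plt.ylabel('Ping Time (ms)')
    plt.show()                   # with parentheses, so it is actually called

# Invented sample lines standing in for the real file
sample = [
    "64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=12.5 ms\n",
    "Request timeout for icmp_seq 2\n",
    "64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=48 ms\n",
]
times = list(iter_ping_times(sample))  # a list here, because plotting needs all values
print(times)
```

With the real file you would pass the open file object in place of sample and then call plot_ping_times(times). The strings have to be converted to floats first, otherwise matplotlib has no numbers to plot.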

Edited 6 Years Ago by pyTony: n/a
