Nov 7th, 2009

findall

Hey guys,
I have a problem. I have a chunk of text like this:

    No. Time Source Destination Protocol Info
    2 0.005318 192.168.110.33 192.168.110.44 ICMP Echo (ping) request

    Frame 2 (98 bytes on wire, 98 bytes captured)
    Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)
    Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)
    Internet Control Message Protocol

    No. Time Source Destination Protocol Info
    3 0.998730 192.168.110.33 192.168.110.44 DHCP DHCP Offer - Transaction ID 0x9e0e832

    Frame 3 (347 bytes on wire, 347 bytes captured)
    Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)
    Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)
    User Datagram Protocol, Src Port: bootps (67), Dst Port: bootpc (68)
    Bootstrap Protocol

    No. Time Source Destination Protocol Info
    4 0.998917 0.0.0.0 255.255.255.255 DHCP DHCP Request - Transaction ID 0x9e0e832

    Frame 4 (348 bytes on wire, 348 bytes captured)
    Ethernet II, Src: IntelCor_4d:77:83 (00:13:02:4d:77:83), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
    Internet Protocol, Src: 0.0.0.0 (0.0.0.0), Dst: 255.255.255.255 (255.255.255.255)
    User Datagram Protocol, Src Port: bootpc (68), Dst Port: bootps (67)
    Bootstrap Protocol

Basically I'm trying to extract the DHCP server and the IP address it's giving me. So I want to look for the text between "DHCP Offer" and "Bootstrap", and then from that chunk get the part between "Internet Protocol, Src:" and "(".

    temp = re.findall("DHCP\s+Offer(.)Bootstrap", text)
    print (temp)
    name=re.findall("(Internet Protocol)\sSrc:.[(]", temp)
    print name

But I think it's not reading past the first line between the search words.
Last edited by mitsuevo; Nov 7th, 2009 at 12:53 am.
Posted by mitsuevo (Newbie Poster, 21 posts since Nov 2009; Reputation Points: 10, Solved Threads: 1)
Nov 7th, 2009
Re: findall
You can use this code snippet.
Posted by Gribouillis (Posting Maven, 2,656 posts since Jul 2008; Reputation Points: 930, Solved Threads: 668)
Nov 7th, 2009
Re: findall
Thanks! That helps. I also found another solution that is a bit simpler.
I do something like this:

    import re

    def findDHCP(filename):
        # Read the whole capture file into one string.
        with open(filename, "r") as f:
            text = f.read()
        # '.' does not match a newline, so each match is only the part of
        # a single line that comes before "Offer".
        words = re.findall(r"(.*)Offer", text)
        splitter = re.compile(r"\s+")
        for n in words:
            s = splitter.split(n)
            print("The DHCP Server IP is :" + s[3])
            print("The IP allocated to me is :" + s[4])

But the problem is that this reads only to the end of that particular line. Now I want to read a line that is a constant number of lines below the one I read with the code above. The trouble is that the second line I want to read doesn't have any significant index or keyword to look for. Is there some way I can do this?

Basically I want to read a block of text spread over a few lines between a KNOWN start point and end point (I've bolded the two points), and the Src Port and the Dst Port (in red) are what I want to extract.


No.     Time        Source                Destination           Protocol Info
    140 3.050240    137.132.69.169        172.19.134.182        FTP      Response: 150 Opening BINARY mode data connection for /pub/ubuntu/dists/dapper-security/Contents-i386.gz (3376313 bytes).

Frame 140 (179 bytes on wire, 179 bytes captured)
Ethernet II, Src: Ditech_55:38:00 (00:d0:02:55:38:00), Dst: Usi_ac:fe:1e (00:16:41:ac:fe:1e)
Internet Protocol, Src: 137.132.69.169 (137.132.69.169), Dst: 172.19.134.182 (172.19.134.182)
Transmission Control Protocol, Src Port: ftp (21), Dst Port: 59785 (59785), Seq: 85, Ack: 178, Len: 113
File Transfer Protocol (FTP)

No.     Time        Source                Destination           Protocol Info
    141 3.051247    137.132.69.169        172.19.134.182        FTP-DATA FTP Data: 1368 bytes

Frame 141 (1434 bytes on wire, 1434 bytes captured)
Ethernet II, Src: Ditech_55:38:00 (00:d0:02:55:38:00), Dst: Usi_ac:fe:1e (00:16:41:ac:fe:1e)
Internet Protocol, Src: 137.132.69.169 (137.132.69.169), Dst: 172.19.134.182 (172.19.134.182)
Transmission Control Protocol, Src Port: 50003 (50003), Dst Port: 48115 (48115), Seq: 1, Ack: 1, Len: 1368
FTP Data
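
One way to read a line at a fixed distance below a known line is to split the text into lines and index by offset. A minimal sketch, assuming a hypothetical helper name lines_after, a shortened copy of packet 140 above, and an offset of 6 that only holds for this layout:

```python
def lines_after(text, marker, offset):
    """Return the line `offset` lines below each line containing `marker`."""
    lines = text.splitlines()
    found = []
    for i, line in enumerate(lines):
        # Skip markers that sit too close to the end of the text.
        if marker in line and i + offset < len(lines):
            found.append(lines[i + offset])
    return found

# Shortened copy of packet 140 (the long Info text is cut for the example).
sample = """No.     Time        Source                Destination           Protocol Info
    140 3.050240    137.132.69.169        172.19.134.182        FTP      Response: 150 Opening BINARY mode data connection

Frame 140 (179 bytes on wire, 179 bytes captured)
Ethernet II, Src: Ditech_55:38:00 (00:d0:02:55:38:00), Dst: Usi_ac:fe:1e (00:16:41:ac:fe:1e)
Internet Protocol, Src: 137.132.69.169 (137.132.69.169), Dst: 172.19.134.182 (172.19.134.182)
Transmission Control Protocol, Src Port: ftp (21), Dst Port: 59785 (59785), Seq: 85, Ack: 178, Len: 113
File Transfer Protocol (FTP)
"""

# In this sample the TCP line sits 6 lines below the "No. Time ..." header.
print(lines_after(sample, "Protocol Info", 6))
```

This only works while every packet has the same number of lines, which may not hold for a mixed capture.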
Last edited by mitsuevo; Nov 7th, 2009 at 9:58 am.
Posted by mitsuevo
Nov 7th, 2009
Re: findall
Here is some code that finds the Src Port and Dst Port in your second example.
When you write regular expressions, always use raw strings prefixed with 'r', like r"my regex". Also, the character '.' in a regex matches any character except a newline. If you want '.' to match newlines too, you must use re.compile(r"my regex", re.DOTALL).
    #!/usr/bin/env python

    import re

    datafile = "chunk2.txt"

    # Matches the header line that starts each packet and captures the packet number.
    head_re = re.compile(r"No[.]\s+Time\s+Source\s+Destination\s+Protocol Info\s+(\d+)")

    def packets(filename):
        """Split the file in packets. Yield pairs (number, packet_content)"""
        with open(filename) as f:
            text = f.read()
        i = None
        num = None
        for match in head_re.finditer(text):
            if i is not None:
                yield (num, text[i:match.start()])
            i = match.start()
            num = int(match.group(1))
        if i is not None:
            yield (num, text[i:])

    ports_re = re.compile(r"Src Port:([^,]*?),\s*?Dst Port:([^,]*)(?:,|$)")

    def ports(packet):
        """Search the src port and dst port in a packet"""
        for match in ports_re.finditer(packet):
            yield (match.group(1), match.group(2))

    if __name__ == "__main__":
        for num, p in packets(datafile):
            print("%s %s" % (num, list(ports(p))))
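
To illustrate the note above about '.' not matching newlines, here is a minimal sketch. The sample string is made up for the example and is not taken from the capture file:

```python
import re

# Hypothetical three-line sample, not taken from the capture file.
sample = "DHCP Offer\nInternet Protocol, Src: 192.168.110.33\nBootstrap Protocol"

# Without re.DOTALL, '.' stops at the first newline, so the pattern
# cannot reach "Bootstrap" on a later line and nothing is found.
no_flag = re.findall(r"Offer(.*?)Bootstrap", sample)

# With re.DOTALL, '.' also matches '\n', so the match can span lines.
with_flag = re.findall(r"Offer(.*?)Bootstrap", sample, re.DOTALL)

print(no_flag)
print(with_flag)
```

The first call returns an empty list, while the second returns the text between the two words, newlines included.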
Last edited by Gribouillis; Nov 7th, 2009 at 11:50 am.
Posted by Gribouillis
Nov 9th, 2009

Your first attempt was almost good...

... but you made some mistakes with regex flags and pattern details. Also, your data contains only one DHCP Offer and one DHCP Request. If you need both IPs, you can use this:
    >>> print(re.findall(r'DHCP\s+.*?Internet Protocol,\s+Src:\s*(.+?)\s*\(.*?Bootstrap', text, re.M|re.S))
    ['192.168.110.33', '0.0.0.0']
To get the offer and request IPs individually, use:
    >>> print(re.findall(r'DHCP Offer\s+.*?Internet Protocol,\s+Src:\s*(.+?)\s*\(.*?Bootstrap', text, re.M|re.S))
    ['192.168.110.33']
    >>> print(re.findall(r'DHCP Request\s+.*?Internet Protocol,\s+Src:\s*(.+?)\s*\(.*?Bootstrap', text, re.M|re.S))
    ['0.0.0.0']
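
Building on the patterns above, a similar regex could capture both the Src and the Dst address of the offer in one pass. This is only a sketch: the names offer_re and offer_addresses are made up here, the sample is a shortened copy of packet 3 from the first post, and treating Dst as the allocated address is an assumption borrowed from the earlier snippet in this thread (it would not hold if the offer were sent to the broadcast address).

```python
import re

# Shortened copy of the DHCP Offer packet from the first post.
text = """No. Time Source Destination Protocol Info
3 0.998730 192.168.110.33 192.168.110.44 DHCP DHCP Offer - Transaction ID 0x9e0e832

Frame 3 (347 bytes on wire, 347 bytes captured)
Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)
User Datagram Protocol, Src Port: bootps (67), Dst Port: bootpc (68)
Bootstrap Protocol
"""

# .*? with re.DOTALL skips the lines between "DHCP Offer" and the
# "Internet Protocol" line; [^\n]*? keeps the Dst lookup on that same line.
offer_re = re.compile(
    r"DHCP Offer.*?Internet Protocol,\s+Src:\s*(\S+)[^\n]*?Dst:\s*(\S+)",
    re.DOTALL,
)

def offer_addresses(capture):
    """Return (src, dst) pairs for every DHCP Offer packet in the text."""
    return offer_re.findall(capture)

print(offer_addresses(text))
```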
Posted by pythopian (Junior Poster in Training, 81 posts since Nov 2009; Reputation Points: 20, Solved Threads: 25)
Nov 9th, 2009
Re: findall
Hey guys!
I have another question related to my initial question and the packet structure.
I want to extract the packets where the protocol is either TCP or HTTP. How can I do that?
Right now I use this regex, but it extracts all the info (which I want) from every packet.
    datalines = re.findall("Protocol Info[\s]+(.*\s*.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)",text)

Thanks!
Last edited by mitsuevo; Nov 9th, 2009 at 3:54 pm.
Posted by mitsuevo
Nov 10th, 2009
Re: findall
    datalines = re.findall("Protocol Info(.*)[HTTP,TCP](.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)",text)
This is what I have right now, but it's not giving me any results at all.
I basically want to select all lines (and their respective 5 succeeding lines) based on whether the first line says either TCP or HTTP.
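
A note on the pattern above: [HTTP,TCP] is a character class, so it matches one single character (H, T, P, C or a comma) rather than either word; alternation is written (?:HTTP|TCP). Also, since '.' does not match a newline, the (.*) right after "Protocol Info" cannot reach the next line, which is probably why nothing is returned. One possible approach is to split the text at each header line and test the Protocol column directly. This is a sketch only: the sample packets are hypothetical and shortened, and header_re and packets_with_protocol are names made up for the example.

```python
import re

# Hypothetical, shortened packets in the same layout as the capture in this thread.
text = """No.     Time        Source                Destination           Protocol Info
      1 0.000000    10.0.0.1              10.0.0.2              TCP      59785 > http [SYN] Seq=0

Frame 1 (74 bytes on wire, 74 bytes captured)

No.     Time        Source                Destination           Protocol Info
      2 0.010000    10.0.0.2              10.0.0.1              DHCP     DHCP Offer

Frame 2 (347 bytes on wire, 347 bytes captured)

No.     Time        Source                Destination           Protocol Info
      3 0.020000    10.0.0.1              10.0.0.2              HTTP     GET / HTTP/1.1

Frame 3 (179 bytes on wire, 179 bytes captured)
"""

# Matches the whole header line that starts each packet.
header_re = re.compile(
    r"^No\.\s+Time\s+Source\s+Destination\s+Protocol Info[ \t]*\n", re.M
)

def packets_with_protocol(capture, wanted=("TCP", "HTTP")):
    """Return the packets whose summary-line Protocol column is in `wanted`."""
    selected = []
    for chunk in header_re.split(capture):
        lines = chunk.strip().splitlines()
        if not lines:
            continue  # empty piece before the first header
        # Summary columns: No., Time, Source, Destination, Protocol, Info...
        fields = lines[0].split()
        if len(fields) >= 5 and fields[4] in wanted:
            selected.append(chunk.strip())
    return selected

for packet in packets_with_protocol(text):
    print(packet.splitlines()[0])
```

Each returned item holds the summary line plus every detail line of that packet, so a fixed count of five following lines is not needed.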
Posted by mitsuevo
