findall


Join Date: Nov 2009
Posts: 18
Reputation: mitsuevo is an unknown quantity at this point 
Solved Threads: 1
mitsuevo mitsuevo is offline Offline
Newbie Poster

  #1
29 Days Ago
Hey guys,
I have a problem. I have a chunk of text like this:

No. Time Source Destination Protocol Info
2 0.005318 192.168.110.33 192.168.110.44 ICMP Echo (ping) request

Frame 2 (98 bytes on wire, 98 bytes captured)
Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)
Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)
Internet Control Message Protocol

No. Time Source Destination Protocol Info
3 0.998730 192.168.110.33 192.168.110.44 DHCP DHCP Offer - Transaction ID 0x9e0e832

Frame 3 (347 bytes on wire, 347 bytes captured)
Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)
Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)
User Datagram Protocol, Src Port: bootps (67), Dst Port: bootpc (68)
Bootstrap Protocol

No. Time Source Destination Protocol Info
4 0.998917 0.0.0.0 255.255.255.255 DHCP DHCP Request - Transaction ID 0x9e0e832

Frame 4 (348 bytes on wire, 348 bytes captured)
Ethernet II, Src: IntelCor_4d:77:83 (00:13:02:4d:77:83), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Internet Protocol, Src: 0.0.0.0 (0.0.0.0), Dst: 255.255.255.255 (255.255.255.255)
User Datagram Protocol, Src Port: bootpc (68), Dst Port: bootps (67)
Bootstrap Protocol

Basically I'm trying to extract the DHCP server and the IP address it's giving me. So I want to look for the text between "DHCP Offer" and "Bootstrap", and then from that chunk get the part between "Internet Protocol, Src:" and "(".

temp = re.findall("DHCP\s+Offer(.)Bootstrap", text)
print (temp)
name=re.findall("(Internet Protocol)\sSrc:.[(]", temp)
print name

But I think it's not reading past the first line between the search words.
Last edited by mitsuevo; 29 Days Ago at 12:53 am.
Join Date: Jul 2008
Posts: 939
Reputation: Gribouillis is a jewel in the rough Gribouillis is a jewel in the rough Gribouillis is a jewel in the rough 
Solved Threads: 217
Gribouillis Gribouillis is online now Online
Posting Shark
 
  #2
29 Days Ago
You can use this code snippet.
Join Date: Nov 2009
Posts: 18
Reputation: mitsuevo is an unknown quantity at this point 
Solved Threads: 1
mitsuevo mitsuevo is offline Offline
Newbie Poster
 
  #3
29 Days Ago
Thanks! That helps. I also found another solution that is a bit simpler.
I do something like this:

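import re

def findDHCP(filename):
    # Read the whole dump into one string.
    with open(filename, "r") as f:
        text = f.read()
    # Everything before "Offer" on the same line ('.' does not cross newlines).
    word = re.findall(r"(.*)Offer", text)
    splitter = re.compile(r"\s+")
    for n in word:
        # Assumes the summary line starts with whitespace, so s[0] is empty
        # and s[3], s[4] are the Source and Destination columns.
        s = splitter.split(n)
        print("The DHCP Server IP is :" + s[3])
        print("The IP allocated to me is :" + s[4])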

But the problem is that this reads only to the end of that particular line. Now I want to read a line that is a constant number of lines below the one I matched with the code above. The trouble is that this second line doesn't have any significant markers to search for. Is there some way I can do this?

Basically I want to read a block of text spread over a few lines between a KNOWN start point and end point (I've bolded the 2 points); the Src Port and the Dst Port (in red) are what I want to extract.


No.     Time        Source                Destination           Protocol Info
    140 3.050240    137.132.69.169        172.19.134.182        FTP      Response: 150 Opening BINARY mode data connection for /pub/ubuntu/dists/dapper-security/Contents-i386.gz (3376313 bytes).

Frame 140 (179 bytes on wire, 179 bytes captured)
Ethernet II, Src: Ditech_55:38:00 (00:d0:02:55:38:00), Dst: Usi_ac:fe:1e (00:16:41:ac:fe:1e)
Internet Protocol, Src: 137.132.69.169 (137.132.69.169), Dst: 172.19.134.182 (172.19.134.182)
Transmission Control Protocol, Src Port: ftp (21), Dst Port: 59785 (59785), Seq: 85, Ack: 178, Len: 113
File Transfer Protocol (FTP)

No.     Time        Source                Destination           Protocol Info
    141 3.051247    137.132.69.169        172.19.134.182        FTP-DATA FTP Data: 1368 bytes

Frame 141 (1434 bytes on wire, 1434 bytes captured)
Ethernet II, Src: Ditech_55:38:00 (00:d0:02:55:38:00), Dst: Usi_ac:fe:1e (00:16:41:ac:fe:1e)
Internet Protocol, Src: 137.132.69.169 (137.132.69.169), Dst: 172.19.134.182 (172.19.134.182)
Transmission Control Protocol, Src Port: 50003 (50003), Dst Port: 48115 (48115), Seq: 1, Ack: 1, Len: 1368
FTP Data
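For the "constant number of lines below" idea, one possible approach is to split the text into lines and index from the header line. This is only a minimal Python 3 sketch: the names HEADER_RE, PORTS_RE and ports_by_offset are made up for illustration, and the offset of 6 is an assumption read off the sample above (header, summary, blank, Frame, Ethernet, Internet Protocol, then the transport line).

```python
import re

SAMPLE = """\
No.     Time        Source                Destination           Protocol Info
    140 3.050240    137.132.69.169        172.19.134.182        FTP      Response: 150 Opening BINARY mode data connection for /pub/ubuntu/dists/dapper-security/Contents-i386.gz (3376313 bytes).

Frame 140 (179 bytes on wire, 179 bytes captured)
Ethernet II, Src: Ditech_55:38:00 (00:d0:02:55:38:00), Dst: Usi_ac:fe:1e (00:16:41:ac:fe:1e)
Internet Protocol, Src: 137.132.69.169 (137.132.69.169), Dst: 172.19.134.182 (172.19.134.182)
Transmission Control Protocol, Src Port: ftp (21), Dst Port: 59785 (59785), Seq: 85, Ack: 178, Len: 113
File Transfer Protocol (FTP)

No.     Time        Source                Destination           Protocol Info
    141 3.051247    137.132.69.169        172.19.134.182        FTP-DATA FTP Data: 1368 bytes

Frame 141 (1434 bytes on wire, 1434 bytes captured)
Ethernet II, Src: Ditech_55:38:00 (00:d0:02:55:38:00), Dst: Usi_ac:fe:1e (00:16:41:ac:fe:1e)
Internet Protocol, Src: 137.132.69.169 (137.132.69.169), Dst: 172.19.134.182 (172.19.134.182)
Transmission Control Protocol, Src Port: 50003 (50003), Dst Port: 48115 (48115), Seq: 1, Ack: 1, Len: 1368
FTP Data
"""

# Hypothetical helper patterns, not taken from the thread.
HEADER_RE = re.compile(r"^No\.\s+Time\s+Source")
PORTS_RE = re.compile(r"Src Port: (\S+) \((\d+)\), Dst Port: (\S+) \((\d+)\)")

def ports_by_offset(text, offset=6):
    """Return (src_port, dst_port) pairs found `offset` lines below each header."""
    lines = text.splitlines()
    results = []
    for i, line in enumerate(lines):
        # Only act on header lines that still have a line `offset` rows below.
        if HEADER_RE.match(line) and i + offset < len(lines):
            m = PORTS_RE.search(lines[i + offset])
            if m:
                results.append((int(m.group(2)), int(m.group(4))))
    return results

print(ports_by_offset(SAMPLE))
```

A fixed offset only works while every packet has the same number of detail lines, so searching each packet for "Src Port:" is probably the more robust choice if the layout can vary.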
Last edited by mitsuevo; 29 Days Ago at 9:58 am.
Join Date: Jul 2008
Posts: 939
Reputation: Gribouillis is a jewel in the rough Gribouillis is a jewel in the rough Gribouillis is a jewel in the rough 
Solved Threads: 217
Gribouillis Gribouillis is online now Online
Posting Shark
 
  #4
29 Days Ago
Here is some code that finds the Src Port and Dst Port in your second example.
When you write regular expressions, always use raw strings prefixed with 'r', like r"my regex". Also, the character '.' in a regex matches any character except the newline. If you want '.' to match newlines too, you must use re.compile(r"my regex", re.DOTALL).
#!/usr/bin/env python

datafile = "chunk2.txt"

import re

head_re = re.compile(r"No[.]\s+Time\s+Source\s+Destination\s+Protocol Info\s+(\d+)")

def packets(filename):
    """Split the file in packets. Yield pairs (number, packet_content)"""
    text = open(filename).read()
    i = None
    num = None
    for match in head_re.finditer(text):
        if i is not None:
            yield (num, text[i:match.start()])
        i = match.start()
        num = int(match.group(1))
    if i is not None:
        yield (num, text[i:])

ports_re = re.compile(r"Src Port:([^,]*?),\s*?Dst Port:([^,]*)(?:,|$)")

def ports(packet):
    """Search the src port and dst port in a packet"""
    for match in ports_re.finditer(packet):
        yield (match.group(1), match.group(2))

if __name__ == "__main__":
    for num, p in packets(datafile):
        print num, list(ports(p))
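
To illustrate the point about '.' and newlines, here is a small Python 3 sketch. The three-line string is a shortened stand-in for the DHCP Offer packet from the first post, not real output, and the variable names are only for illustration.

```python
import re

# Shortened stand-in for the packet text (assumption, not a full dump).
text = "DHCP Offer\nInternet Protocol, Src: 192.168.110.33\nBootstrap Protocol"

# Without DOTALL, '.' stops at the first newline, so nothing can span the lines.
no_flag = re.findall(r"DHCP\s+Offer(.*?)Bootstrap", text)

# With DOTALL, '.' also matches '\n', so the lazy group covers the lines in between.
with_flag = re.findall(r"DHCP\s+Offer(.*?)Bootstrap", text, re.DOTALL)

print(no_flag)    # []
print(with_flag)  # ['\nInternet Protocol, Src: 192.168.110.33\n']
```

Note also that the original attempt used (.) with no quantifier, which matches exactly one character, so it could not match a multi-line chunk even with the flag set.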
Last edited by Gribouillis; 29 Days Ago at 11:50 am.
Join Date: Nov 2009
Posts: 79
Reputation: pythopian is an unknown quantity at this point 
Solved Threads: 21
pythopian pythopian is offline Offline
Junior Poster in Training

Your first attempt was almost good...

 
  #5
27 Days Ago
... but you've made some mistakes with regex flags and pattern details. Also, your data contains only one DHCP Offer and one DHCP Request. If you need both IPs, you can use this:
>>> print re.findall(r'DHCP\s+.*?Internet Protocol,\s+Src:\s*(.+?)\s*\(.*?Bootstrap', text, re.M|re.S)
['192.168.110.33', '0.0.0.0']
To get the offer and request IPs individually, use:
>>> print re.findall(r'DHCP Offer\s+.*?Internet Protocol,\s+Src:\s*(.+?)\s*\(.*?Bootstrap', text, re.M|re.S)
['192.168.110.33']
>>> print re.findall(r'DHCP Request\s+.*?Internet Protocol,\s+Src:\s*(.+?)\s*\(.*?Bootstrap', text, re.M|re.S)
['0.0.0.0']
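
Since the original question asked for both the server and the address being handed out, the same idea can be extended to capture Src and Dst in one pass. This is a Python 3 sketch under assumptions: OFFER_RE and dhcp_offers are hypothetical names, and it follows the original poster's reading that the Dst of the Offer packet is the allocated address, which may not hold if the offer is sent to a broadcast address.

```python
import re

# The DHCP Offer packet from the first post, with the listing numbers removed.
SAMPLE = """\
No. Time Source Destination Protocol Info
3 0.998730 192.168.110.33 192.168.110.44 DHCP DHCP Offer - Transaction ID 0x9e0e832

Frame 3 (347 bytes on wire, 347 bytes captured)
Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)
Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)
User Datagram Protocol, Src Port: bootps (67), Dst Port: bootpc (68)
Bootstrap Protocol
"""

# Hypothetical pattern: named groups for the two addresses on the
# "Internet Protocol" line that follows a "DHCP Offer" summary line.
OFFER_RE = re.compile(
    r"DHCP Offer.*?Internet Protocol,\s+"
    r"Src:\s*(?P<server>\S+)\s*\(.*?"
    r"Dst:\s*(?P<client>\S+)\s*\(",
    re.DOTALL,
)

def dhcp_offers(text):
    """Return a list of (server_ip, client_ip) pairs, one per DHCP Offer."""
    return [(m.group("server"), m.group("client")) for m in OFFER_RE.finditer(text)]

print(dhcp_offers(SAMPLE))  # [('192.168.110.33', '192.168.110.44')]
```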
Join Date: Nov 2009
Posts: 18
Reputation: mitsuevo is an unknown quantity at this point 
Solved Threads: 1
mitsuevo mitsuevo is offline Offline
Newbie Poster
 
  #6
26 Days Ago
hey guys!
I have another question kinda relating to my initial question and data packet structure.
I want to extract the packets where the protocol is either TCP or HTTP. How can I do that?
Right now I have this RE that I use, but it extracts ALL the info (which I want) from all the packets.
datalines = re.findall(r"Protocol Info[\s]+(.*\s*.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)",text)

Thanks!
Last edited by mitsuevo; 26 Days Ago at 3:54 pm.
Join Date: Nov 2009
Posts: 18
Reputation: mitsuevo is an unknown quantity at this point 
Solved Threads: 1
mitsuevo mitsuevo is offline Offline
Newbie Poster
 
  #7
26 Days Ago
datalines = re.findall("Protocol Info(.*)[HTTP,TCP](.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)",text)
This is what I have right now, but it's not giving me any results at all.
I basically want to select all lines (and their respective 5 succeeding lines) based on whether the first line says either TCP or HTTP.
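
Two things likely explain the empty result: [HTTP,TCP] is a character class that matches a single character from the set H, T, P, C and comma (alternation would be (?:HTTP|TCP)), and '.' cannot cross the newline between "Protocol Info" and the summary line. An alternative that avoids one large regex is to split the dump into packets and test the Protocol column. The following is a Python 3 sketch only: split_packets, protocol_of and select_packets are hypothetical names, and because the thread contains no TCP or HTTP sample, the demonstration filters the posted ICMP/DHCP data on "DHCP" instead.

```python
import re

# Two packets from the first post, with the listing numbers removed.
SAMPLE = """\
No. Time Source Destination Protocol Info
2 0.005318 192.168.110.33 192.168.110.44 ICMP Echo (ping) request

Frame 2 (98 bytes on wire, 98 bytes captured)
Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)
Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)
Internet Control Message Protocol

No. Time Source Destination Protocol Info
3 0.998730 192.168.110.33 192.168.110.44 DHCP DHCP Offer - Transaction ID 0x9e0e832

Frame 3 (347 bytes on wire, 347 bytes captured)
Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)
Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)
User Datagram Protocol, Src Port: bootps (67), Dst Port: bootpc (68)
Bootstrap Protocol
"""

# Each packet starts with this column header line.
HEADER_RE = re.compile(
    r"^No\.\s+Time\s+Source\s+Destination\s+Protocol\s+Info[ \t]*$", re.MULTILINE
)

def split_packets(text):
    """Cut the text into chunks, each starting at a header line."""
    starts = [m.start() for m in HEADER_RE.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]

def protocol_of(packet):
    """Return the Protocol column (5th field) of the summary line, or None."""
    lines = packet.splitlines()
    if len(lines) < 2:
        return None
    fields = lines[1].split()
    return fields[4] if len(fields) > 4 else None

def select_packets(text, wanted=("TCP", "HTTP")):
    """Keep only the packets whose Protocol column is in `wanted`."""
    return [p for p in split_packets(text) if protocol_of(p) in wanted]

selected = select_packets(SAMPLE, wanted=("DHCP",))
print(len(selected))
```

Each selected chunk still contains the full packet text, so the five detail lines come along without needing extra capture groups.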