findall
Hey guys,
I have a problem. I have a chunk of text like this:
```
No. Time Source Destination Protocol Info
2 0.005318 192.168.110.33 192.168.110.44 ICMP Echo (ping) request
Frame 2 (98 bytes on wire, 98 bytes captured)
Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)
Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)
Internet Control Message Protocol
No. Time Source Destination Protocol Info
3 0.998730 192.168.110.33 192.168.110.44 DHCP DHCP Offer - Transaction ID 0x9e0e832
Frame 3 (347 bytes on wire, 347 bytes captured)
Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)
Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)
User Datagram Protocol, Src Port: bootps (67), Dst Port: bootpc (68)
Bootstrap Protocol
No. Time Source Destination Protocol Info
4 0.998917 0.0.0.0 255.255.255.255 DHCP DHCP Request - Transaction ID 0x9e0e832
Frame 4 (348 bytes on wire, 348 bytes captured)
Ethernet II, Src: IntelCor_4d:77:83 (00:13:02:4d:77:83), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Internet Protocol, Src: 0.0.0.0 (0.0.0.0), Dst: 255.255.255.255 (255.255.255.255)
User Datagram Protocol, Src Port: bootpc (68), Dst Port: bootps (67)
Bootstrap Protocol
```
Basically, I'm trying to extract the DHCP server and the IP address it's giving me. So I want to look for the text between "DHCP Offer" and "Bootstrap", and then, from that chunk, get the part between "Internet Protocol, Src:" and "(".
```python
temp = re.findall("DHCP\s+Offer(.)Bootstrap", text)
print (temp)
name = re.findall("(Internet Protocol)\sSrc:.[(]", temp)
print name
```
But I think it's not reading past the first line between the search words.
Last edited by mitsuevo; 29 Days Ago at 12:53 am.
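A minimal sketch of the two-step approach described in this post, assuming the capture is already in a string. `dhcp_offers`, `OFFER_BLOCK`, `IP_LINE` and `SAMPLE` are illustrative names, not from the thread. Compared with the code above, it uses `re.DOTALL` so `.` can cross line breaks, a lazy `.*?` so each chunk stops at the nearest "Bootstrap", and it runs the second search on each string in the list that `findall` returns instead of on the list itself. It also assumes, as in this capture, that the Offer's IP destination is the address being handed out; some DHCP servers broadcast the Offer instead.

```python
import re

# A shortened copy of the capture in the post (only the DHCP packets).
SAMPLE = "\n".join([
    "No. Time Source Destination Protocol Info",
    "3 0.998730 192.168.110.33 192.168.110.44 DHCP DHCP Offer - Transaction ID 0x9e0e832",
    "Frame 3 (347 bytes on wire, 347 bytes captured)",
    "Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)",
    "Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)",
    "User Datagram Protocol, Src Port: bootps (67), Dst Port: bootpc (68)",
    "Bootstrap Protocol",
    "No. Time Source Destination Protocol Info",
    "4 0.998917 0.0.0.0 255.255.255.255 DHCP DHCP Request - Transaction ID 0x9e0e832",
    "Frame 4 (348 bytes on wire, 348 bytes captured)",
    "Internet Protocol, Src: 0.0.0.0 (0.0.0.0), Dst: 255.255.255.255 (255.255.255.255)",
    "User Datagram Protocol, Src Port: bootpc (68), Dst Port: bootps (67)",
    "Bootstrap Protocol",
])

# Step 1: everything between "DHCP Offer" and the next "Bootstrap".
# re.DOTALL lets '.' match newlines; '*?' stops at the first "Bootstrap".
OFFER_BLOCK = re.compile(r"DHCP\s+Offer(.*?)Bootstrap", re.DOTALL)
# Step 2: IP source and destination on the "Internet Protocol" line.
# No DOTALL here, so '.*?' cannot leave that line.
IP_LINE = re.compile(r"Internet Protocol,\s+Src:\s*(\S+)\s*\(.*?Dst:\s*(\S+)\s*\(")

def dhcp_offers(text):
    """Return (server_ip, offered_ip) for every DHCP Offer in `text`."""
    pairs = []
    for chunk in OFFER_BLOCK.findall(text):
        # Search each chunk string, not the list that findall returned.
        m = IP_LINE.search(chunk)
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs

print(dhcp_offers(SAMPLE))  # [('192.168.110.33', '192.168.110.44')]
```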
#3 29 Days Ago
Thanks! That helps. I also found another solution that is a bit simpler.
I do something like this:
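```python
import re

def findDHCP(filename):
    with open(filename, "r") as f:
        text = f.read()
    # Each match is the summary line up to the word "Offer"
    word = re.findall(r"(.*)Offer", text)
    splitter = re.compile(r"\s+")
    for n in word:
        s = splitter.split(n)
        # If the line starts with whitespace, s[0] is an empty string,
        # which is why the two addresses sit at s[3] and s[4]
        print("The DHCP Server IP is :" + s[3])
        print("The IP allocated to me is :" + s[4])
```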
But the problem is that this reads only to the end of that particular line. Now I want to read a line that is a constant number of lines below the one matched by the code above. The problem is that this second line doesn't have any significant markers to search for. Is there some way I can do this?
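One way to read "the line N lines below a match" is to stop thinking in regex terms and work on a list of lines: find the index of the line that matches, then add the fixed offset. This is a minimal sketch; `line_below` and `SAMPLE` are hypothetical names, and it assumes every packet is printed with the same fixed layout, which holds for the samples in this thread but breaks if Wireshark adds or drops a line.

```python
import re

# A shortened copy of the DHCP capture from the first post.
SAMPLE = "\n".join([
    "No. Time Source Destination Protocol Info",
    "3 0.998730 192.168.110.33 192.168.110.44 DHCP DHCP Offer - Transaction ID 0x9e0e832",
    "Frame 3 (347 bytes on wire, 347 bytes captured)",
    "Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)",
    "Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)",
    "User Datagram Protocol, Src Port: bootps (67), Dst Port: bootpc (68)",
    "Bootstrap Protocol",
])

def line_below(text, pattern, offset):
    """Return the line `offset` lines below each line that matches `pattern`."""
    lines = text.splitlines()
    found = []
    for i, line in enumerate(lines):
        # Skip matches too close to the end to have `offset` lines below them.
        if re.search(pattern, line) and i + offset < len(lines):
            found.append(lines[i + offset])
    return found

# The UDP line (with the ports) is 4 lines below the "DHCP Offer" summary line.
print(line_below(SAMPLE, r"DHCP\s+Offer", 4))
```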
Basically, I want to read a block of text spread over a few lines between a KNOWN start point and end point (I've bolded the two points), and the Src Port and Dst Port (in red) are what I want to extract:
No. Time Source Destination Protocol Info
140 3.050240 137.132.69.169 172.19.134.182 FTP Response: 150 Opening BINARY mode data connection for /pub/ubuntu/dists/dapper-security/Contents-i386.gz (3376313 bytes).
Frame 140 (179 bytes on wire, 179 bytes captured)
Ethernet II, Src: Ditech_55:38:00 (00:d0:02:55:38:00), Dst: Usi_ac:fe:1e (00:16:41:ac:fe:1e)
Internet Protocol, Src: 137.132.69.169 (137.132.69.169), Dst: 172.19.134.182 (172.19.134.182)
Transmission Control Protocol, Src Port: ftp (21), Dst Port: 59785 (59785), Seq: 85, Ack: 178, Len: 113
File Transfer Protocol (FTP)
No. Time Source Destination Protocol Info
141 3.051247 137.132.69.169 172.19.134.182 FTP-DATA FTP Data: 1368 bytes
Frame 141 (1434 bytes on wire, 1434 bytes captured)
Ethernet II, Src: Ditech_55:38:00 (00:d0:02:55:38:00), Dst: Usi_ac:fe:1e (00:16:41:ac:fe:1e)
Internet Protocol, Src: 137.132.69.169 (137.132.69.169), Dst: 172.19.134.182 (172.19.134.182)
Transmission Control Protocol, Src Port: 50003 (50003), Dst Port: 48115 (48115), Seq: 1, Ack: 1, Len: 1368
FTP Data
Last edited by mitsuevo; 29 Days Ago at 9:58 am.
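For "a block between a known start point and end point", a common pattern is `start(.*?)end` compiled with `re.DOTALL`, then a second, narrower regex on each block. In this sketch the markers "Frame" and "Len:" are stand-ins chosen for the demo; substitute your own start and end points. `blocks_between`, `PORTS` and `SAMPLE` are hypothetical names, and `PORTS` assumes the "Src Port: name (number), Dst Port: name (number)" layout shown in the capture above.

```python
import re

# A shortened copy of the FTP capture above.
SAMPLE = "\n".join([
    "No. Time Source Destination Protocol Info",
    "140 3.050240 137.132.69.169 172.19.134.182 FTP Response: 150 Opening BINARY mode data connection",
    "Frame 140 (179 bytes on wire, 179 bytes captured)",
    "Internet Protocol, Src: 137.132.69.169 (137.132.69.169), Dst: 172.19.134.182 (172.19.134.182)",
    "Transmission Control Protocol, Src Port: ftp (21), Dst Port: 59785 (59785), Seq: 85, Ack: 178, Len: 113",
    "File Transfer Protocol (FTP)",
    "No. Time Source Destination Protocol Info",
    "141 3.051247 137.132.69.169 172.19.134.182 FTP-DATA FTP Data: 1368 bytes",
    "Frame 141 (1434 bytes on wire, 1434 bytes captured)",
    "Transmission Control Protocol, Src Port: 50003 (50003), Dst Port: 48115 (48115), Seq: 1, Ack: 1, Len: 1368",
    "FTP Data",
])

def blocks_between(text, start, end):
    """Return the text between each literal `start` marker and the next `end` marker."""
    # re.escape treats the markers literally; DOTALL lets a block span lines;
    # the lazy '.*?' stops at the nearest end marker rather than the last one.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return re.findall(pattern, text, re.DOTALL)

# The numbers in parentheses after "Src Port:" and "Dst Port:".
PORTS = re.compile(r"Src Port:[^(]*\((\d+)\),\s*Dst Port:[^(]*\((\d+)\)")

for block in blocks_between(SAMPLE, "Frame", "Len:"):
    print(PORTS.findall(block))
```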
#4 29 Days Ago
Here is some code that finds the Src Port and Dst Port in your second example.
When you write regular expressions, always use raw strings prefixed with 'r', like r"my regex". Also, the character '.' in a regex matches any character except a newline. If you want '.' to match newlines too, you must compile with re.DOTALL: re.compile(r"my regex", re.DOTALL).
```python
#!/usr/bin/env python
import re

datafile = "chunk2.txt"

head_re = re.compile(r"No[.]\s+Time\s+Source\s+Destination\s+Protocol\s+Info\s+(\d+)")

def packets(filename):
    """Split the file in packets. Yield pairs (number, packet_content)"""
    text = open(filename).read()
    i = None
    num = None
    for match in head_re.finditer(text):
        if i is not None:
            yield (num, text[i:match.start()])
        i = match.start()
        num = int(match.group(1))
    if i is not None:
        yield (num, text[i:])

ports_re = re.compile(r"Src Port:([^,]*?),\s*?Dst Port:([^,]*)(?:,|$)")

def ports(packet):
    """Search the src port and dst port in a packet"""
    for match in ports_re.finditer(packet):
        yield (match.group(1), match.group(2))

if __name__ == "__main__":
    for num, p in packets(datafile):
        print(num, list(ports(p)))
```
Last edited by Gribouillis; 29 Days Ago at 11:50 am.
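To make the two tips above concrete, here is a small self-contained check (the `SAMPLE` string is made up for the demo). Without `re.DOTALL`, `.` refuses to cross a newline, so a pattern meant to span lines finds nothing; and without the `r` prefix, `"\b"` becomes a backspace character before the regex engine ever sees it, so the intended word boundary silently turns into a literal that never matches.

```python
import re

SAMPLE = "DHCP Offer\nFrame 3\nBootstrap Protocol"

# '.' stops at newlines by default, so this cannot span the three lines...
no_dotall = re.search(r"Offer(.*)Bootstrap", SAMPLE)
# ...while re.DOTALL lets '.' match "\n" as well.
with_dotall = re.search(r"Offer(.*)Bootstrap", SAMPLE, re.DOTALL)

# Without the r prefix, "\b" is a backspace character, not a word boundary.
plain = re.search("\bOffer\b", SAMPLE)
raw = re.search(r"\bOffer\b", SAMPLE)

print(no_dotall)                   # None
print(repr(with_dotall.group(1)))  # '\nFrame 3\n'
print(plain, raw.group())          # None Offer
```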
... but you've made some mistakes with the regex flags and pattern details. Also, your data contains only one DHCP Offer and one DHCP Request. If you need both IPs, you can use this:
```python
>>> print(re.findall(r'DHCP\s+.*?Internet Protocol,\s+Src:\s*(.+?)\s*\(.*?Bootstrap', text, re.M|re.S))
['192.168.110.33', '0.0.0.0']
```
To get the Offer and Request IPs individually, use:
```python
>>> print(re.findall(r'DHCP Offer\s+.*?Internet Protocol,\s+Src:\s*(.+?)\s*\(.*?Bootstrap', text, re.M|re.S))
['192.168.110.33']
>>> print(re.findall(r'DHCP Request\s+.*?Internet Protocol,\s+Src:\s*(.+?)\s*\(.*?Bootstrap', text, re.M|re.S))
['0.0.0.0']
```
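Both results can also come out of a single pass if the message type is captured alongside the address, using named groups and `finditer`. This is a sketch, not the poster's code: `DHCP_SRC`, `dhcp_sources` and `SAMPLE` are hypothetical names. Note that `re.M` (MULTILINE) only changes what `^` and `$` match, so in these patterns `re.S` (DOTALL) is the flag doing the work. The lazy `.*?` can run into the next packet if a DHCP packet has no "Internet Protocol" line, so treat this as tuned to captures like the one in the first post.

```python
import re

# Shortened copy of the DHCP packets from the first post.
SAMPLE = "\n".join([
    "3 0.998730 192.168.110.33 192.168.110.44 DHCP DHCP Offer - Transaction ID 0x9e0e832",
    "Frame 3 (347 bytes on wire, 347 bytes captured)",
    "Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)",
    "Bootstrap Protocol",
    "4 0.998917 0.0.0.0 255.255.255.255 DHCP DHCP Request - Transaction ID 0x9e0e832",
    "Frame 4 (348 bytes on wire, 348 bytes captured)",
    "Internet Protocol, Src: 0.0.0.0 (0.0.0.0), Dst: 255.255.255.255 (255.255.255.255)",
    "Bootstrap Protocol",
])

# "kind" records which DHCP message it was, "src" the IP source of that packet.
# Only re.DOTALL is needed: the pattern uses no ^ or $.
DHCP_SRC = re.compile(
    r"DHCP\s+(?P<kind>Offer|Request)\b.*?"
    r"Internet Protocol,\s+Src:\s*(?P<src>\S+)\s*\(",
    re.DOTALL,
)

def dhcp_sources(text):
    """Map each DHCP message type to the IP source addresses seen for it."""
    result = {}
    for m in DHCP_SRC.finditer(text):
        result.setdefault(m.group("kind"), []).append(m.group("src"))
    return result

print(dhcp_sources(SAMPLE))
```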
#6 26 Days Ago
Hey guys!
I have another question related to my initial question and the packet data structure.
I want to extract the packets where the protocol is either TCP or HTTP. How can I do that?
Right now I use this regex, but it extracts ALL the info (which I want) from all the packets, not just the TCP and HTTP ones.
```python
datalines = re.findall("Protocol Info[\s]+(.*\s*.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)", text)
```
Thanks!
Last edited by mitsuevo; 26 Days Ago at 3:54 pm.
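Rather than one big regex, it can be simpler to split the capture into packets at each column-header line and then test the Protocol column of each packet's summary line. A minimal sketch, assuming the layout seen in this thread (header line, then a summary line whose fifth whitespace-separated field is the protocol) and that Source and Destination contain no spaces, as with the numeric addresses here. `HEADER`, `split_packets`, `by_protocol` and `SAMPLE` are hypothetical names; the sample has no TCP or HTTP packets, so the demo filters on "FTP".

```python
import re

# Shortened copy of the FTP capture earlier in the thread.
SAMPLE = "\n".join([
    "No. Time Source Destination Protocol Info",
    "140 3.050240 137.132.69.169 172.19.134.182 FTP Response: 150 Opening BINARY mode data connection",
    "Frame 140 (179 bytes on wire, 179 bytes captured)",
    "Transmission Control Protocol, Src Port: ftp (21), Dst Port: 59785 (59785), Seq: 85, Ack: 178, Len: 113",
    "File Transfer Protocol (FTP)",
    "No. Time Source Destination Protocol Info",
    "141 3.051247 137.132.69.169 172.19.134.182 FTP-DATA FTP Data: 1368 bytes",
    "Frame 141 (1434 bytes on wire, 1434 bytes captured)",
    "FTP Data",
])

# The column-header line that starts every packet in the text export.
HEADER = re.compile(r"^No\.\s+Time\s+Source\s+Destination\s+Protocol\s+Info[ \t]*$", re.MULTILINE)

def split_packets(text):
    """Split a capture into packet blocks; each block starts with its summary line."""
    return [part.strip("\n") for part in HEADER.split(text) if part.strip()]

def by_protocol(text, wanted=("TCP", "HTTP")):
    """Keep the packets whose Protocol column is exactly one of `wanted`."""
    keep = []
    for block in split_packets(text):
        # Summary columns: No., Time, Source, Destination, Protocol, Info...
        fields = block.splitlines()[0].split()
        if len(fields) > 4 and fields[4] in wanted:
            keep.append(block)
    return keep

for block in by_protocol(SAMPLE, ("FTP",)):
    print(block.splitlines()[0])
```

Because the comparison is exact, "FTP-DATA" is not counted as "FTP"; list both if you want both.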
#7 26 Days Ago
```python
datalines = re.findall("Protocol Info(.*)[HTTP,TCP](.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)", text)
```
I basically want to select all lines (and their respective 5 succeeding lines) based on whether the first line says TCP or HTTP.
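Part of the trouble with the pattern above is that `[HTTP,TCP]` is a character class: it matches a single character from the set H, T, P, comma, C, not the words "HTTP" or "TCP". Whole-word alternatives need a group with `|`, such as `(?:TCP|HTTP)`. The sketch below uses that and then takes each matching summary line plus the `n` lines after it from a list of lines. `with_following_lines` and `SAMPLE` are hypothetical names; the column layout is assumed from the captures in this thread, and since the sample contains no TCP or HTTP packets the test filters on "FTP".

```python
import re

# Shortened copy of the FTP capture earlier in the thread.
SAMPLE = "\n".join([
    "No. Time Source Destination Protocol Info",
    "140 3.050240 137.132.69.169 172.19.134.182 FTP Response: 150 Opening BINARY mode data connection",
    "Frame 140 (179 bytes on wire, 179 bytes captured)",
    "Ethernet II, Src: Ditech_55:38:00 (00:d0:02:55:38:00), Dst: Usi_ac:fe:1e (00:16:41:ac:fe:1e)",
    "Internet Protocol, Src: 137.132.69.169 (137.132.69.169), Dst: 172.19.134.182 (172.19.134.182)",
    "Transmission Control Protocol, Src Port: ftp (21), Dst Port: 59785 (59785), Seq: 85, Ack: 178, Len: 113",
    "File Transfer Protocol (FTP)",
    "No. Time Source Destination Protocol Info",
    "141 3.051247 137.132.69.169 172.19.134.182 FTP-DATA FTP Data: 1368 bytes",
    "Frame 141 (1434 bytes on wire, 1434 bytes captured)",
    "FTP Data",
])

def with_following_lines(text, protocols=("TCP", "HTTP"), n=5):
    """Return each summary line whose Protocol column is in `protocols`,
    joined with the n lines that follow it."""
    # Alternation (?:TCP|HTTP) matches either whole word.
    alts = "|".join(re.escape(p) for p in protocols)
    # Columns: No., Time, Source, Destination, then the protocol;
    # the lookahead stops "FTP" from also matching "FTP-DATA".
    summary = re.compile(rf"\s*\d+\s+\S+\s+\S+\s+\S+\s+(?:{alts})(?=\s|$)")
    lines = text.splitlines()
    return ["\n".join(lines[i:i + n + 1])
            for i, line in enumerate(lines) if summary.match(line)]

# A character class matches ONE character from its set:
print(re.findall(r"[HTTP,TCP]", "TCP"))  # ['T', 'C', 'P']
```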





