Simple HTML Parsing Question
Thread Solved
So, I have a rather simple question today, which I'll try to explain with an example. Let's say the page "www.mywebsite.com/py" contains the line of HTML "<tr colData0='Friday'>". The value can change to "Sunday", "Tuesday", etc. What would be the easiest way for me to extract that value (Friday, Sunday, etc.) from the line via a Python script?
Thanks in advance.
#2 Oct 14th, 2009
Have a look at Beautiful Soup.
I have heard that it is an excellent tool for scraping webpages.
If that doesn't work then you can always try using string methods...
    text = "<tr colData0='Friday'>"
    # Split on the single quotes into a list with 3 items.
    text = text.split("'")
    # The attribute value is the middle item.
    print(text[1])
Actually, the second idea would probably be the simplest.
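The split on single quotes works for the exact line in the question, but it assumes the attribute is always written with single quotes and is the first quoted thing on the line. A minimal sketch of a slightly more tolerant variant using the standard re module follows; the function name extract_attr is made up for illustration, and the attribute name colData0 comes from the question.

```python
import re

def extract_attr(html, name="colData0"):
    """Return the value of attribute `name` in `html`, or None if absent."""
    # Match name='value' or name="value"; the backreference \1 requires the
    # closing quote to be the same character as the opening one.
    pattern = r"""\b%s\s*=\s*(['"])(.*?)\1""" % re.escape(name)
    match = re.search(pattern, html)
    return match.group(2) if match else None

print(extract_attr("<tr colData0='Friday'>"))
```

This is still plain pattern matching, not HTML parsing, so it is only reasonable when the markup around the attribute is as predictable as in the example.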
Last edited by Paul Thompson; Oct 14th, 2009 at 12:16 am.
Make it idiot proof and someone will make a better idiot.
Check out my Site | and join us on IRC | Python Specific IRC
#3 Oct 14th, 2009
You could use HTMLParser like this:
    import sys
    if sys.version_info[0] < 3:
        from HTMLParser import HTMLParser
        from urllib2 import urlopen
    else:
        from html.parser import HTMLParser
        from urllib.request import urlopen

    class MyParser(HTMLParser):
        def __init__(self):
            HTMLParser.__init__(self)
            self.day = None

        def handle_starttag(self, tag, attrs):
            if tag == 'tr':
                for key, value in attrs:
                    # HTMLParser lowercases attribute names, so compare against 'coldata0'.
                    if key == 'coldata0':
                        self.day = value

    def get_day(url):
        parser = MyParser()
        html = urlopen(url).read().decode('utf8')
        parser.feed(html)
        parser.close()
        return parser.day

    if __name__ == '__main__':
        print(get_day("http://www.mywebsite.com/py"))
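A parser like this can be tried without a network connection by feeding it a literal string. The sketch below does that with a small helper class (AttrCollector is a made-up name, not part of the post above) that records what handle_starttag receives. It is worth running because the standard library's HTMLParser reports tag and attribute names in lower case, so a comparison against a mixed-case name such as 'colData0' would never match.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2

class AttrCollector(HTMLParser):
    """Collect (tag, attribute name, value) triples as the parser reports them."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.seen = []

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            self.seen.append((tag, key, value))

collector = AttrCollector()
collector.feed("<TR colData0='Friday'>")
collector.close()
print(collector.seen)
```

The tag and attribute name come back as 'tr' and 'coldata0', while the value 'Friday' keeps its original case.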
#4 Oct 14th, 2009
@Gribouillis: I tried the code you gave me, but I receive an error about an unexpected tag:
    HTMLParser.HTMLParseError: bad end tag: u"</SCR');\ndocument.write('IPT>"
    ...blah...
    File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
Since the source of the HTML page is too long to post, just view the page source via your browser: http://www.xfire.com/friends/soulmazer/.
@paulthom: Well, I would prefer to just parse it myself, as the example I gave is less confusing than what I am actually trying to accomplish (get information from a profile). But how could I get the unparsed HTML of a web page via a script?
#5 Oct 15th, 2009
If by unparsed HTML via script you mean getting the source code for a page, then you can do that with urllib:
    try:
        from urllib.request import urlopen  # Python 3
    except ImportError:
        from urllib2 import urlopen  # Python 2
    # urlopen() returns a file-like object; the URL needs its scheme ("http://").
    data = urlopen("http://www.daniweb.com")
    # So we have to read() it to get the text.
    print(data.read())
Hope that is what you meant.
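For reuse in a script, the same idea can be wrapped in a function that returns text instead of raw bytes. This is only a sketch, assuming Python 3; the name fetch_html, the 10 second timeout and the UTF-8 fallback are choices made here for illustration, not something from the posts above.

```python
from urllib.request import urlopen

def fetch_html(url, timeout=10, fallback_encoding="utf-8"):
    """Return the page source at `url` as text."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes, exactly as sent by the server
        # Use the charset from the Content-Type header when there is one.
        charset = response.headers.get_content_charset() or fallback_encoding
    # errors="replace" keeps going if a few bytes do not decode cleanly.
    return raw.decode(charset, errors="replace")

# Example call (needs network access):
# html = fetch_html("http://www.daniweb.com")
```

The returned string is the unparsed page source, so it can be passed to string methods, a regular expression or a parser's feed() method.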