Have a look at Beautiful Soup.
I have heard that it is an excellent tool for scraping webpages.
If that doesn't work, you can always try using string methods:
text = "<tr colData0='Friday'>"
# Split on the single quotes into a list with 3 items: ["<tr colData0=", "Friday", ">"]
text = text.split("'")
print(text[1])
Actually, the second idea would probably be the simplest :P
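Indexing with `[1]` assumes the attribute value is the first single-quoted text in the string. A slightly sturdier sketch looks the attribute up by name with `str.partition`. The helper name `attr_value` is hypothetical, and it still assumes the value is wrapped in single quotes, as in the tag above:

```python
def attr_value(tag, name):
    """Return the single-quoted value of attribute `name` in `tag`, or None.

    Hypothetical helper: assumes values are wrapped in single quotes.
    """
    # Split the tag around the text "name='" ...
    before, sep, rest = tag.partition(name + "='")
    if not sep:
        # The attribute is not present in this tag.
        return None
    # ... and keep everything up to the next single quote.
    return rest.split("'", 1)[0]


print(attr_value("<tr colData0='Friday'>", "colData0"))  # prints Friday
```

Unlike the plain `split("'")`, this still finds the right value when other quoted attributes come first, e.g. `<tr class='row' colData0='Friday'>`.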
Paul Thompson
Veteran Poster
1,119 posts since May 2008
Reputation Points: 264
Solved Threads: 183
You could use HTMLParser like this (it runs under both Python 2 and Python 3):
import sys
if sys.version_info[0] < 3:
    from HTMLParser import HTMLParser
    from urllib2 import urlopen
else:
    from html.parser import HTMLParser
    from urllib.request import urlopen

class MyParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.day = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases attribute names, so colData0 arrives as 'coldata0'
        if tag == 'tr':
            for key, value in attrs:
                if key == 'coldata0':
                    self.day = value

def get_day(url):
    parser = MyParser()
    html = urlopen(url).read().decode('utf8')
    parser.feed(html)
    parser.close()
    return parser.day

if __name__ == '__main__':
    print(get_day("http://www.mywebsite.com/py"))
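Since `get_day` needs a live URL, here is a minimal offline sketch (Python 3 only) of the same HTMLParser approach that feeds a hard-coded HTML string instead, and collects the value from every `<tr>` rather than keeping only the last one. The class name `DayCollector` and the sample table are made up for illustration:

```python
from html.parser import HTMLParser


class DayCollector(HTMLParser):
    """Collect the colData0 value of every <tr> fed to the parser."""

    def __init__(self):
        super().__init__()
        self.days = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names,
        # so colData0 arrives as 'coldata0'.
        if tag == 'tr':
            for key, value in attrs:
                if key == 'coldata0':
                    self.days.append(value)


sample_html = """<table>
<tr colData0='Friday'><td>a</td></tr>
<tr colData0='Saturday'><td>b</td></tr>
</table>"""

parser = DayCollector()
parser.feed(sample_html)
parser.close()
print(parser.days)  # ['Friday', 'Saturday']
```

Swapping `sample_html` for the decoded page source from `urlopen(url).read().decode('utf8')` gives the online version.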
Gribouillis
If by "unparsed HTML via script" you mean getting the source code of a page, then you can do that with urllib:
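try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2
# This is a file-like object (the URL needs its "http://" prefix)
data = urlopen("http://www.daniweb.com")
# So we have to read() it to get the text
print(data.read())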
Hope that is what you meant :P
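In Python 3, `read()` returns bytes, so the printed source shows up as `b'...'`. A minimal sketch of a hypothetical helper `fetch_source` that decodes the page using the charset the server declares in its Content-Type header, assuming UTF-8 when no charset is given (a guess, not a guarantee for any particular site):

```python
from urllib.request import urlopen


def fetch_source(url):
    """Return the source of `url` as text (hypothetical helper)."""
    with urlopen(url) as response:
        # Charset from the Content-Type header, e.g. "text/html; charset=utf-8";
        # fall back to UTF-8 if the server does not declare one.
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset)
```

For example, `print(fetch_source("http://www.daniweb.com"))` prints the page source as a normal string.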
Paul Thompson