For the example line you gave
def find_attribute_value(html_tag, att):
    ## is att in html?
    if att in html_tag:
        ## find the location of att
        idx = html_tag.find(att)
        ## split on the quotation marks for everything after att
        first_split = html_tag[idx:].split('"')
        print first_split[1]
    else:
        print "attribute %s Not Found" % (att)
def find_attribute_value2(html_tag, att):
    """ this is the way you were trying to do it
    """
    first_split = html_tag.split()
    for x in first_split:
        if att in x:
            second_split = x.split("=")
            fname = second_split[1].replace('"', "")
            print fname
            return
## both functions print their result and return None, so there is nothing to assign
find_attribute_value('<img align=top src="photos/horton.JPG" alt="Image of StG instructor (Diane Horton)">', "src")
find_attribute_value2('<img align=top src="photos/horton.JPG" alt="Image of StG instructor (Diane Horton)">', "src")
Note that this will only find the first instance of "att". You can also split on "=" as a second split(), as the second function does, if you know there is always a space after the file name.
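If you need every instance rather than just the first, one possible approach is re.findall(). This is only a sketch: the function name find_all_attribute_values is made up for illustration, it is written for Python 3, and it assumes the values are always wrapped in double quotes.

```python
import re

def find_all_attribute_values(html_text, att):
    """Return every double-quoted value of att found in html_text.

    Hypothetical helper, not from the thread. The lookbehind keeps
    "src" from matching inside a longer name such as "data-src".
    """
    pattern = r'(?<![\w-])%s="([^"]*)"' % re.escape(att)
    return re.findall(pattern, html_text)

page = '<img src="a.jpg"><img data-src="c.jpg" src="b.jpg">'
print(find_all_attribute_values(page, "src"))
```

It returns an empty list when the attribute is not there, so the caller does not need a separate "not found" branch.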
woooee
Nearly a Posting Maven
2,454 posts since Dec 2006
Reputation Points: 777
Solved Threads: 714
Python has strong parsers like lxml and BeautifulSoup that make a job like this much easier.
>>> from BeautifulSoup import BeautifulSoup
>>> html = '''<img align=top src="photos/horton.JPG" alt="Image of StG instructor (Diane Horton)">'''
>>> soup = BeautifulSoup(html)
>>> tag = soup.find('img')
>>> tag['src']
u'photos/horton.JPG'
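If installing a third-party parser is not an option, the same idea can be sketched with the standard library's html.parser module. This swaps in a different parser from the one shown above, it needs Python 3 (the module is called HTMLParser in Python 2), and the names AttributeCollector and find_attribute_values are made up for illustration.

```python
from html.parser import HTMLParser

class AttributeCollector(HTMLParser):
    """Collect the values of one attribute from every matching start tag."""

    def __init__(self, tag_name, att):
        super().__init__()
        self.tag_name = tag_name
        self.att = att
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes already removed
        if tag == self.tag_name:
            for name, value in attrs:
                if name == self.att:
                    self.values.append(value)

def find_attribute_values(html_text, tag_name, att):
    # Hypothetical helper: feed the text to the parser and return what it collected
    collector = AttributeCollector(tag_name, att)
    collector.feed(html_text)
    collector.close()
    return collector.values

html_text = '<img align=top src="photos/horton.JPG" alt="Image of StG instructor (Diane Horton)">'
print(find_attribute_values(html_text, "img", "src"))
print(find_attribute_values(html_text, "img", "align"))
```

Because the parser does the tokenizing, unquoted values such as align=top come out the same way as quoted ones.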
snippsat
Practically a Posting Shark
808 posts since Aug 2008
Reputation Points: 353
Solved Threads: 294
Here is one with regex you can look at; this is also not an ideal approach when it comes to HTML.
import re
def find_attribute_value(html, att):
    ## escape att so it is matched literally
    s = re.search(r'%s="(.*?)"' % re.escape(att), html)
    ## re.search() returns None when the attribute is missing
    if s is None:
        return None
    return s.group(1)
html = '''<img align=top src="photos/horton.JPG" alt="Image of StG instructor (Diane Horton)">'''
print find_attribute_value(html, 'src')
#photos/horton.JPG
print find_attribute_value(html, 'alt')
#Image of StG instructor (Diane Horton)
snippsat
You did not say anything about also finding "align=top". To do that, check whether the attribute's value starts with a quotation mark, in which case the first function works fine. If there is no quotation mark, split on white space instead. You will have to code some of this yourself instead of giving us one task after another until the program is written for you. This forum is for helping those with code, so if you post code then we will help.
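As a starting point only, the quoted/unquoted check described above might look roughly like this. It is a sketch in Python 3, the name attribute_value is made up for illustration, and it assumes there is no space around the "=" sign and that att does not also appear inside some other attribute's name or value.

```python
def attribute_value(html_tag, att):
    """Return the value of att in html_tag, or None if att is missing.

    Hypothetical helper: handles both att="quoted value" and att=bare.
    """
    key = att + "="
    idx = html_tag.find(key)
    if idx == -1:
        return None
    rest = html_tag[idx + len(key):]
    if rest.startswith('"'):
        # quoted value: take everything up to the next quotation mark
        return rest.split('"')[1]
    # unquoted value: take everything up to the next white space,
    # dropping the closing ">" if the attribute was last in the tag
    parts = rest.split()
    return parts[0].rstrip(">") if parts else ""

tag = '<img align=top src="photos/horton.JPG" alt="Image of StG instructor (Diane Horton)">'
print(attribute_value(tag, "align"))
print(attribute_value(tag, "src"))
```

Returning the value instead of printing it lets the caller decide what to do when the result is None.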
woooee