OK, so I am trying to write code that will find the attributes in an HTML tag by using lists and splitting them up. Unfortunately, it's giving me a hard time with the added quotation marks.

def find_attribute_value(html_tag, att):
    '''Return the value of attribute att (a str) in the str html_tag.  
    Return None if att doesn't occur in html_tag.
    '''
    
    words = html_tag.split(" ")
#this is me trying to split up the initial html tag.

    omgs= str (words)
# then trying to convert it into a string because it wouldn't let me split it up further otherwise :(

    list2 = omgssss.split()    
# then I split it up again

    for word in list2:
        second_split = word.split("=")
#and again because I needed to separate the attributes from the equal sign.


#this is what I want to do with my code, but I can't implement it because my code looks like it was written by a 4 year old. I want to be able to recognize the attributed value and call the value. 
        if att == item in list:
            print list[att]
        print second_split

The example I was testing it on was this HTML tag:

find_attribute_value('<img align=top src="photos/horton.JPG" alt="Image of StG instructor (Diane Horton)">', "src")

I'm trying to find the attribute src, which should bring up "photos/horton.JPG", but obviously it doesn't.

For the example line you gave

def find_attribute_value(html_tag, att):
    ## is att in html?
    if att in html_tag:
        ## find the location of att
        idx = html_tag.find(att)
        ##split on the quotation marks for everything after att
        first_split = html_tag[idx:].split('"')
        print first_split[1]
    else:
        print "attribute %s Not Found" % (att)

def find_attribute_value2(html_tag, att):
    """ this is the way you were trying to do it
    """
    first_split = html_tag.split()
    for x in first_split:
        if att in x:
            second_split = x.split("=")
            fname=second_split[1].replace('"', "")
            print fname
            return

test_line = find_attribute_value('<img align=top src="photos/horton.JPG" alt="Image of StG instructor (Diane Horton)">', "src")

test_line = find_attribute_value2('<img align=top src="photos/horton.JPG" alt="Image of StG instructor (Diane Horton)">', "src")

Note that this will only find the first instance of "att". You can also split on "=" as a second split(), provided you know there is always a space after the file name.

commented: YOU MADE IT WORK! :O Genious! +1

Thank you so much, your way is so elegant, not to mention it works!

However, there's a tiny problem that I still can't understand. If the attribute value is in quotation marks and contains spaces, is there a way to get the entire quoted string? Or would that not be possible with a small program?

An example would be alt="Image of StG instructor (Diane Horton)"
with att = alt,
and the program should output "Image of StG instructor (Diane Horton)"

This problem is solved in the first function because it splits on the quotation marks. But then you can't pass att = align, because it gives
>>> photos/horton.JPG
instead of
>>> top


Thank you so much for your help already! And such a speedy reply<3

Python also has strong parsers like lxml and BeautifulSoup that make a job like this much easier.

>>> from BeautifulSoup import BeautifulSoup
>>> html = '''<img align=top src="photos/horton.JPG" alt="Image of StG instructor (Diane Horton)">'''
>>> soup = BeautifulSoup(html)
>>> tag = soup.find('img')
>>> tag['src']
u'photos/horton.JPG'

I know but this is just using basic python lol. Otherwise wouldn't life be easier! hahah, thanks anyways <33

Here is one with regex that you can look at; this is also not an ideal way when it comes to HTML.

import re

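def find_attribute_value(html, att):
    # re.escape keeps any special characters in att from being read as regex syntax
    s = re.search(r'%s="(.*?)"' % re.escape(att), html)
    # re.search returns None when nothing matches, so guard before calling group()
    return s.group(1) if s else None

html = '''<img align=top src="photos/horton.JPG" alt="Image of StG instructor (Diane Horton)">'''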
print find_attribute_value(html, 'src')
#photos/horton.JPG

print find_attribute_value(html, 'alt')
#Image of StG instructor (Diane Horton)
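The pattern above only matches values in double quotes, so it would not find align=top. As a rough sketch (not the poster's code), the pattern could be widened to accept double-quoted, single-quoted, or unquoted values. The name find_attribute_value_re is made up for this illustration, and the leading \s assumes the attribute is preceded by whitespace, as in the example tag. It can still be fooled by text such as ' src=' appearing inside another attribute's value, which is one reason regex is not ideal for HTML.

```python
import re

def find_attribute_value_re(html_tag, att):
    """Return the value of attribute att in html_tag, or None if absent.

    Hypothetical helper: a sketch only, not a full HTML parser.
    """
    # Require whitespace before the name so "src" does not match inside
    # a longer attribute name such as "datasrc".
    pattern = (r'\s%s\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))'
               % re.escape(att))
    match = re.search(pattern, html_tag)
    if match is None:
        return None
    # Only one of the three alternatives takes part in a match;
    # return whichever group captured something (possibly an empty string).
    for group in match.groups():
        if group is not None:
            return group
    return None

tag = ('<img align=top src="photos/horton.JPG" '
       'alt="Image of StG instructor (Diane Horton)">')
print(find_attribute_value_re(tag, 'align'))
print(find_attribute_value_re(tag, 'alt'))
```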

Thanks, I just wanted to know if it was possible. Thank you so much for your answers guys!

You did not say anything about also finding "align=top". To do that, check if the string starts with a quotation mark, in which case the first function works fine. If no quotation mark is found, then split on white space. You will have to code some of this yourself instead of giving us one task after another until the program is written for you. This forum is for helping those with code, so if you post code then we will help.
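For anyone reading later, the idea described above (take everything between the quotation marks if the value starts with one, otherwise stop at the next white space) might look roughly like this in basic Python. This is only one possible sketch: the name find_attribute_value_plain is invented here, and it assumes the attribute is written as name=value with a single space before the name and no spaces around the "=".

```python
def find_attribute_value_plain(html_tag, att):
    """Return the value of attribute att in html_tag, or None if absent.

    Hypothetical helper using only str methods; a sketch, not a parser.
    """
    # Search for ' name=' rather than just the name, so that a name
    # appearing inside another attribute's value is less likely to match.
    marker = " " + att + "="
    idx = html_tag.find(marker)
    if idx == -1:
        return None
    rest = html_tag[idx + len(marker):]
    if rest.startswith('"'):
        # Quoted value: take everything up to the closing quotation mark.
        end = rest.find('"', 1)
        if end == -1:
            return None  # unterminated quote; treated as not found here
        return rest[1:end]
    # Unquoted value: take everything up to the next white space or '>'.
    end = len(rest)
    for i, ch in enumerate(rest):
        if ch.isspace() or ch == ">":
            end = i
            break
    return rest[:end]

tag = ('<img align=top src="photos/horton.JPG" '
       'alt="Image of StG instructor (Diane Horton)">')
print(find_attribute_value_plain(tag, "align"))
print(find_attribute_value_plain(tag, "alt"))
```

Like the first function in the thread, this only finds the first occurrence of the attribute.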

from HTMLParser import HTMLParser
class parser(HTMLParser):
    def handle_starttag(self,tag,attr):
        # attr is a list of (name, value) pairs
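The snippet above is cut off in the thread, so what the poster intended is unknown. As a guess at how the standard-library HTMLParser approach could be finished, here is a minimal sketch. The names AttributeFinder and find_attribute_value_parsed are invented for this example, and the try/except import is only there so it runs on both Python 2 (as used in this thread) and Python 3.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2, as in the thread

class AttributeFinder(HTMLParser):
    """Remember the attributes of the first start tag that is parsed."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.attributes = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs;
        # converting it to a dict makes lookup by name easy.
        if self.attributes is None:
            self.attributes = dict(attrs)

def find_attribute_value_parsed(html_tag, att):
    """Return the value of att in the first tag of html_tag, or None."""
    finder = AttributeFinder()
    finder.feed(html_tag)
    finder.close()
    if finder.attributes is None:
        return None  # no start tag was found at all
    return finder.attributes.get(att)

tag = ('<img align=top src="photos/horton.JPG" '
       'alt="Image of StG instructor (Diane Horton)">')
print(find_attribute_value_parsed(tag, "align"))
print(find_attribute_value_parsed(tag, "src"))
```

The parser handles quoted and unquoted values itself, so align, src and alt all work without any manual splitting.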