I'm having this problem with passing a variable.

Basically i'm trying to find every url in a HTML document. I'm doing this using the HTMLParser class (i made an extension of it).

This is the code:

class Parser(HTMLParser.HTMLParser):

  def __init__(self, url):
    HTMLParser.HTMLParser.__init__(self)
    req = urllib2.urlopen(url)
    self.feed(req.read())

  def handle_starttag(self, tag,attrs):
    if tag.lower() == 'a' and attrs:
      for attribute in attrs:
        if 'href' in attribute :
          link = urlparse.urljoin(base,attribute[1])
          print link

Everything works fine, but in the urlparse it goes wrong. I want to find out the base url, which is the url passed through in __init__ .
I can't hardcode it, since the url will change. I also can't pass it as an argument since than the superclass won't recognise that function anymore...
So i'm lost on how i can get the argument "url" from __init__ in my function: handle_starrtag.

(PS. i can't use Beautifulsoup or lxml or anything like that, and yes it's killing me)


EDIT:

I managed to make this work :

- make a variable 'base' in the class Parser
- in the __init__ you can type: Parser.base = url
- in handletag you can then say: Parser.base

Edited 6 Years Ago by Aeronobe: n/a

(this post can be deleted)

Edited 6 Years Ago by Aeronobe: n/a

That is one value for all class, maybe you would be better of by only putting

self.url=url

in your __init__?

Nice coding with inheritance, good job!

Aah i see, it should indeed be better the way you say :)

Thanks !

This article has been dead for over six months. Start a new discussion instead.