I'm having a problem passing a variable between methods.
Basically, I'm trying to find every URL in an HTML document. I'm doing this with the HTMLParser class (I wrote a subclass of it).
This is the code:
    import HTMLParser
    import urllib2
    import urlparse

    class Parser(HTMLParser.HTMLParser):
        def __init__(self, url):
            HTMLParser.HTMLParser.__init__(self)
            req = urllib2.urlopen(url)
            self.feed(req.read())

        def handle_starttag(self, tag, attrs):
            if tag.lower() == 'a' and attrs:
                # attrs is a list of (name, value) tuples
                for name, value in attrs:
                    if name == 'href':
                        # 'base' is not defined here; this is the problem
                        link = urlparse.urljoin(base, value)
                        print link
Everything works fine except the urlparse.urljoin call. It needs the base URL, which is the url passed to __init__.
I can't hardcode it, since the URL will change. I also can't add it as an extra argument to handle_starttag, since then the superclass would no longer call that method correctly.
So I'm lost on how to get the argument "url" from __init__ into my handle_starttag method.
(PS: I can't use BeautifulSoup or lxml or anything like that, and yes, it's killing me.)
I managed to make this work:
- create a class variable 'base' in the class Parser
- in __init__, assign it: Parser.base = url
- in handle_starttag, read it back as Parser.base
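A possible alternative to the class variable above is an instance attribute (self.base), so that two Parser objects created with different URLs do not overwrite each other's base. The following is only a minimal sketch under a few assumptions: it feeds an inline HTML string instead of fetching with urllib2, the example.com URLs are made up, and the `links` list is a name added here for illustration. The try/except import is there so the same snippet runs on Python 2 and Python 3.

```python
try:
    # Python 2 module names, as used in the question
    from HTMLParser import HTMLParser
    from urlparse import urljoin
except ImportError:
    # Python 3 equivalents
    from html.parser import HTMLParser
    from urllib.parse import urljoin


class Parser(HTMLParser):
    def __init__(self, base):
        HTMLParser.__init__(self)
        self.base = base   # stored per instance, readable from any method
        self.links = []    # hypothetical attribute: collects the resolved URLs

    def handle_starttag(self, tag, attrs):
        # The signature stays exactly what the superclass expects.
        if tag.lower() != 'a':
            return
        for name, value in attrs:   # attrs is a list of (name, value) tuples
            if name == 'href' and value:
                self.links.append(urljoin(self.base, value))


# Made-up base URL and markup, only to exercise the class
parser = Parser('http://example.com/docs/')
parser.feed('<a href="page.html">x</a><a href="/top">y</a>')
print(parser.links)
# ['http://example.com/docs/page.html', 'http://example.com/top']
```

Because handle_starttag is a normal method, it receives `self` and can read anything __init__ stored on it, so no extra argument is needed. In the original code, the urllib2.urlopen(url) and self.feed(...) calls would go in __init__ after the self.base assignment.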