
urllib2 problem

Hi, I have this code:

import urllib2 as url
import webbrowser

def extract(text, sub1, sub2):
    """
    extract a substring from text between the first
    occurrences of substrings sub1 and sub2
    """
    return text.split(sub1, 1)[-1].split(sub2, 1)[0]
start="http://xkcd.com/"
permlist=[]
textlist=[]
for i in range(1, 638):
    temp=start+str(i)
    permlist.append(str(url.urlopen(temp).readlines()[88]))
    textlist.append(str(url.urlopen(temp).readlines()[77]))

for i in permlist:
    i = extract(i, '<h3>Permanent link to this comic: ', '</h3>')

for i in textlist:
    i = extract(i, '<img src="http://imgs.xkcd.com/comics/scribblenauts.png" title="', '"')


print zip(permlist, textlist)


Whenever I run it, it raises this error:

Traceback (most recent call last):
  File "C:/Python26/test.py", line 15, in <module>
    permlist.append(str(url.urlopen(temp).readlines()[88]))
  File "C:\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python26\lib\urllib2.py", line 389, in open
    response = meth(req, response)
  File "C:\Python26\lib\urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python26\lib\urllib2.py", line 427, in error
    return self._call_chain(*args)
  File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Python26\lib\urllib2.py", line 510, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: Not Found


What is the problem, and more importantly, what can I do to fix it?

Thanks in advance.

leegeorg07
 
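For reference, here is what `extract` returns on a couple of made-up lines (the sample HTML is invented for illustration, not copied from xkcd). One detail worth knowing: if `sub1` does not occur in the text, `split(sub1, 1)` returns a one-item list, so `extract` quietly returns everything before the first `sub2` instead of raising an error. The textlist loop's prefix hard-codes the `scribblenauts.png` image URL, so on any other comic's page it would fall into that case. The shorter prefix `' title="'` used at the end is an assumption about the page markup, not something checked against the live site. Written for Python 3.

```python
def extract(text, sub1, sub2):
    """Return the part of text between the first sub1 and the next sub2."""
    return text.split(sub1, 1)[-1].split(sub2, 1)[0]

# Made-up sample lines, shaped like the ones the original code targets.
perm_line = '<h3>Permanent link to this comic: http://xkcd.com/1/</h3>'
img_line = '<img src="http://imgs.xkcd.com/comics/other.png" title="Hover text" />'

print(extract(perm_line, '<h3>Permanent link to this comic: ', '</h3>'))
# prints http://xkcd.com/1/

# sub1 is missing here, so split() returns [img_line] and the result is
# just the text before the first '"'.
print(extract(img_line, 'scribblenauts.png" title="', '"'))
# prints <img src=

# A shorter prefix (assumed markup) picks out the title text instead.
print(extract(img_line, ' title="', '"'))
# prints Hover text
```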

It looks like one of the 637 pages requested by range(1, 638) is not available. You should use a try/except trap for this case.

sneekula
 
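A minimal sketch of such a try/except trap, written for Python 3, where `urllib2` was split into `urllib.request` and `urllib.error` (the latter holds `HTTPError`). It also opens each page once rather than twice. The names `fetch_lines` and `collect_lines`, and passing the download function in as a parameter, are illustrative choices so the loop can be exercised without network access; the line numbers 88 and 77 are copied from the original post and only make sense for xkcd's page layout at that time. A likely culprit for the 404 is that xkcd deliberately has no comic number 404.

```python
import urllib.error
import urllib.request

def fetch_lines(url):
    """Download a page once and return its lines as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", "replace").splitlines()

def collect_lines(numbers, fetch=fetch_lines, start="http://xkcd.com/"):
    """Return (permlist, textlist, failed), skipping pages that raise HTTPError."""
    permlist, textlist, failed = [], [], []
    for i in numbers:
        try:
            lines = fetch(start + str(i))  # one request per page
            perm, text = lines[88], lines[77]  # indices from the original post
        except urllib.error.HTTPError as err:
            print("Skipping %d: %s" % (i, err))
            failed.append(i)
            continue
        permlist.append(perm)
        textlist.append(text)
    return permlist, textlist, failed
```

With network access, `permlist, textlist, failed = collect_lines(range(1, 638))` would run the whole loop and leave the skipped page numbers in `failed`.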

So what could I use?

Sorry, at the moment I just want a quick fix; I will figure out the best way when I have time.

leegeorg07
 
for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))
        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except Error: # catch any exception and continue the for loop
        print "Error at index %d."%i
djidjadji
 

Yeah, you'll need to use exceptions, but if you want the script to continue after the error, you're going to have to "pass" it. Try this:

for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))
        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except Error, err:
        print "Index Error: %d at %d" % (err, i)
        pass


This will not only print the error and where it occurred, but will also pass to keep the loop going.

ov3rcl0ck
 
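Strictly speaking, `pass` is not what keeps the loop going: once an `except` block finishes, the `for` loop moves on to the next item regardless. `pass` is only required when the block would otherwise be empty. Also, `%d` expects a number, so formatting the exception object with `%d` as in the snippet above would itself raise a `TypeError`; `%s` is the safer placeholder. A small illustration, with `int()` standing in for the page download (the function names are hypothetical):

```python
def parse_all(items):
    """Convert items to int, skipping ones that fail; no `pass` needed."""
    results = []
    for item in items:
        try:
            results.append(int(item))
        except ValueError as err:
            print("Skipping %r: %s" % (item, err))
            # After this block ends, the loop continues with the next item.
    return results

def parse_all_quietly(items):
    """Same loop with an otherwise empty except block, which does need `pass`."""
    results = []
    for item in items:
        try:
            results.append(int(item))
        except ValueError:
            pass  # a block cannot be empty, so `pass` fills it
    return results
```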

Hey again, those are good ideas, but whenever I try to run it, it says:

Traceback (most recent call last):
  File "C:\Python26\test.py", line 18, in <module>
    except Error, err:
NameError: name 'Error' is not defined
leegeorg07
 
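Python has no built-in exception named `Error`, so the name in the `except` clause is undefined. Because the expression after `except` is only evaluated when an exception actually reaches it, the script runs normally until the first failing page and only then stops with a `NameError`. A small Python 3 illustration, using `int()` on bad input as the failure (the `risky` function is hypothetical):

```python
def risky(value):
    """Convert value with int(), using an except clause that names `Error`."""
    try:
        return int(value)
    except Error:  # `Error` is deliberately undefined, as in the thread
        return None

print(risky("42"))  # no exception, so `Error` is never looked up; prints 42

try:
    risky("oops")  # ValueError occurs, Python looks up `Error` and fails
except NameError as err:
    print("NameError:", err)
```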

Since you don't know the specific error class, simply use ...

for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))
        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except: # catch any exception and continue the for loop
        print "Error at index %d."%i
        pass
vegaseat
 
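One caveat with a bare `except:` is that it also catches `KeyboardInterrupt` (and `SystemExit`), so pressing Ctrl+C during a long loop could just print the error message and carry on with the next page. `except Exception:` keeps the quick-fix behavior for ordinary errors while still letting Ctrl+C stop the script. A Python 3 sketch comparing the two, with the failures simulated by hypothetical helper functions:

```python
def run_bare(task):
    """Bare except: swallows everything, including KeyboardInterrupt."""
    try:
        task()
    except:  # shown only for comparison
        return "swallowed"
    return "ok"

def run_exception(task):
    """except Exception: ordinary errors are handled, Ctrl+C is not."""
    try:
        task()
    except Exception:
        return "handled"
    return "ok"

def simulate_ctrl_c():
    raise KeyboardInterrupt  # what pressing Ctrl+C raises

def simulate_bad_page():
    raise ValueError("bad page")  # stand-in for any ordinary error

print(run_bare(simulate_ctrl_c))         # prints swallowed
print(run_exception(simulate_bad_page))  # prints handled
# run_exception(simulate_ctrl_c) lets the KeyboardInterrupt propagate.
```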

OK, thanks, trying it now. So that I can do better error handling later, how can I find the exception class?

leegeorg07
 

Well, you found it in your first post:
HTTPError

vegaseat
 
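Besides reading the last line of a traceback, the class can be inspected on a caught exception: `type(err).__name__` gives the class name and `type(err).__module__` the module it is defined in. In Python 3 the `HTTPError` from the traceback lives in `urllib.error`; in the original Python 2.6 code it is available as `url.HTTPError`, because of `import urllib2 as url`. A short Python 3 sketch; the `describe` helper is hypothetical, and the exception is raised by hand so no network access is needed:

```python
import urllib.error

def describe(exc):
    """Return 'module.ClassName' for an exception instance."""
    cls = type(exc)
    return "%s.%s" % (cls.__module__, cls.__name__)

try:
    # Raised by hand as a stand-in for the failing urlopen() call.
    raise urllib.error.HTTPError("http://xkcd.com/404", 404, "Not Found", None, None)
except Exception as err:
    print(describe(err))  # prints urllib.error.HTTPError
    # The parent classes show broader alternatives, such as URLError.
    print([c.__name__ for c in type(err).__mro__])
```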

Oh, OK, thanks. Whenever I run the zip part, it uses the original text, not what I changed it to. I tried:

for i, j in permlist, textlist:
  print i, ':', j

but it says that it is out of range. What can I do? I have googled it to no avail :(

leegeorg07
 
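The loops of the form `for i in permlist: i = extract(...)` only rebind the name `i`; the lists themselves never change, which is why `zip` still shows the original text. Building new lists, for example with a list comprehension, avoids that. To walk two lists side by side, loop over `zip(permlist, textlist)`; `for i, j in permlist, textlist` instead loops over the two-item tuple `(permlist, textlist)` and tries to unpack each entire list into `i` and `j`, which only works if a list happens to hold exactly two items. A Python 3 sketch with made-up sample lines (the `' title="'` prefix is an assumption about the page markup):

```python
def extract(text, sub1, sub2):
    """Return the part of text between the first sub1 and the next sub2."""
    return text.split(sub1, 1)[-1].split(sub2, 1)[0]

# Made-up sample data standing in for the downloaded lines.
permlist = ['<h3>Permanent link to this comic: http://xkcd.com/1/</h3>',
            '<h3>Permanent link to this comic: http://xkcd.com/2/</h3>']
textlist = ['<img src="a.png" title="First hover text" alt="a" />',
            '<img src="b.png" title="Second hover text" alt="b" />']

# Rebinding the loop variable does not touch the list.
for i in permlist:
    i = extract(i, '<h3>Permanent link to this comic: ', '</h3>')
print(permlist[0])  # still the full <h3> line

# Build new lists instead.
permlist = [extract(p, '<h3>Permanent link to this comic: ', '</h3>') for p in permlist]
textlist = [extract(t, ' title="', '"') for t in textlist]

# zip() pairs the items up; each pair is unpacked in the for statement.
for link, text in zip(permlist, textlist):
    print(link, ':', text)
```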


My bad, I used the wrong exception; vegaseat is right.

ov3rcl0ck
 
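Once the class is known, the `except` clause can name it and also check the status code, so only missing pages are skipped while any other HTTP failure still stops the script. A Python 3 sketch; `default_fetch`, `lines_or_none` and the injectable `fetch` parameter are hypothetical names, and in the original Python 2.6 code the class would be spelled `url.HTTPError`:

```python
import urllib.error
import urllib.request

def default_fetch(address):
    """Download a page once and return its lines as text."""
    with urllib.request.urlopen(address) as response:
        return response.read().decode("utf-8", "replace").splitlines()

def lines_or_none(address, fetch=default_fetch):
    """Return the page's lines, or None if the server answers 404 Not Found."""
    try:
        return fetch(address)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # missing page: let the caller skip it
        raise  # any other HTTP error is unexpected, so re-raise it
```

In the download loop, a `None` result can then be skipped with `continue`.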

This question has already been solved
