Hi, I have this code:

import urllib2 as url
import webbrowser

def extract(text, sub1, sub2):
    """
    Extract the substring of text between the first
    occurrences of substrings sub1 and sub2.
    """
    return text.split(sub1, 1)[-1].split(sub2, 1)[0]
start="http://xkcd.com/"
permlist=[]
textlist=[]
for i in range(1, 638):
    temp=start+str(i)
    permlist.append(str(url.urlopen(temp).readlines()[88]))
    textlist.append(str(url.urlopen(temp).readlines()[77]))

for i in permlist:
    i = extract(i, '<h3>Permanent link to this comic: ', '</h3>')

for i in textlist:
    i = extract(i, '<img src="http://imgs.xkcd.com/comics/scribblenauts.png" title="', '"')


print zip(permlist, textlist)

and whenever I run it, it raises this error:

Traceback (most recent call last):
  File "C:/Python26/test.py", line 15, in <module>
    permlist.append(str(url.urlopen(temp).readlines()[88]))
  File "C:\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python26\lib\urllib2.py", line 389, in open
    response = meth(req, response)
  File "C:\Python26\lib\urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python26\lib\urllib2.py", line 427, in error
    return self._call_chain(*args)
  File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Python26\lib\urllib2.py", line 510, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: Not Found

What is the problem and, more importantly, how can I fix it?

Thanks in advance.


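It looks like one of the 637 web pages you request is not available, which is what the 404 in the traceback means. The most likely culprit is http://xkcd.com/404, which deliberately does not exist. Wrap the urlopen calls in a try/except block to handle this case.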


So what could I use?

Sorry, at the moment I just want a quick fix; I will figure out the best way when I have time.

for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))
        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except Error: # catch any exception and continue the for loop
        print "Error at index %d."%i

Yeah, you'll need to use exceptions, and if you want the script to continue after the error you have to catch it inside the loop. Try this:

for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))
        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except Error, err:
        print "Index Error: %d at %d" % (err, i)
        pass

This prints the error and the index where it occurred. Because the exception is caught, the loop carries on with the next index; the trailing pass is optional.


Hey again, those are good ideas, but when I run it now it says:

Traceback (most recent call last):
  File "C:\Python26\test.py", line 18, in <module>
    except Error, err:
NameError: name 'Error' is not defined

Since you don't know the specific error class, simply use ...

for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))
        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except: # catch any exception and continue the for loop
        print "Error at index %d."%i
        pass

OK thanks, trying it now. So that I can handle this properly later, how can I find the exception class?
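
One way to find the class: the last line of the first traceback already names it (HTTPError, raised from urllib2.py), so in the Python 2.6 code above it would be urllib2.HTTPError. If the traceback is not at hand, you can catch broadly once and ask the exception for its type. The following is only a rough sketch, written for Python 3, where the same class lives in urllib.error. The helpers describe_exception and fake_fetch are hypothetical names made up for this example, and fake_fetch simulates the failure so that no network access is needed.

```python
import urllib.error


def describe_exception(exc):
    """Return the module-qualified class name of an exception instance."""
    cls = type(exc)
    return "%s.%s" % (cls.__module__, cls.__name__)


def fake_fetch(number):
    """Hypothetical stand-in for urlopen: fails for comic 404, no network used."""
    if number == 404:
        raise urllib.error.HTTPError(
            "http://xkcd.com/404", 404, "Not Found", None, None)
    return "page %d" % number


try:
    fake_fetch(404)
except Exception as exc:  # broad on purpose, only to discover the class
    found = describe_exception(exc)
    print(found)
```

Once you know the name, you can put that class in the except clause instead of catching everything.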


Oh OK, thanks. Now, whenever I run the zip part it prints the original text, not what I changed it to. I tried:

for i, j in permlist, textlist:
  print i, ':', j

but it says that it is out of range. What can I do? I have googled it to no avail :(
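
A possible explanation, sketched under some assumptions: in the original loops, "i = extract(...)" only rebinds the loop variable, so the lists themselves never change. And "for i, j in permlist, textlist" iterates over a tuple of the two lists and tries to unpack each whole list into two names, which fails unless a list happens to have exactly two items. Building new lists and pairing them with zip avoids both problems. The sample strings below are made up to resemble the page lines, and splitting on 'title="' instead of the hard-coded scribblenauts image is my assumption about what was intended. Written for Python 3 (print as a function).

```python
def extract(text, sub1, sub2):
    """Return the part of text between the first sub1 and the next sub2."""
    return text.split(sub1, 1)[-1].split(sub2, 1)[0]


# Made-up sample lines standing in for the scraped HTML.
permlist = [
    '<h3>Permanent link to this comic: http://xkcd.com/1/</h3>\n',
    '<h3>Permanent link to this comic: http://xkcd.com/2/</h3>\n',
]
textlist = [
    '<img src="a.png" title="first title" alt="a" />\n',
    '<img src="b.png" title="second title" alt="b" />\n',
]

# Build new lists; assigning to the loop variable would not change the old ones.
permlist = [extract(line, '<h3>Permanent link to this comic: ', '</h3>')
            for line in permlist]
textlist = [extract(line, 'title="', '"') for line in textlist]

# zip pairs the items up one by one.
for link, title in zip(permlist, textlist):
    print(link, ':', title)
```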


My bad, I used the wrong exception; vegaseat is right.
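
For later, once the quick fix works: the bare except also hides typos and other unrelated bugs, so it may be worth narrowing it to the class named in the traceback. Here is a minimal sketch of that idea, assuming Python 3 (urllib.error.HTTPError; in the Python 2.6 code above it would be urllib2.HTTPError). The functions fetch_all and fake_fetch are hypothetical names for this example, and fake_fetch simulates the missing page instead of touching the network.

```python
import urllib.error


def fake_fetch(number):
    """Hypothetical stand-in for urlopen: fails for comic 404, no network used."""
    if number == 404:
        raise urllib.error.HTTPError(
            "http://xkcd.com/404", 404, "Not Found", None, None)
    return "page %d" % number


def fetch_all(numbers, fetch):
    """Fetch every number, recording HTTP failures instead of stopping."""
    pages, failed = [], []
    for n in numbers:
        try:
            pages.append(fetch(n))
        except urllib.error.HTTPError as err:
            # Only HTTP errors are skipped; any other bug still surfaces.
            failed.append((n, err.code))
    return pages, failed


pages, failed = fetch_all(range(402, 407), fake_fetch)
print(failed)
```

Keeping the failed indices in a list makes it easy to see afterwards which pages were skipped and why.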
