urllib2 problem

Question

leegeorg07

15 Years Ago

Hi, I have this code:

import urllib2 as url
import webbrowser

def extract(text, sub1, sub2):
    """
    extract a substring from text between first
    occurances of substrings sub1 and sub2
    """
    return text.split(sub1, 1)[-1].split(sub2, 1)[0]
start="http://xkcd.com/"
permlist=[]
textlist=[]
for i in range(1, 638):
    temp=start+str(i)
    permlist.append(str(url.urlopen(temp).readlines()[88]))
    textlist.append(str(url.urlopen(temp).readlines()[77]))

for i in permlist:
    i = extract(i, '<h3>Permanent link to this comic: ', '</h3>')

for i in textlist:
    i = extract(i, '<img src="http://imgs.xkcd.com/comics/scribblenauts.png" title="', '"')


print zip(permlist, textlist)

and whenever I run it, it raises this error:

Traceback (most recent call last):
  File "C:/Python26/test.py", line 15, in <module>
    permlist.append(str(url.urlopen(temp).readlines()[88]))
  File "C:\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python26\lib\urllib2.py", line 389, in open
    response = meth(req, response)
  File "C:\Python26\lib\urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python26\lib\urllib2.py", line 427, in error
    return self._call_chain(*args)
  File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Python26\lib\urllib2.py", line 510, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: Not Found

What is the problem, but mainly what can I do to fix it?

thanks in advance

python

5 Contributors
10 Replies
204 Views
4 Days Discussion Span
Latest Post 15 Years Ago Latest Post by ov3rcl0ck

sneekula 969 Nearly a Posting Maven

15 Years Ago

Looks like one of the 638 web pages is not available. You should use a try/except trap for this case.

djidjadji 28 Light Poster

15 Years Ago

for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))
        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except Error: # catch any exception and continue the for loop
        print "Error at index %d."%i

ov3rcl0ck 25 Junior Poster

15 Years Ago

Yeah you'll need to use exceptions, but if you want the script to continue after the error you're going to have to "pass" it, try this:

for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except Error, err:
        print "Index Error: %d at %d" % (err, i)
        pass

this will not only print the error and the location of the error but will also pass to keep the loop going.

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

leegeorg07 · Answer 1 · 2009-09-22T00:34:14+00:00

so what could I use?

sorry, at the moment I just want a quick fix and will figure out the best way when I have time

leegeorg07 · Answer 2 · 2009-09-22T23:20:09+00:00

hey again, they are good ideas but whenever I try to run it again it says:

Traceback (most recent call last):
  File "C:\Python26\test.py", line 18, in <module>
    except Error, err:
NameError: name 'Error' is not defined

vegaseat 1,735 DaniWeb's Hypocrite Team Colleague · Answer 3 · 2009-09-22T23:44:29+00:00

Since you don't know the specific error class, simply use ...

for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))
        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except: # catch any exception and continue the for loop
        print "Error at index %d."%i
        pass

leegeorg07 · Answer 4 · 2009-09-22T23:53:52+00:00

ok thanks, trying it now, So that I can do better handling soon, how can I find the class?

vegaseat 1,735 DaniWeb's Hypocrite Team Colleague · Answer 5 · 2009-09-23T00:23:02+00:00

vegaseat 1,735 DaniWeb's Hypocrite

15 Years Ago

Well you found it in your first post ...
HTTPError

leegeorg07 · Answer 6 · 2009-09-23T16:15:48+00:00

Oh ok thanks, whenever I run the zip part it uses the original text, not what I changed it to, I tried:

for i, j in permlist, textlist:
  print i, ':', j

but it says that it is out of range, what can I do? I have googled it to no avail :(

ov3rcl0ck 25 Junior Poster · Answer 7 · 2009-09-25T23:51:32+00:00

Since you don't know the specific error class, simply use ...

for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))
        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except: # catch any exception and continue the for loop
        print "Error at index %d."%i
        pass

My bad i used the wrong exception, vegaseat is right.