How to set a timeout for reading from URLs in urllib

Hi,

I am downloading a URL using urllib2. The problem I am facing is that sometimes the server goes down, and then read() blocks indefinitely. I don't want that; I want to raise an exception after 20 seconds in this case. There is a solution using signal.alarm, but it only works in one thread, so I am looking for a solution that does not use signal.alarm.
How do I do this?

I tried setting the default socket timeout with socket.setdefaulttimeout(20), but it only seems to apply to opening the URL. It won't raise an exception while reading.

Thanks,
Dilip Kola

3 contributors, 6 replies, 8 views. Discussion span: 10 hours. Last updated: 4 years ago. Status: question answered.
dilipkk
Newbie Poster
9 posts since Mar 2009
Reputation Points: 10
Solved Threads: 0
Skill Endorsements: 0
slate
Posting Whiz in Training
285 posts since Jun 2008
Reputation Points: 72
Solved Threads: 75
Skill Endorsements: 6

This source code seems to contain other solutions http://code.google.com/p/timeout-urllib2/source/browse/trunk/timeout_urllib2.py.

Also, urlopen in the urllib module of Python 3.0 (and maybe 2.6?) seems to accept a timeout parameter.
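For reference, here is a minimal sketch of that timeout parameter. It is written for Python 3, where urllib2 became urllib.request, and the helper name fetch is hypothetical. As far as I know, the timeout limits each blocking socket operation rather than the download as a whole, so a server that keeps trickling data can still hold the read open for longer.

```python
import socket
import urllib.error
import urllib.request


def fetch(url, timeout=20.0):
    """Return the body of url, or raise TimeoutError if the server stalls.

    fetch is a hypothetical helper. timeout bounds each blocking socket
    operation (connect, each receive), not the total download time.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except socket.timeout:
        # A stall after the connection is made surfaces as socket.timeout.
        raise TimeoutError("no data from %s within %s s" % (url, timeout))
    except urllib.error.URLError as exc:
        # A timeout while connecting is wrapped in URLError by urllib.
        if isinstance(exc.reason, socket.timeout):
            raise TimeoutError("could not reach %s within %s s" % (url, timeout))
        raise
```

A caller would then wrap fetch(url) in try/except TimeoutError and decide whether to retry or skip the URL.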

Gribouillis
Posting Maven
Moderator
3,101 posts since Jul 2008
Reputation Points: 1,130
Solved Threads: 761
Skill Endorsements: 11

Thanks for the reply!

We don't need Python 2.6 or 3.0:
import socket
socket.setdefaulttimeout(5)
import urllib2

Now when we call urlopen, it will raise a timeout exception after 5 seconds. The problem is with read() (downloading the content): it won't stop after 5 seconds :(. signal.alarm works if we do the downloads sequentially, but I want to parallelize them. So how can I do this? Can we use signal.alarm in multiple threads?

Thanks
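One signal-free way to bound the whole read is to fetch the body in small pieces and check a deadline between pieces. The sketch below assumes Python 3 (urllib.request instead of urllib2); read_with_deadline, total_timeout, io_timeout and chunk_size are hypothetical names. It uses two limits: io_timeout bounds each single socket operation, and total_timeout bounds the download overall, so the worst case is roughly total_timeout plus io_timeout.

```python
import time
import urllib.request


def read_with_deadline(url, total_timeout=20.0, io_timeout=5.0, chunk_size=8192):
    """Download url, raising TimeoutError once total_timeout seconds pass.

    Hypothetical helper. It uses no signals, so it can run in any thread.
    A single stalled socket operation raises socket.timeout after io_timeout.
    """
    deadline = time.monotonic() + total_timeout
    chunks = []
    with urllib.request.urlopen(url, timeout=io_timeout) as response:
        while True:
            # Check the overall deadline before every read.
            if time.monotonic() >= deadline:
                raise TimeoutError("download exceeded %s s" % total_timeout)
            # read1 returns after at most one underlying read, so a slow
            # trickle of bytes cannot keep us blocked inside one call.
            chunk = response.read1(chunk_size)
            if not chunk:  # empty bytes means the body is complete
                return b"".join(chunks)
            chunks.append(chunk)
```

Because nothing here depends on the main thread, the function can be called from several worker threads at once, one URL per thread.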

dilipkk

Signals only work in the main thread of the process.

Basically, threading is not for "mortals", not even in Python. But you can be a hero :) I am not one.
For instance, killing a thread from the main thread is generally not possible. It is discouraged in Java, too.

import os
import threading
import time

downloadbegin = {}  # start time of each download, keyed by site

def download(site):
    downloadbegin[site] = time.time()
    os.system('wget %s' % site)  # testing
    pass  # download here using urllib

websitelist = ["slashdot.org", "daniweb.com", "http://ubuntu.supp.name/releases/intrepid/ubuntu-8.10-desktop-i386.iso"]
threadlist = []

for site in websitelist:
    threadlist.append(threading.Thread(target=download, args=(site,)))

for thread in threadlist:
    thread.start()

for thread in threadlist:
    # you cannot terminate the thread safely here based on downloadbegin
    thread.join()

I would use subprocess and signal.
Killing a Python interpreter subprocess with os.kill (or its Windows equivalent) does not seem too hard.
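A minimal sketch of the subprocess idea, assuming Python 3.5 or later, where subprocess.run accepts a timeout and kills the child itself, so no explicit os.kill call is needed. The helper name run_with_timeout is hypothetical, and the command passed in would be whatever actually performs the download.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout=20.0):
    """Run argv as a child process; return True if it finished in time.

    Hypothetical helper. When the timeout elapses, subprocess.run kills
    the child and raises TimeoutExpired, which is reported as False here.
    A child that exits with a nonzero status raises CalledProcessError.
    """
    try:
        subprocess.run(argv, timeout=timeout, check=True)
        return True
    except subprocess.TimeoutExpired:
        return False


if __name__ == "__main__":
    # Stand-in for a real download command: a child that sleeps too long.
    slow_child = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow_child, timeout=1.0))
```

Since each download lives in its own process, this can be called from several threads in parallel without relying on signals in the parent.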

slate

Couldn't you use a timer to close the object returned by urlopen after a certain time? For example:

from urllib2 import urlopen
from threading import Timer
url = "http://www.python.org"
fh = urlopen(url)
t = Timer(20.0, fh.close)
t.start()
data = fh.read()

I suppose that fh.read() should raise an exception when fh.close is called by the timer. I can't test it because I don't know a site that is slow enough.

Gribouillis

Thanks for the solution!
The solution you gave was closing fh immediately, so I changed it to the code below, and it is working :).

from urllib2 import urlopen
from threading import Timer
url = "http://www.python.org"
def handler(fh):
    fh.close()
fh = urlopen(url)
t = Timer(20.0, handler,[fh])
t.start()
data = fh.read()
t.cancel()
Thank you, Gribouillis.

dilipkk
Question Answered as of 4 Years Ago by Gribouillis and slate
