Does anyone know how to download the index page from a website using a Python script?

For a start, I don't understand the concept, and Google doesn't seem to turn up any relevant articles, so I am a little lost!

Last Post by Tech B

If it's a site that doesn't allow programs or scripts to access it, you'll need to change your User-Agent header and possibly handle cookies as well.

import urllib2, cookielib

# Keep cookies between requests, since some sites require them
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Replace the default header list: an appended User-agent is skipped
# because the default Python-urllib entry comes first in the list
opener.addheaders = [
    ('User-agent', 'Mozilla/4.0'),
    ('Referer', 'http://www.daniweb.com'),
]

resp = opener.open('http://www.daniweb.com')
source_of_index = resp.read()
resp.close()

# Write the contents to a file to check that the download worked
f = open('fi.html', 'wb')
f.write(source_of_index)
f.close()
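
The snippet above is Python 2. In Python 3, urllib2 and cookielib were reorganized into urllib.request and http.cookiejar, so a rough equivalent might look like the sketch below. The helper names make_opener and download, and the 'Mozilla/5.0' default, are made up for illustration and are not part of any library.

```python
import http.cookiejar
import urllib.request


def make_opener(user_agent="Mozilla/5.0", referer=None):
    """Build an opener that stores cookies and sends browser-like headers."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    # Replace the default header list so this User-agent is the one sent.
    headers = [("User-agent", user_agent)]
    if referer is not None:
        headers.append(("Referer", referer))
    opener.addheaders = headers
    return opener


def download(url, path, opener=None):
    """Fetch url, save the raw bytes to path, and return the byte count."""
    opener = opener or make_opener()
    with opener.open(url) as resp:
        data = resp.read()
    # Binary mode: read() returns bytes, not text.
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

Calling download('http://www.daniweb.com', 'fi.html', make_opener(referer='http://www.daniweb.com')) would mirror the Python 2 version. Whether a given site accepts the request still depends on that site's own rules.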