Does anyone know how to download the index page of a website using a Python script?

For a start, I don't understand the concept, and Google doesn't seem to turn up any relevant articles, so I'm a little lost!

All 2 Replies

from urllib2 import urlopen  # Python 2 module (merged into urllib.request in Python 3)
# Fetch the page and print its raw HTML
print(urlopen('http://www.daniweb.com/forums/').read())
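
If you're on Python 3, here's a rough equivalent of the two-liner above. Treat it as a sketch: fetch_page, the 10-second timeout, and the UTF-8 fallback are my own assumptions, not part of the original reply.

```python
# Python 3 sketch; in Python 3, urllib2's urlopen lives in urllib.request
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Return the body of url as text (fetch_page is an illustrative name)."""
    with urlopen(url, timeout=timeout) as resp:
        raw = resp.read()  # bytes in Python 3, not str
        # Use the charset the server declares; fall back to UTF-8 (assumption)
        charset = resp.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Calling print(fetch_page('http://www.daniweb.com/forums/')) should then print the page source as text.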

My "between" code snippet might be handy for picking out the info you want.
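
The snippet itself isn't posted in this thread, so the following is only a guess at the general idea: a small helper that returns whatever sits between two marker strings. The name between and its signature are hypothetical, and for anything beyond simple cases a real HTML parser is likely a better fit.

```python
def between(text, start, end):
    """Return the text between the first start marker and the next end marker.

    Returns None if either marker is missing. Hypothetical helper, not the
    snippet referred to in the thread.
    """
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)          # skip past the start marker itself
    j = text.find(end, i)    # look for the end marker after that point
    if j == -1:
        return None
    return text[i:j]
```

For example, between(page_source, '<title>', '</title>') would pull out the page title, assuming page_source holds the downloaded HTML as a string.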

If it's a site that doesn't allow programs or scripts to access it, you'll need to change your user-agent, and you may also need to handle cookies.

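# Python 2 code; on Python 3, urllib2 is urllib.request and cookielib is http.cookiejar
import urllib2, cookielib

# Build an opener that stores cookies and sends browser-like headers
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Assign the list rather than append to it: the default list already holds a
# 'User-agent' entry, and a second one appended after it would be ignored
opener.addheaders = [('User-agent', 'Mozilla/4.0'),
                     ('Referer', 'http://www.daniweb.com')]

resp = opener.open('http://www.daniweb.com')
source_of_index = resp.read()
resp.close()

# Write the contents to a file to check that the download worked
f = open('fi.html', 'wb')
f.write(source_of_index)
f.close()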
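
For Python 3, the same cookie and user-agent approach might look roughly like this. It is a sketch under assumptions: make_browser_like_opener and save_page are names I made up, and the timeout value is arbitrary.

```python
# Python 3 sketch: urllib2 -> urllib.request, cookielib -> http.cookiejar
import urllib.request
from http.cookiejar import CookieJar


def make_browser_like_opener(user_agent="Mozilla/4.0", referer=None):
    """Return (opener, cookie_jar); both helper names are illustrative."""
    cookie_jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )
    # Assign the list instead of appending, so the default
    # 'Python-urllib/x.y' User-agent entry is replaced.
    headers = [("User-agent", user_agent)]
    if referer is not None:
        headers.append(("Referer", referer))
    opener.addheaders = headers
    return opener, cookie_jar


def save_page(opener, url, path, timeout=10):
    """Download url with the given opener and write the raw bytes to path."""
    with opener.open(url, timeout=timeout) as resp:
        body = resp.read()
    with open(path, "wb") as f:  # binary mode: the body is bytes
        f.write(body)
    return len(body)
```

You would then call make_browser_like_opener(referer='http://www.daniweb.com') and pass the returned opener to save_page(opener, 'http://www.daniweb.com', 'fi.html'). Any cookies the site sets end up in the returned cookie jar and are sent back on later requests made through the same opener.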