Using mechanize to do website authentication
I am trying to write a web scraper and am having trouble accessing pages that require authentication. I am attempting to use the mechanize library, but am having difficulties. The site I am trying to log in to is http://www.princetonreview.com/Login3.aspx?uidbadge=
user: bugmenot2008@yahoo.com
pass: letmeinalready
Previously I did something similar on another site, schoolfinder.com, using cookielib and urllib2. Here is my code for that:
```python
import cookielib
import sys
import urllib
import urllib2

# keep cookies between requests: the first request stores any session cookie
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
resp = opener.open('http://schoolfinder.com')  # save a cookie

# an example url that sets a cookie, try different urls here and see the cookie collection you can make!
theurl = 'http://schoolfinder.com/login/login.asp'
body = {'usr': 'greenman', 'pwd': 'greenman'}
# passing encoded data to urllib2.Request turns this into a POST request
txdata = urllib.urlencode(body)
# fake a user agent, some websites (like google) don't like automated exploration
txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

try:
    req = urllib2.Request(theurl, txdata, txheaders)  # create a request object
    handle = opener.open(req)  # and open it to return a handle on the url
    HTMLSource = handle.read()
    f = open('test.html', 'w')
    f.write(HTMLSource)
    f.close()
except IOError as e:
    print 'We failed to open "%s".' % theurl
    if hasattr(e, 'code'):
        print 'We failed with error code - %s.' % e.code
    elif hasattr(e, 'reason'):
        print "The error object has the following 'reason' attribute :", e.reason
        print "This usually means the server doesn't exist, is down, or we don't have an internet connection."
    sys.exit()
else:
    print 'Here are the headers of the page :'
    print handle.info()
    # handle.read() returns the page, handle.geturl() returns the true url of the page fetched
    # (in case urlopen has followed any redirects, which it sometimes does)
```
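The script above is Python 2 only; in Python 3, cookielib and urllib2 became http.cookiejar and urllib.request. A minimal sketch of the same cookie-jar idea follows: one opener with an HTTPCookieProcessor POSTs the login form, the server's Set-Cookie header lands in the jar, and later requests through the same opener send that cookie back. The local demo server, the /login and /private paths, and the session=abc123 cookie are all hypothetical stand-ins so the flow can run without touching a real site.

```python
import http.cookiejar
import http.server
import threading
import urllib.parse
import urllib.request


class DemoLoginHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for a login site: a POST with the right
    credentials sets a session cookie; GET only succeeds if it is sent back."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        fields = urllib.parse.parse_qs(self.rfile.read(length).decode())
        self.send_response(200)
        if fields.get("usr") == ["greenman"] and fields.get("pwd") == ["greenman"]:
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.end_headers()
        self.wfile.write(b"login handled")

    def do_GET(self):
        ok = "session=abc123" in self.headers.get("Cookie", "")
        self.send_response(200 if ok else 403)
        self.end_headers()
        self.wfile.write(b"secret page" if ok else b"forbidden")

    def log_message(self, *args):  # silence per-request logging
        pass


def make_opener():
    """Python 3 spelling of cookielib.CookieJar + urllib2.build_opener."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # demo only: talk to 127.0.0.1 directly
        urllib.request.HTTPCookieProcessor(jar),
    )
    opener.addheaders = [("User-Agent", "Mozilla/5.0 (compatible; scraper-demo)")]
    return jar, opener


def login_and_fetch(opener, base_url, fields):
    """POST the form fields, then fetch a protected page with the same opener."""
    data = urllib.parse.urlencode(fields).encode("ascii")  # a body makes it a POST
    opener.open(base_url + "/login", data).read()  # Set-Cookie is stored in the jar
    return opener.open(base_url + "/private").read().decode()


if __name__ == "__main__":
    server = http.server.HTTPServer(("127.0.0.1", 0), DemoLoginHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    jar, opener = make_opener()
    page = login_and_fetch(opener, "http://127.0.0.1:%d" % server.server_port,
                           {"usr": "greenman", "pwd": "greenman"})
    print([c.name for c in jar], page)
    server.shutdown()
    server.server_close()
```

The key design point is reusing one opener (and therefore one jar) for both requests; a fresh urlopen call for the second page would not carry the session cookie.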
This method does not work on the Princeton Review site, however. Interestingly, I cannot even get mechanize to access the schoolfinder.com site. Here is the mechanize code I am using for the Princeton Review login:
```python
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import mechanize

theurl = 'http://www.princetonreview.com/Login3.aspx?uidbadge='
mech = mechanize.Browser()
mech.open(theurl)

# pick the first form on the page and fill in the login fields by name
mech.select_form(nr=0)
mech["ctl00$MasterMainBodyContent$txtUsername"] = "bugmenot2008@yahoo.com"
mech["ctl00$MasterMainBodyContent$txtPassword"] = "letmeinalready"

results = mech.submit().read()
f = open('test.html', 'w')
f.write(results)  # write to a test file
f.close()
```
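The mechanize snippet relies on two things: select_form(nr=0) choosing the form that holds the login fields, and submit() with no arguments clicking the right button (mechanize clicks the form's first submit control by default). The ctl00$... field names suggest an ASP.NET WebForms page, where one form often wraps the whole page, including hidden fields such as __VIEWSTATE and several unrelated submit buttons. A rough way to check is to list every form and its controls with Python 3's standard html.parser. The sample markup below is hypothetical: only the Login3.aspx action and the two txtUsername/txtPassword names come from the thread; the search box and the btnGo/btnLogin names are invented for illustration.

```python
from html.parser import HTMLParser

# Default control types when a tag has no type="..." attribute.
DEFAULT_TYPES = {"input": "text", "button": "submit",
                 "select": "select", "textarea": "textarea"}


class FormLister(HTMLParser):
    """Record each <form> and the (type, name) of the controls inside it."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per form, in document order
        self._current = None  # the form currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action"), "controls": []}
            self.forms.append(self._current)
        elif tag in DEFAULT_TYPES and self._current is not None:
            ctype = (attrs.get("type") or DEFAULT_TYPES[tag]).lower()
            self._current["controls"].append((ctype, attrs.get("name")))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def list_forms(html):
    parser = FormLister()
    parser.feed(html)
    parser.close()
    return parser.forms


# Hypothetical ASP.NET-style page: one form wrapping a search box and the login box.
SAMPLE = """
<form name="aspnetForm" method="post" action="Login3.aspx?uidbadge=">
  <input type="hidden" name="__VIEWSTATE" value="x" />
  <input type="text" name="ctl00$Search$txtQuery" />
  <input type="submit" name="ctl00$Search$btnGo" value="Go" />
  <input type="text" name="ctl00$MasterMainBodyContent$txtUsername" />
  <input type="password" name="ctl00$MasterMainBodyContent$txtPassword" />
  <input type="submit" name="ctl00$MasterMainBodyContent$btnLogin" value="Log in" />
</form>
"""

for nr, form in enumerate(list_forms(SAMPLE)):
    print("form nr=%d action=%s" % (nr, form["action"]))
    for ctype, name in form["controls"]:
        print("   %-8s %s" % (ctype, name))
```

Run against the real login page's HTML, a listing like this shows which form index holds the username and password fields and what the login button is actually called. If it is not the first submit control, mechanize's Browser.submit() accepts the same selectors as HTMLForm.click(), so something like mech.submit(name="<login button name>") would click it explicitly.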
This code is so short, yet I cannot figure out what I am doing wrong. What is incorrect here? Thank you in advance.
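One difference between the two scripts may explain why mechanize fails even on schoolfinder.com: mechanize.Browser honours robots.txt by default and refuses disallowed URLs (the error reads "HTTP Error 403: request disallowed by robots.txt"), whereas the urllib2 opener never looks at robots.txt. The thread does not show the error message or either site's robots.txt, so this is only one candidate cause. The sketch below uses Python 3's urllib.robotparser to check a URL against a robots.txt body offline; the robots.txt content and the example.com URLs are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt body; fetch the real one from <site>/robots.txt
# before drawing conclusions about a particular site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login/
"""


def allowed(robots_txt, url, user_agent="*"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "http://www.example.com/login/login.asp"))  # False
print(allowed(ROBOTS_TXT, "http://www.example.com/"))                 # True
```

If a site's robots.txt does disallow the login path, mechanize can be told to skip the check with mech.set_handle_robots(False) before calling mech.open(); that choice also means taking responsibility for the site's terms of use.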