Hi all.

I would like to be able to parse some data from a password protected site.

The parsing of the data is already developed and tested (I manually logged in to the site and downloaded the source code for testing purposes).

I am stuck at the login part. I have been reading a lot about it, but still haven't managed to do it by myself. I must say my knowledge of Python is pretty basic. I have already read urllib2 The Missing Manual, the urllib2 documentation, and this article too, and still haven't succeeded. I know the answer is in these pages, but I need a little guidance here. A month ago I didn't know how to do 'hello world' in Python, and now I am dealing with HTTP Authentication, openers and handlers! So you can imagine how confused I am.

Correct me if I am wrong, which I probably am: do I first have to submit the username and password from the form as POST and then do the HTTP Authentication thing? Or is submitting the POST variables enough?

I have created an account at the site so that you can, if you will, work with real data.
Login page
username = nunos123
password = qwerty

The 'id' for the username field is : 'frmUsername'
The 'id' for the password field is : 'frmPassword'

Here are the bits of code I collected and adapted from the previous links that I think will be used at some point.

Please note that this is not working code!

import urllib
import urllib2

url = 'http://www.bricklink.com/login.asp?logPage=/my.asp&logFolder=p&logSub=w'

username = 'nunos123'
password = 'qwerty'

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

#send username and password as POST and add user_agent header
values = {'username' : username, 'password' : password}
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()

#HTTP Authentication
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, username, password)
authhandler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(authhandler)
urllib2.install_opener(opener)
pagehandle = urllib2.urlopen(url)

Any help on this is greatly appreciated. Thanks for your time.
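
On the question of whether the POST is enough: a form login and HTTP Authentication are two separate mechanisms, and a site usually relies on one or the other. A login page with username and password fields is typically the form kind, where the credentials travel in the POST body; HTTP Basic authentication sends them in an Authorization header instead. The sketch below only builds both kinds of request so the difference is visible; it does not contact any site. It is written for Python 3's urllib.request (the successor to urllib2), the helper names are made up for illustration, and using frmUsername and frmPassword as the POST field names is an assumption, since a form submits each field under its 'name' attribute, which may differ from its 'id'.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

LOGIN_URL = "http://www.bricklink.com/login.asp?logPage=/my.asp&logFolder=p&logSub=w"


def form_login_request(url, fields):
    """Build a POST request carrying the form fields in its body."""
    body = urlencode(fields).encode("ascii")
    return Request(url, data=body)  # supplying data makes this a POST


def basic_auth_request(url, username, password):
    """Build a GET request carrying HTTP Basic credentials in a header."""
    token = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8"))
    return Request(url, headers={"Authorization": "Basic " + token.decode("ascii")})


# Field names are assumed (hypothetical), see the note above.
form_req = form_login_request(LOGIN_URL, {"frmUsername": "nunos123", "frmPassword": "qwerty"})
auth_req = basic_auth_request(LOGIN_URL, "nunos123", "qwerty")

print(form_req.get_method(), form_req.data)
print(auth_req.get_method(), auth_req.get_header("Authorization"))
```

If the login page shows an HTML form rather than a browser pop-up asking for credentials, the first kind of request is most likely the one the site expects.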

All 7 Replies

Hello, I'm pretty much at the same point you are as far as working with web pages goes. I haven't actually logged in to a page with user authentication yet, and I'm also using Python 3.1, but I did a quick search on Google and found this; it looks like it might help.
http://stackoverflow.com/questions/189555/how-to-use-python-to-login-to-a-webpage-and-retrieve-cookies-for-later-usage
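
As the title of that question suggests, the usual trick for a form login is to keep the cookies the site sets when it accepts the POST and to send them back on later requests. Here is a minimal sketch of that idea with the standard library, written for Python 3 (urllib.request and http.cookiejar are the successors to urllib2 and cookielib). make_session and login are hypothetical helper names, and the POST field names depend on the site's form.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener


def make_session(user_agent="Mozilla/5.0"):
    """Return (opener, jar); the opener stores and resends cookies via the jar."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def login(opener, url, fields):
    """POST the login form through the opener and return the response body."""
    body = urlencode(fields).encode("ascii")
    with opener.open(url, data=body) as response:
        return response.read()


opener, jar = make_session()
# After login(opener, login_url, {...}), fetch protected pages with
# opener.open(...), not a plain urlopen(), so the session cookies are sent.
```

The login call is left out on purpose because it needs a real site; the part to take away is that the same opener has to be used for the login and for every page fetched afterwards.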

I actually managed to work around it for the site I was trying to log in to. This might not work with every site, but it works for the one I wanted. I am using the 'mechanize' module, which handles cookies and such automatically so you don't have to worry about them.

import mechanize

mech = mechanize.Browser()  # the browser object keeps cookies between requests
url = "http://www.example.com"
mech.set_handle_robots(False)  # might be needed; it's best to leave it that way
mech.open(url)
mech.select_form('loginForm')  # loginForm is the 'id' of the form you want to send data to and submit
mech['frmUsername'] = your_username  # frmUsername is the 'id' of the username field of the form
mech['frmPassword'] = your_password  # frmPassword is the 'id' of the password field of the form
mech.submit()  # submit the form

Hope it helps.

Can you please elaborate on how you used mechanize. I have been stuck with this problem from last 3 days :sad:

Thanks

I am not an expert mechanize user, not even close, but I will try to help you as far as I can.

For a simple form submit, the code in my previous post does the trick. What exactly are you having trouble understanding?

I am assuming you know that you first have to 'analyze' the html of the page the form is in. For this I recommend the Firefox extension 'Firebug'. Find out the id of the form and of the fields you want to submit. If the form has no id, just do this: mech.select_form(nr=1), where nr is the zero-based position of the form in the html (nr=0 is the first form, nr=1 the second). For detailed information on how to use the mechanize module, go here.

Cheers
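
If Firebug is not at hand, the same inspection can be sketched in a few lines with the standard library. The example below is written for Python 3 and uses html.parser to list each form with the id and name of its fields; the printed index is the form's zero-based position, which is what mechanize's nr counts. FormLister, list_forms and the sample page are made up for illustration.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect every <form> and the (id, name) of the fields inside it."""

    FIELD_TAGS = ("input", "select", "textarea")

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the form we are inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"id": attrs.get("id"), "name": attrs.get("name"), "fields": []}
            self.forms.append(self._current)
        elif tag in self.FIELD_TAGS and self._current is not None:
            self._current["fields"].append((attrs.get("id"), attrs.get("name")))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def list_forms(html):
    """Return the forms found in html, in document order."""
    parser = FormLister()
    parser.feed(html)
    parser.close()
    return parser.forms


# Made-up page for illustration; real markup will differ.
SAMPLE = """
<form id="searchForm"><input id="q" name="q"></form>
<form id="loginForm" name="loginForm">
  <input id="frmUsername" name="frmUsername">
  <input id="frmPassword" name="frmPassword" type="password">
</form>
"""

for index, form in enumerate(list_forms(SAMPLE)):
    print(index, form["id"], form["fields"])
```

On this sample the login form is the second one on the page, so it would be selected with nr=1.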

Thanks Nunos for the Firebug tip :)

Actually I just copy-pasted your code and expected it to work, so I did not know how you defined mech. I finally managed to do it:

import mechanize
from urllib2 import urlopen
from ClientForm import ParseResponse

response = mechanize.urlopen("http://192.168.1.200/login.php")
#print response.read()
forms = ParseResponse(response, backwards_compat=False)
form = forms[0]
print form

form["LOGIN_USER"] = "admin"
form["LOGIN_PASSWD"] = "wifi"
print urlopen(form.click()).read()

The problem is that after this I need to navigate to other pages from here, for example to http://192.168.1.200/blah.php, and change some fields (I have started using the mechanize browser). I am stuck here now :) Anyway, as you said, I need to analyze the form, and Firebug will help me :)

Thanks
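
For moving on to other pages after the login: the session normally lives in cookies, so every later request has to go through the same cookie-aware object as the login did (the mechanize browser does this; a plain urlopen does not by default). The sketch below shows the effect with the Python 3 standard library against a throwaway local server, so it never touches a real site. DemoHandler, run_demo, the cookie value and the paths are all made up for illustration.

```python
import threading
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import HTTPCookieProcessor, ProxyHandler, build_opener


class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in site: /login sets a cookie, other paths check whether it came back."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/login":
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            body = b"logged in"
        elif "session=abc123" in self.headers.get("Cookie", ""):
            body = b"welcome back"
        else:
            body = b"please log in"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_demo():
    """Fetch a page after 'logging in', once with a cookie jar and once without."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_address[1]
    no_proxy = ProxyHandler({})  # keep local requests away from any system proxy
    try:
        with_cookies = build_opener(no_proxy, HTTPCookieProcessor(CookieJar()))
        with_cookies.open(base + "/login").read()
        kept = with_cookies.open(base + "/blah.php").read()

        without_cookies = build_opener(no_proxy)
        without_cookies.open(base + "/login").read()
        lost = without_cookies.open(base + "/blah.php").read()
    finally:
        server.shutdown()
        server.server_close()
    return kept, lost


kept, lost = run_demo()
print(kept)  # page fetched through the cookie-aware opener
print(lost)  # page fetched without cookie handling
```

The only difference between the two openers is the cookie handler, and only the one that has it is still treated as logged in on the second page.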

Find out the id of the form and the fields you want to submit. If the form has no id, just do this: mech.select_form(nr=1) where nr is the number of the form in the html.

Thanks for pointing this out.. I was stuck because I was selecting the wrong form. Thanks a lot :icon_mrgreen:. Hopefully I will not have any more problems :S

Glad I could help. Cheers.
