Help with cookies/authentication

Question

trihaitran 0 Light Poster

16 Years Ago

Hi I am trying to pull some data from a Web site: http://schoolfinder.com

The issue is that I want to use the advanced search feature which requires logging into the Web site. I have a username and password, however I want to connect programmatically from Python. I have done data capture from the Web before so the only new thing here to me is the authentication stuff. I need cookies as this page describes: http://schoolfinder.com/login/login.asp

I already know how to enter POST/GET data to a request, but how do I deal with cookies/authentication? I have read a few articles without success:

urllib2:
http://www.voidspace.org.uk/python/articles/urllib2.shtml#id6

urllib2 Cookbook:
http://personalpages.tds.net/~kent37/kk/00010.html

basic authentication:
http://www.voidspace.org.uk/python/articles/authentication.shtml#id19

cookielib:
http://www.voidspace.org.uk/python/articles/cookielib.shtml

Is there some other resource I am missing? Is it possible that someone could setup a basic script that would allow me to connect to schoolfinder.com with my username and password? My username is "greenman", password is "greenman". All I need to know is how to access pages as if I logged in by Web browser.

Thank you very much.

python web-browser

3 Contributors
4 Replies
213 Views
3 Weeks Discussion Span
Latest Post 16 Years Ago Latest Post by trihaitran

All 4 Replies

BearofNH 104 Posting Whiz

16 Years Ago

You didn't mention the easiest (though not too secure) approach: connect to http://greenman:greenman@schoolfinder.com/ and you're in without needing any new magic.

Or did you want something more secure?

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

trihaitran 0 Light Poster · Answer 1 · 2008-08-11T00:12:01+00:00

The link "http://greenman:greenman@schoolfinder.com/" does not seem to log me into the Web site. Is that for basic authentication? I'm sure this Web site uses cookies somewhere, but I'm just not understanding how to deal with it.

I put your link into my browser (Safari, Camino, and Firefox) and it shows me as not logged in. Thanks for the help though.

Czechs Mex 0 Newbie Poster · Answer 2 · 2008-08-15T04:33:51+00:00

If you resolve this issue, please post it here! I've been looking for the solution to the exact same problem.

If I find anything, I'll post it here as well. Thanks.

trihaitran 0 Light Poster · Answer 3 · 2008-08-31T03:00:58+00:00

I was able to solve this problem after a lot more research and tinkering. Here is the solution for anyone interested.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import cookielib
import urllib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
resp = opener.open('http://schoolfinder.com') # save a cookie

theurl = 'http://schoolfinder.com/login/login.asp' # an example url that sets a cookie, try different urls here and see the cookie collection you can make !
body={'usr':'greenman','pwd':'greenman'}
txdata = urllib.urlencode(body) # if we were making a POST type request, we could encode a dictionary of values here - using urllib.urlencode
txheaders =  {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'} # fake a user agent, some websites (like google) don't like automated exploration


try:
    req = urllib2.Request(theurl, txdata, txheaders) # create a request object
    handle = opener.open(req) # and open it to return a handle on the url
    HTMLSource = handle.read()
    f = file('test.html', 'w')
    f.write(HTMLSource)
    f.close()

except IOError, e:
    print 'We failed to open "%s".' % theurl
    if hasattr(e, 'code'):
        print 'We failed with error code - %s.' % e.code
    elif hasattr(e, 'reason'):
        print "The error object has the following 'reason' attribute :", e.reason
        print "This usually means the server doesn't exist, is down, or we don't have an internet connection."
        sys.exit()

else:
    print 'Here are the headers of the page :'
    print handle.info() # handle.read() returns the page, handle.geturl() returns the true url of the page fetched (in case urlopen has followed any redirects, which it sometimes does)

Help with cookies/authentication

Recommended Answers Collapse Answers

All 4 Replies

Recommended Answers