Parse webpage private info

Question

-ordi- 6 Junior Poster in Training

14 Years Ago

Best way to parse this webpage private info:

https://ee.ekool.eu/index_et.html?r=2#/?screenId=g.user.login&err=1

I found Beautiful Soup and PyKHTML. Which is better?

python

4 Contributors
8 Replies
533 Views
1 Day Discussion Span
Latest Post 14 Years Ago by -ordi-

All 8 Replies

Beat_Slayer 17 Posting Pro in Training

14 Years Ago

You can also use urllib2, or you can try Beautiful Soup.

Cheers and happy coding

snippsat 661 Master Poster

14 Years Ago

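For logging in, use mechanize.
For parsing, BeautifulSoup or lxml are good choices.

import mechanize

# placeholder: put the address of the login page here
your_url = "https://ee.ekool.eu/index_et.html?r=2#/?screenId=g.user.login"

browser = mechanize.Browser()
browser.open(your_url)
# select the first form on the page; nr=0 works for many sites,
# but check the form name or index for yours
browser.select_form(nr=0)
# the field names must match the names used in the page's form
browser['username'] = "xxxxx"
browser['password'] = "xxxxx"
response = browser.submit()
html = response.read()
print html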

zark_yoc 0 Newbie Poster · Answer 1 · 2010-09-12T21:53:05+00:00

The urllib2 module can help you.

-ordi- 6 Junior Poster in Training · Answer 2 · 2010-09-12T22:41:12+00:00

For logging in, use mechanize.
For parsing, BeautifulSoup or lxml are good choices.

import mechanize
browser = mechanize.Browser()
browser.open(your_url)
browser.select_form(nr=0) #Check form name nr=0 work for many
browser['username'] = "xxxxx"
browser['password'] = "xxxxx"
response = browser.submit()
html = response.read()
print html

File "Ekool.py", line 5, in <module>
    browser.select_form(nr=0) #Check form name nr=0 work for many
  File "/usr/lib/python2.6/site-packages/mechanize/_mechanize.py", line 527, in select_form
    raise FormNotFoundError("no form matching "+description)
mechanize._mechanize.FormNotFoundError: no form matching nr 0

I tried nr=0, 1, 2, and 3; same problem every time.
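
A note on the FormNotFoundError above: mechanize only sees forms that are present in the HTML the server returns; it does not run JavaScript. The login URL carries its screen name in the fragment (`#/?screenId=g.user.login`), which suggests, though nothing in the thread confirms it, that the login form is built in the browser by script. One quick check is to count the `<form>` tags in the raw HTML. The sketch below uses only the standard library and is written for Python 3 (the thread itself uses Python 2.6); `FormCounter`, `count_forms`, and the two sample pages are made up for illustration.

```python
from html.parser import HTMLParser


class FormCounter(HTMLParser):
    """Counts <form> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.forms = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "form":
            self.forms += 1


def count_forms(html):
    """Return the number of <form> tags found in the given HTML string."""
    parser = FormCounter()
    parser.feed(html)
    parser.close()
    return parser.forms


# Hypothetical samples: a static login page and a script-driven shell page.
static_page = '<html><body><form action="/login"><input name="username"></form></body></html>'
script_page = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'

print(count_forms(static_page))  # 1
print(count_forms(script_page))  # 0
```

If the count for the real page is 0, no value of `nr` can work, and the request the script sends when you log in in a browser would have to be reproduced directly instead.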

-ordi- 6 Junior Poster in Training · Answer 3 · 2010-09-12T23:45:19+00:00

OK, I've added a picture; maybe this helps :)

-ordi- 6 Junior Poster in Training · Answer 4 · 2010-09-13T22:25:24+00:00

OK, I'm trying to log in to this webpage: https://ee.ekool.eu/index_et.html?r=2#?/
but unfortunately it doesn't work. Any ideas?

Beat_Slayer 17 Posting Pro in Training · Answer 5 · 2010-09-13T23:33:31+00:00

Where's your code?

Which methods have you tried?

What are the errors?

Cheers and happy coding

-ordi- 6 Junior Poster in Training · Answer 6 · 2010-09-13T23:46:02+00:00

import urllib2
import urllib

# build opener with HTTPCookieProcessor
o = urllib2.build_opener( urllib2.HTTPCookieProcessor() )
urllib2.install_opener( o )

# assuming the site expects 'username' and 'password' as form fields
p = urllib.urlencode( { 'username': 'name', 'password': 'password' } )

# perform login; passing p as the second argument sends it as a POST body
f = o.open( 'https://ee.ekool.eu/index_et.html?r=2#/?screenId=g.user.login',  p )
data = f.read()
f.close()

Traceback (most recent call last):
  File "Ekool.py", line 61, in <module>
    f = o.open( 'https://ee.ekool.eu/index_et.html?r=2#/?screenId=g.user.login',  p )
  File "/usr/lib/python2.6/urllib2.py", line 397, in open
    response = meth(req, response)
  File "/usr/lib/python2.6/urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.6/urllib2.py", line 435, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 405: Not Allowed
[timo@localhost ~]$
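
A note on the 405 above: two things about that call are worth checking. Passing a second argument to `open` turns the request into a POST, and everything after `#` in the URL is a fragment, which is meant for the client rather than naming a resource on the server. So the POST most likely lands on the plain `index_et.html` page, which would explain "Not Allowed"; that explanation is an inference, not something the thread confirms. The sketch below (Python 3, where `urllib2` became `urllib.request`; it makes no network calls) only demonstrates those two facts.

```python
from urllib.parse import urldefrag, urlencode
from urllib.request import Request

login_url = "https://ee.ekool.eu/index_et.html?r=2#/?screenId=g.user.login"

# Split off the fragment: only the first part names a resource on the server.
base, fragment = urldefrag(login_url)
print(base)      # https://ee.ekool.eu/index_et.html?r=2
print(fragment)  # /?screenId=g.user.login

# Supplying a body makes urllib choose POST instead of GET.
body = urlencode({"username": "name", "password": "password"}).encode("ascii")
print(Request(base, data=body).get_method())  # POST
print(Request(base).get_method())             # GET
```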

Second version:

import urllib2

theurl = 'https://ee.ekool.eu/index_et.html?r=2#/?screenId=g.user.login'
username = 'name'
password = 'password'

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
# this creates a password manager
passman.add_password(None, theurl, username, password)
# because we have put None at the start it will always
# use this username/password combination for  urls
# for which `theurl` is a super-url

authhandler = urllib2.HTTPBasicAuthHandler(passman)
# create the AuthHandler

opener = urllib2.build_opener(authhandler)

urllib2.install_opener(opener)
# All calls to urllib2.urlopen will now use our handler
# Make sure not to include the protocol in with the URL, or
# HTTPPasswordMgrWithDefaultRealm will be very confused.
# You must (of course) use it when fetching the page though.

pagehandle = urllib2.urlopen(theurl)
# authentication is now handled automatically for us

print pagehandle

and output:

<addinfourl at 3068350220L whose fp = <socket._fileobject object at 0xb737616c>>

Maybe this works? How can I check whether the login succeeded?

Edit:

Second solution works! Thanks!
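
On the question of how to check whether the login succeeded: printing the response object only shows its repr, which says nothing about authentication. Note also that `HTTPBasicAuthHandler` only sends credentials when the server answers with a 401 challenge, so loading a public login page without an error does not by itself prove that a login happened. One hedged approach is to look at the status code, the final URL, and some text that appears only for signed-in users. The sketch below is Python 3; `check_login`, the marker string, and the `FakeResponse` stand-in are all hypothetical, and the real marker would have to come from inspecting the site.

```python
def check_login(response, marker):
    """Summarise a urllib-style response and test for a signed-in marker.

    `response` needs getcode(), geturl() and read(), as returned by
    urllib.request.urlopen(). `marker` is text assumed to appear only
    on pages served to a logged-in user (a guess you must verify).
    """
    body = response.read().decode("utf-8", errors="replace")
    return {
        "status": response.getcode(),
        "url": response.geturl(),
        "logged_in": marker in body,
    }


class FakeResponse:
    """Stand-in for a real response so the sketch runs without a network."""

    def __init__(self, code, url, body):
        self._code, self._url, self._body = code, url, body

    def getcode(self):
        return self._code

    def geturl(self):
        return self._url

    def read(self):
        return self._body


page = FakeResponse(200, "https://example.invalid/home", b"<a href='/logout'>Log out</a>")
result = check_login(page, "Log out")
print(result["status"], result["logged_in"])  # 200 True
```

With a real request, the object returned by `urlopen` would be passed in place of `FakeResponse`.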
