Parse webpage private info

What is the best way to parse the private (login-protected) information on this web page?

https://ee.ekool.eu/index_et.html?r=2#/?screenId=g.user.login&err=1

I found Beautiful Soup and PyKHTML. Which is better?

-ordi-
Junior Poster in Training
92 posts since Dec 2009
Reputation Points: 18
Solved Threads: 11
 

You also have urllib2, or you can try Beautiful Soup.

Cheers and happy coding

Beat_Slayer
Posting Pro in Training
405 posts since Jun 2010
Reputation Points: 30
Solved Threads: 105
 

For logging in, use mechanize.
For parsing, BeautifulSoup or lxml are good choices.

import mechanize

browser = mechanize.Browser()
browser.open(your_url)  # your_url is a placeholder for the login page address
browser.select_form(nr=0)  # nr=0 selects the first form on the page, which works for many sites; otherwise check the form's name
browser['username'] = "xxxxx"
browser['password'] = "xxxxx"
response = browser.submit()  # submit the filled-in form
html = response.read()
print html
snippsat
Practically a Posting Shark
808 posts since Aug 2008
Reputation Points: 353
Solved Threads: 294
 

urllib2 module can help you

zark_yoc
Newbie Poster
15 posts since Sep 2010
Reputation Points: 10
Solved Threads: 1
 

File "Ekool.py", line 5, in <module>
    browser.select_form(nr=0) #Check form name nr=0 work for many
  File "/usr/lib/python2.6/site-packages/mechanize/_mechanize.py", line 527, in select_form
    raise FormNotFoundError("no form matching "+description)
mechanize._mechanize.FormNotFoundError: no form matching nr 0


I tried nr=0, 1, 2 and 3; same problem every time.
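
One possible cause of `FormNotFoundError` for every `nr` value: the login form may be built by JavaScript after the page loads (the `#/?screenId=g.user.login` fragment in the URL hints at a client-side application), in which case the HTML that mechanize downloads contains no `<form>` element at all. This is an assumption about the site, not something confirmed in the thread. A minimal sketch for checking which forms a raw page contains, written for Python 3 with only the standard library; `FormLister` and `list_forms` are hypothetical names, and both sample pages are made up:

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect every <form> in a document together with its named inputs."""

    def __init__(self):
        super().__init__()
        self.forms = []  # one dict per <form> encountered

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({"action": attrs.get("action"), "inputs": []})
        elif tag == "input" and self.forms and attrs.get("name"):
            # Attribute the input to the most recently opened form.
            self.forms[-1]["inputs"].append(attrs["name"])


def list_forms(html):
    """Return a list of the forms found in an HTML string."""
    parser = FormLister()
    parser.feed(html)
    return parser.forms


# Made-up samples: a server-rendered form vs. a JavaScript-only shell page.
static_page = '<form action="/login"><input name="username"><input name="password"></form>'
js_shell = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'

print(list_forms(static_page))
print(list_forms(js_shell))
```

If the list comes back empty for the real page, `select_form` has nothing to select, and the actual login request would have to be found some other way, for example by watching the browser's network traffic while logging in.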

-ordi-
Junior Poster in Training
92 posts since Dec 2009
Reputation Points: 18
Solved Threads: 11
 

OK, I've added a picture; maybe this helps :)

Attachments pilt48.png 171.48KB
-ordi-
Junior Poster in Training
92 posts since Dec 2009
Reputation Points: 18
Solved Threads: 11
 

OK, I'm trying to log in to this web page: https://ee.ekool.eu/index_et.html?r=2#?/
but unfortunately it doesn't work. Any ideas?

-ordi-
Junior Poster in Training
92 posts since Dec 2009
Reputation Points: 18
Solved Threads: 11
 

Where's your code?

Which methods have you tried?

What are the errors?

Cheers and Happy coding

Beat_Slayer
Posting Pro in Training
405 posts since Jun 2010
Reputation Points: 30
Solved Threads: 105
 
import urllib2
import urllib

# build opener with HTTPCookieProcessor
o = urllib2.build_opener( urllib2.HTTPCookieProcessor() )
urllib2.install_opener( o )

# assuming the site expects 'username' and 'password' fields; passing them as data makes this a POST request
p = urllib.urlencode( { 'username': 'name', 'password': 'password' } )

# perform login with params
f = o.open( 'https://ee.ekool.eu/index_et.html?r=2#/?screenId=g.user.login',  p )
data = f.read()
f.close()
Traceback (most recent call last):
  File "Ekool.py", line 61, in <module>
    f = o.open( 'https://ee.ekool.eu/index_et.html?r=2#/?screenId=g.user.login',  p )
  File "/usr/lib/python2.6/urllib2.py", line 397, in open
    response = meth(req, response)
  File "/usr/lib/python2.6/urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.6/urllib2.py", line 435, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 405: Not Allowed
[timo@localhost ~]$
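
A likely contributor to the 405 above: everything after `#` in a URL is a fragment, which is intended for the client and does not identify a separate resource on the server. The POST is therefore effectively addressed to `index_et.html?r=2`, and a server that only serves that page via GET would answer `Not Allowed`. The thread does not confirm this, so treat it as a hypothesis. A small Python 3 sketch showing which part of the URL names the server resource (`urllib.parse` is where these helpers live in Python 3):

```python
from urllib.parse import urldefrag, urlsplit

login_url = "https://ee.ekool.eu/index_et.html?r=2#/?screenId=g.user.login"

# Split off the fragment; only the first part identifies the server resource.
resource, fragment = urldefrag(login_url)
parts = urlsplit(login_url)

print(resource)     # https://ee.ekool.eu/index_et.html?r=2
print(fragment)     # /?screenId=g.user.login
print(parts.path)   # /index_et.html
print(parts.query)  # r=2
```

Note that `screenId=g.user.login` sits inside the fragment, so it is not a query parameter the server can read.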


Second version:

import urllib2

theurl = 'https://ee.ekool.eu/index_et.html?r=2#/?screenId=g.user.login'
username = 'name'
password = 'password'

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
# this creates a password manager
passman.add_password(None, theurl, username, password)
# because we have put None at the start it will always
# use this username/password combination for  urls
# for which `theurl` is a super-url

authhandler = urllib2.HTTPBasicAuthHandler(passman)
# create the AuthHandler

opener = urllib2.build_opener(authhandler)

urllib2.install_opener(opener)
# All calls to urllib2.urlopen will now use our handler
# Make sure not to include the protocol in with the URL, or
# HTTPPasswordMgrWithDefaultRealm will be very confused.
# You must (of course) use it when fetching the page though.

pagehandle = urllib2.urlopen(theurl)
# authentication is now handled automatically for us

print pagehandle


and output:

<addinfourl at 3068350220L whose fp = <socket._fileobject object at 0xb737616c>>


Maybe this works? How can I check whether the login succeeded?

Edit:

Second solution works! Thanks!
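
On the question of how to check whether the login succeeded: printing the response object only shows that some response came back. `HTTPBasicAuthHandler` sends the credentials only when the server answers with a 401 challenge, so a page that is served without one gives no evidence of a login either way. A rough Python 3 sketch of the kind of checks one might run on a response; `looks_logged_in` and the `logout_marker` string are hypothetical, the example.com URLs are placeholders, and what actually distinguishes a logged-in page on this site is an assumption that would need checking by hand:

```python
def looks_logged_in(status, final_url, body, login_url, logout_marker="logout"):
    """Heuristic only: guess whether a response is a logged-in page.

    status     -- HTTP status code (response.getcode() in urllib)
    final_url  -- URL after redirects (response.geturl())
    body       -- decoded page text (response.read().decode(...))
    """
    if status != 200:
        return False
    # Still being on the login page usually means the login did not happen.
    if final_url.split("#")[0] == login_url.split("#")[0]:
        return False
    # Hypothetical marker: many sites show a logout link only when logged in.
    return logout_marker in body.lower()


login_url = "https://example.com/login"

# Made-up responses for illustration, not real output from any site.
print(looks_logged_in(200, login_url, "<html>Login</html>", login_url))
print(looks_logged_in(200, "https://example.com/home", "<a>Logout</a>", login_url))
```

In the Python 2 code above, the equivalent values come from `pagehandle.getcode()`, `pagehandle.geturl()` and `pagehandle.read()`.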

-ordi-
Junior Poster in Training
92 posts since Dec 2009
Reputation Points: 18
Solved Threads: 11
 
