Getting information from website

Question

apolo_x 0 Newbie Poster

16 Years Ago

Hi!

I want to get information from a sports website (e.g. https://www.bwin.com/pt/futebol). I have already tried htmllib and a few other things. The problem is that I can get the page's source code, but it doesn't contain the same information that I see in my browser. In particular, I'm interested in the team names!
How can I do this? I can open the browser, copy the information to Notepad, save it, and then process it with Python, but I want to get it directly!

Best regards!

open-source python web-browser

5 Contributors
9 Replies
224 Views
1 Day Discussion Span
Latest Post 16 Years Ago Latest Post by Stefano Mtangoo

All 9 Replies

scru 909 Posting Virtuoso

16 Years Ago

You need to establish the pattern that the team names follow in the source code before attempting to retrieve that information.

My Spanish is very bad, so I can't really help you find that pattern. But it should be pretty easy to find once you know what you are looking for.

After you find that pattern, the easiest way is to get the source code and pull out everything that matches that pattern. If you happen to know regular expressions, this can be very easy as using them would ultimately save you a lot of coding.
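
As a minimal sketch of that approach, assume the team names sit in table cells with a recognisable class. The `team-name` class, the sample HTML, and the `extract_team_names` helper are all made up for illustration; the real page's markup will differ.

```python
import re

# Hypothetical fragment standing in for the downloaded page source.
sample_html = """
<tr><td class="team-name">Team A</td><td class="odds">2.10</td></tr>
<tr><td class="team-name">Team B</td><td class="odds">3.40</td></tr>
"""

# Capture the text between the opening and closing tag of each matching cell.
TEAM_PATTERN = re.compile(r'<td class="team-name">\s*(.*?)\s*</td>', re.DOTALL)


def extract_team_names(html):
    """Return every string captured by TEAM_PATTERN, in document order."""
    return TEAM_PATTERN.findall(html)


team_names = extract_team_names(sample_html)
print(team_names)
```

A regular expression like this is quick to write but breaks easily if the markup changes, so it is probably best treated as a first attempt.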

scru 909 Posting Virtuoso

16 Years Ago

You're right, that area seems to be generated. Perhaps by JavaScript? Either way, it looks like you have a pretty tedious job ahead of you.

leegeorg07

16 Years Ago

You can find the names in Firefox: just click Tools > View Source, then press Ctrl+F and type the name.
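
The same check can probably be scripted against whatever your program downloaded. Here is a rough sketch; `find_in_source` is a hypothetical helper and the page source is an invented stand-in for the real response.

```python
import re


def find_in_source(html, name, context=30):
    """Return a snippet around each case-insensitive occurrence of name."""
    pattern = re.compile(re.escape(name), re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(html):
        begin = max(0, match.start() - context)
        snippets.append(html[begin:match.end() + context])
    return snippets


# Made-up page source standing in for what the download returned.
page_source = '<div id="events"><span>Hamburguer</span></div>'

hits = find_in_source(page_source, "hamburguer", context=6)
misses = find_in_source(page_source, "Bremen")
print(len(hits), "hit(s);", len(misses), "miss(es)")
```

If a name shows up in the browser's View Source but not in the text your script fetched, the two are not receiving the same document.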


apolo_x 0 Newbie Poster · Answer 1 · 2009-01-27T21:23:41+00:00

Hi! Thanks for your answer, but I did not understand it completely!

How can I find the pattern? For example, "Hamburguer" is a team name shown in the English version of the site, https://www.bwin.com/sportsbook.aspx, but I can't find it in the source code, even when I view the source directly in the browser!

Thanks again!

Stefano Mtangoo 455 Senior Poster · Answer 2 · 2009-01-28T00:14:30+00:00

Check urllib2 for reading the page source (HTML), and wxPython's HTML classes for a simple HTML browser, before going for something more complex.

http://www.wxpython.org/docs/api/wx.html-module.html
http://www.python.org/doc/2.5.2/lib/module-urllib2.html

apolo_x 0 Newbie Poster · Answer 3 · 2009-01-28T17:40:28+00:00

leegeorg07 is right! I can see the team names in the source code...
But, unfortunately, what I'm getting from the website isn't the source (I thought it was! Now I don't know what it is...)

My code is something like this:

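import urllib.request  # Python 3 spelling; Python 2 used urllib.urlopen

url = "https://www.bwin.com/es/betsnew.aspx?SportID=4"

# Fetch the raw HTML the server returns (no JavaScript is executed)
with urllib.request.urlopen(url) as f:
    s = f.read()  # bytes, not str
# print(s)

# Save the response so it can be inspected in an editor
with open('test.txt', 'wb') as fich:
    fich.write(s)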

I store the information in the file test.txt. It looks like the source, but there are no team names in there!

Thanks!

jlm699 320 Veteran Poster · Answer 4 · 2009-01-28T20:55:56+00:00

I can see the team names in the source code... But, unfortunately, what I'm getting from the website isn't the source

Take note of the type of webpage you're getting your information from (.aspx).

The ASP.NET framework allows dynamic websites to be created. Take note of the web address as you click around on the site: it doesn't change much, yet the content changes.

So until the page is rendered with your actions, the "source" will be the basic framework.
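
To make that a bit more concrete, here is a rough sketch using the standard library's `html.parser`. The sample page is invented: an empty container plus a script that would fill it in once a browser runs it. The `events` id and the `load_events.js` file name are hypothetical, not taken from the real site.

```python
from html.parser import HTMLParser

# Invented skeleton of a dynamic page as the server might deliver it.
skeleton = """
<html><body>
<div id="events"></div>
<script src="load_events.js"></script>
</body></html>
"""


class SkeletonInspector(HTMLParser):
    """Collect external script URLs and any visible text in the page."""

    def __init__(self):
        super().__init__()
        self.script_sources = []
        self.visible_text = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            src = dict(attrs).get("src")
            if src:
                self.script_sources.append(src)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Ignore whitespace and anything inside <script> elements.
        text = data.strip()
        if text and not self._in_script:
            self.visible_text.append(text)


inspector = SkeletonInspector()
inspector.feed(skeleton)
inspector.close()
print("scripts:", inspector.script_sources)
print("visible text:", inspector.visible_text)
```

A page that lists scripts but carries little or no visible text is a hint that the content is filled in later by the browser.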

apolo_x 0 Newbie Poster · Answer 5 · 2009-01-28T21:22:58+00:00

OK, that helps! But is there a way to force the page to render and then get the code, like browsers do?

Thanks

Stefano Mtangoo 455 Senior Poster · Answer 6 · 2009-01-28T21:27:27+00:00

My idea is this: unless the site you want is static HTML, you will need to code a small web browser, which is a big pain but a good gain anyway.
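
Rather than writing a browser from scratch, one option might be to drive an existing one from Python. The sketch below assumes the third-party Selenium package and a working Firefox driver are installed. Neither is mentioned elsewhere in this thread, and `fetch_rendered_html` is a hypothetical helper, so treat it as one possible approach only.

```python
def fetch_rendered_html(url):
    """Load url in a real browser and return the page source it reports."""
    # Imported here so the function can be defined without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Read after the page load finishes, so scripts have had a chance to run.
        return driver.page_source
    finally:
        driver.quit()


# Example call (opens a browser window and needs network access):
# html = fetch_rendered_html("https://www.bwin.com/es/betsnew.aspx?SportID=4")
```

Content that arrives some time after the initial load may still be missing, in which case an explicit wait before reading `page_source` would be needed.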
