Hi!

I want to get information from a sports web site (e.g. https://www.bwin.com/pt/futebol), and I have already tried htmllib and some other modules. The problem is that I can get the page's source code, but it does not contain the same information that I can see in my browser. In particular, I'm interested in the team names!
How can I do this? I can open the browser, copy the information to Notepad, save it and then process it with Python... but I want to get it directly!

Best regards!

You need to establish the pattern that the team names follow in the source code before attempting to retrieve that information.

My Spanish is very bad, so I can't really help you find that pattern. But it should be pretty easy to find once you know what you are looking for.

After you find that pattern, the easiest way is to download the source code and pull out everything that matches it. If you know regular expressions, this is straightforward, and using them will save you a lot of coding.
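
As a rough illustration of the pattern-matching idea above, here is a minimal sketch. The HTML snippet, the `team` class name and the placeholder team names are all invented for the example; the real bwin markup will differ, so the pattern has to be adapted after inspecting the actual source.

```python
import re

# Hypothetical markup: the real site's tags and class names will differ.
sample_html = """
<td class="team">Team A</td>
<td class="odds">2.10</td>
<td class="team">Team B</td>
"""

# Capture the text inside every <td class="team">...</td> cell.
TEAM_PATTERN = re.compile(r'<td class="team">([^<]+)</td>')


def extract_teams(html):
    """Return the team names found in html, in document order."""
    return [name.strip() for name in TEAM_PATTERN.findall(html)]


print(extract_teams(sample_html))
```

A regular expression like this is fragile if the markup changes, so it is only reasonable for quick scripts against a page whose structure you have checked by hand.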

Hi! Thanks for your answer, but I did not understand it completely!

How can I find the pattern? For example, "Hamburguer" is a team name present in the English version of the site, https://www.bwin.com/sportsbook.aspx, but I can't find it in the source code, even if I view the source directly in the browser!

Thanks again!

You're right, that area seems to be generated. Perhaps by JavaScript? Either way, it looks like you have a pretty tedious job ahead of you.

You can find the names in Firefox: just click Tools > View Source, then press Ctrl+F and type the name.

leegeorg07 is right! I can see the team names in the source code...
But, unfortunately, what I'm getting from the website isn't the source (I thought it was! Now I don't know what it is...).

My code is something like this:

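import urllib.request  # Python 3; in Python 2 this was urllib.urlopen

url = "https://www.bwin.com/es/betsnew.aspx?SportID=4"

# Download the raw HTML exactly as the server sends it (no JavaScript is run).
with urllib.request.urlopen(url) as f:
    s = f.read()  # bytes, not str
# print(s)

# Save the response so it can be inspected in a text editor.
with open('test.txt', 'wb') as fich:
    fich.write(s)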

I store the information in the file test.txt. It looks like the source, but there are no team names in there!

Thanks!

I can see the team names in the source code... But, unfortunately, what I'm getting from the website isn't the source

Take note of the type of webpage you're getting your information from (.aspx).

The ASP.NET framework allows dynamic websites to be created. Take note of the web address as you click around on the site: it doesn't change much, yet the content changes.

So until the page is rendered with your actions, the "source" will be the basic framework.
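
One way to check this is to look at what the downloaded HTML actually contains before any scripts run. The sketch below is only an illustration: the `skeleton` page, the `bets.js` file name and the team name are made up, and a real page would be the string you downloaded yourself.

```python
from html.parser import HTMLParser


class SourceInspector(HTMLParser):
    """Collect visible text and count <script> tags in raw HTML."""

    def __init__(self):
        super().__init__()
        self.script_count = 0
        self.text_parts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Skip JavaScript code; keep only text a reader would see.
        if not self._in_script and data.strip():
            self.text_parts.append(data.strip())


def inspect_source(html, wanted):
    """Return (is `wanted` in the static text?, number of <script> tags)."""
    parser = SourceInspector()
    parser.feed(html)
    parser.close()
    found = any(wanted in part for part in parser.text_parts)
    return found, parser.script_count


# Hypothetical page skeleton: the table stays empty until a script fills it.
skeleton = ('<html><body><table id="bets"></table>'
            '<script src="bets.js"></script></body></html>')
print(inspect_source(skeleton, "Team A"))
```

If the name is missing from the static text and the page loads scripts, that suggests (but does not prove) the data is filled in after the initial download.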

OK, that helps! But is there a way to force the page to render and then get the code, like browsers do?

Thanks

My idea is this: unless the site you want is plain static HTML, you will need to code a small web browser, which is a big pain but a good gain anyway.
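
Rather than writing a browser from scratch, one option is to drive an existing one from Python. The sketch below is an assumption, not something tested against bwin: `rendered_source` and `FakeDriver` are hypothetical names, and the function only relies on the `get()`, `page_source` and `quit()` members that Selenium's WebDriver objects provide. The fake driver is there so the example runs without a browser installed.

```python
def rendered_source(driver, url):
    """Load url in a browser driver and return the HTML after scripts have run.

    `driver` is any object with get(), page_source and quit(), which is the
    interface Selenium's WebDriver classes provide.
    """
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


class FakeDriver:
    """Stand-in used here so the sketch runs without a real browser."""

    def __init__(self):
        self.page_source = ""
        self.closed = False

    def get(self, url):
        # A real browser would fetch the page and execute its JavaScript.
        self.page_source = "<html><body>rendered: %s</body></html>" % url

    def quit(self):
        self.closed = True


print(rendered_source(FakeDriver(),
                      "https://www.bwin.com/es/betsnew.aspx?SportID=4"))
```

With Selenium and a matching browser driver installed, you would pass something like `webdriver.Firefox()` (from `from selenium import webdriver`) instead of `FakeDriver()`. Content that arrives some time after the page loads may still need an explicit wait before `page_source` is read.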
