data grabbing from html sites
Hi,
I'd like to create something that grabs information from websites. However, I don't have any experience with urllib (apart from very basic page reading), and the issue is that the page I want to grab data from checks whether another page is connected to it. To phrase it better: it checks whether it is in the correct iframe relative to an iframe next to it. It's hard to explain, but any help or pointers are appreciated.
Join Date: May 2005
Posts: 215
Solved Threads: 16
I just wrote a little script that searches for and grabs BitTorrent files. One of the versions needs to read the page source.
When it comes to the web I am not too knowledgeable, so I'm not sure what that means. If you just want to grab the source from a website and put it into a string, then parse out the needed info, this will get you started:
```python
import urllib2

url = 'http://google.com'
# this line creates an object that contains the page source
page = urllib2.urlopen(url)
# using the read method this line puts the object into a string, so it can be manipulated
page_string = page.read()
```
Now, using different string methods, you can parse out any needed data.
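The snippet above is Python 2 (`urllib2`). A minimal Python 3 sketch of the same idea, plus one string-method helper for the parsing step, might look like the following. The helper names `fetch_page` and `text_between` and the `<title>` example are illustrative assumptions, not part of the original script.

```python
import urllib.request


def fetch_page(url):
    """Python 3 version of the urllib2 snippet: download a page, return its source as str."""
    with urllib.request.urlopen(url) as page:
        # read() returns bytes in Python 3, so decode using the charset the server reports
        charset = page.headers.get_content_charset() or "utf-8"
        return page.read().decode(charset, errors="replace")


def text_between(source, start, end):
    """Return the text between the first `start` marker and the next `end` marker.

    Returns None if either marker is missing. Uses only str.find() and slicing.
    """
    i = source.find(start)
    if i == -1:
        return None
    i += len(start)
    j = source.find(end, i)
    if j == -1:
        return None
    return source[i:j]
```

For example, `text_between(fetch_page("http://google.com"), "<title>", "</title>")` would pull out the page title. This works for simple, predictable markup; for anything messier an HTML parser is the safer route.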
Edit (added later):
I just reread your post; you seem to need to do more than I just explained. Sorry I don't have something more useful to tell you.
In a perfect world exceptions would not be needed.
It's a game.
I run out of ideas and frequently ask my friends for things to code. This time one of them said there is a game site he plays (I now play it), but the layout is rubbish, so he wanted me to code something that grabs the data from certain pages and puts it into a Tkinter GUI-style table, making it easier to work out numbers: how many of this you need to train, how many of that you need to buy, etc.
*back to the site*
The site only shows this 'end of turn' page (and others) if it is still 'connected' to the sidebar on the left-hand side; if it's not, it redirects you to an error page. So I want to know how to trick a web page into thinking it is being viewed the way it should be (with its main page and the toolbar down the side) instead of being opened on its own or by a program.
It's hard to explain.
If you want to view the site you would have to register, etc., unless of course you're interested in text-based, resource-handling-style web games.
*EDIT: www.aarcsoft.com, then click on 'games', then click on the top game, choose the server you want to connect to, then register and play. Simple as that, really.
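Without seeing the site this is guesswork, but "only works inside the frameset" checks are commonly done in one of three ways: the server looks at the `Referer` header, the server relies on a session cookie set when you log in through the main page, or JavaScript in the page checks whether it is inside a frame. `urllib` does not run JavaScript, so that last kind of check never fires; the first two can be imitated. A minimal Python 3 sketch, assuming the check is one of the server-side kinds (the helper names and all URLs here are hypothetical):

```python
import http.cookiejar
import urllib.request


def make_session(user_agent="Mozilla/5.0"):
    """Return an opener that stores and resends cookies, plus the cookie jar it uses.

    Log in through this opener first so the session cookie is kept for later requests.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replace urllib's default "Python-urllib/x.y" user agent with a browser-like one
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def framed_request(url, referer):
    """Build a request for `url` that claims it was reached from `referer`.

    Hypothetical use: `referer` is the main/frameset page the site expects.
    """
    req = urllib.request.Request(url)
    req.add_header("Referer", referer)
    return req
```

Typical use would be to log in with `opener.open(login_url, login_data)` and then fetch the turn page with `opener.open(framed_request(turn_url, main_url))`, where `login_url`, `turn_url` and `main_url` stand in for the real addresses and `login_data` is the login form's fields encoded as bytes, e.g. with `urllib.parse.urlencode({...}).encode()`.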
Join Date: Jul 2006
Posts: 6
Solved Threads: 0
hello all
Hi all,
Same thing, same problem. Well, I guess we have the same interests: I also want to grab some data out of an existing site, a forum.
First off, I have to explain something: I need to grab some data out of a phpBB forum in order to do some field research. The forum is run by a user community, and I need the data to analyze the discussions.
Nothing harmful, nothing bad, nothing serious or dangerous. But the issue is that I have to get the data. So how?
Well, we could think of some automation that runs through the forums with WWW::Mechanize and gets all the data:
http://search.cpan.org/search?query=...FHTTP&mode=all
No, I need some individual threads to analyze (about 400 to 600 threads).
Some examples:
http://www.phpbb.com/phpBB/viewtopic.php?t=415990
http://www.phpbb.com/phpBB/viewtopic.php?t=415980
http://www.phpbb.com/phpBB/viewtopic.php?t=415970
BTW, these are only examples, not from the actual forum I'm interested in.
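For a fixed list of a few hundred threads, one approach is to build the `viewtopic.php?t=...` URL for each topic id and download the pages one at a time with a pause in between. A minimal Python 3 sketch, using the example phpBB address from the post; `topic_urls`, `fetch_all` and the two-second delay are illustrative choices, not anything phpBB prescribes:

```python
import time
import urllib.parse
import urllib.request

# Example base address taken from the post; swap in the real forum's address
BASE = "http://www.phpbb.com/phpBB/viewtopic.php"


def topic_urls(base, topic_ids):
    """Build one viewtopic URL per topic id, e.g. <base>?t=415990."""
    return ["%s?%s" % (base, urllib.parse.urlencode({"t": tid})) for tid in topic_ids]


def fetch_all(urls, delay=2.0):
    """Download each URL in turn and return a dict {url: page source}.

    Sleeps `delay` seconds between requests so the forum is not hammered.
    """
    pages = {}
    for n, url in enumerate(urls):
        if n:
            time.sleep(delay)
        with urllib.request.urlopen(url) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            pages[url] = resp.read().decode(charset, errors="replace")
    return pages
```

Long threads may be split over several pages; this sketch only fetches the first page of each topic.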
I need the data in an almost full and complete format, so I need all of it, like:
username
forum
thread
topic
text of the posting, and so on.
How do I do that?
I need some kind of grabbing tool; can I do it with that kind of tool? And how do I solve the issue of storing it in the local MySQL database?
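For the storage question, here is a sketch of the table and insert logic. It swaps in Python's built-in `sqlite3` module for MySQL so it runs without a database server or extra driver; the column names mirror the fields listed above and are assumptions. With a MySQL driver the SQL stays much the same, though the placeholder style usually differs (for example `%s` instead of `:name`).

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the posts table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               id       INTEGER PRIMARY KEY,
               forum    TEXT,
               thread   TEXT,
               topic    TEXT,
               username TEXT,
               body     TEXT
           )"""
    )
    return conn


def save_posts(conn, posts):
    """Insert scraped posts; each post is a dict with the keys named below.

    Parameter binding (:name) lets the driver handle quoting of the post text.
    """
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO posts (forum, thread, topic, username, body) "
            "VALUES (:forum, :thread, :topic, :username, :body)",
            posts,
        )
```

Passing a file name such as `open_db("forum.db")` keeps the data on disk instead of in memory.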
Well, you see, it's tricky work, and I am pretty sure I'll get help here. I am very, very thankful for any and all help.
Many, many thanks in advance,
Ethno-researcher
BTW: for the automation I suggest looking at WWW::Mechanize, as it encapsulates many of the lower-level web automation tools provided by Perl. By the way, we *will not* find better web automation tools in any language; the LWP/HTTP suite of modules is extremely powerful.
Originally Posted by shanenin
thanks for the nice explanation. I don't have any ideas, but maybe someone else will.
"Beautiful Soup" is an HTML/XML parser for Python that can turn even poorly written markup code into a parse tree, so you can extract information.
Download the free program and documentation from:
http://www.crummy.com/software/BeautifulSoup/
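Beautiful Soup is the more forgiving choice for messy markup. As a dependency-free illustration of the same extract-by-element idea, here is a sketch using the standard library's `html.parser` instead (a swapped-in technique, not Beautiful Soup's API). It collects the text of every element whose `class` contains a given name; the class names `name` and `postbody` in the example are assumptions about phpBB-style markup, so check the real page source first.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not change the nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class attribute contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0       # > 0 while inside a matching element
        self.current = []    # text pieces of the element being read
        self.results = []    # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self.depth and tag == "br":
                self.current.append("\n")  # keep line breaks inside a post
            return
        if self.depth:
            self.depth += 1  # a tag nested inside the matching element
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_by_class(html, target_class):
    """Return a list with the text of each element carrying target_class."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

Unlike Beautiful Soup, this breaks down on unclosed tags such as a bare `<p>` inside a post, which is exactly the kind of sloppy markup Beautiful Soup is built to tolerate.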
Join Date: Jul 2006
Posts: 6
Solved Threads: 0
Hello, many, many thanks.
I'm guessing that this can help me.
Originally Posted by bumsfeld
"Beautiful Soup" is an HTML/XML parser for Python that can turn even poorly written markup code into a parse tree, so you can extract information.
Download the free program and documentation from:
http://www.crummy.com/software/BeautifulSoup/
Well, I look forward to learning more about it.
Thanks in advance,
meta







