data grabbing from html sites

#1 a1eio, Oct 8th, 2005
hi,
I'd like to create something that grabs information from websites. However, I don't have any experience with urllib (apart from very basic page reading), and the issue is that the page I want to grab data from checks whether another page is connected to it. Or, to phrase it better: it checks whether it is in the correct iframe in relation to an iframe next to it. It's hard to explain, but any help or pointers are appreciated.

#2 shanenin, Oct 8th, 2005
I just wrote a little script that searches for and grabs BitTorrent files. One of the versions needs to read the page source.

Originally Posted by a1eio
i want to grab data from checks to see if another page is connected to it or perhaps to better phrase it:

When it comes to the web, I am not too knowledgeable, so I'm not sure what that means. If you just want to grab the source from a website and put it into a string, to then parse out the needed info, this will get you started:
  import urllib2  # Python 2 module; in Python 3 this is urllib.request

  url = 'http://google.com'
  # urlopen returns a file-like object for the page
  page = urllib2.urlopen(url)
  # read() returns the page source as a string, so it can be manipulated
  page_string = page.read()

Now, using different string methods, you can parse out any needed data.
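
As a rough illustration of what "using string methods" could look like, here is a minimal sketch. The helper name `text_between` and the sample markup are made up for this example; a real page would need its own markers, and this approach breaks easily when the markup changes.

```python
def text_between(source, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker.

    Returns None if either marker is missing.
    """
    start = source.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = source.find(end_marker, start)
    if end == -1:
        return None
    return source[start:end].strip()


# Hypothetical stand-in for the page_string read from a real site.
page_string = (
    "<html><head><title> Example Page </title></head>"
    "<body><b>Gold:</b> 1200</body></html>"
)

title = text_between(page_string, "<title>", "</title>")
gold = text_between(page_string, "<b>Gold:</b>", "</body>")
print(title)
print(gold)
```

Plain `find` and slicing are enough for one or two values on a page whose layout you know; for anything larger, a real HTML parser is usually less fragile.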


edit added later//

I just reread your post; you seem to need to do more than I just explained. Sorry I don't have something more useful to tell you.
In a perfect world exceptions would not be needed.

#3 a1eio, Oct 9th, 2005
Don't worry, it's a start. At least now I can read the sites, so thanks for the help.
goto start somewhere

#4 shanenin, Oct 9th, 2005
Could you give me the URL of a site you are trying to get data from, and explain what kind of data you need to find? Are you trying to get certain URLs that link to other sites?

#5 a1eio, Oct 10th, 2005
It's a game.
I run out of ideas and frequently ask my friends for things to code. This time one of them said that there is a game site he plays (I now play it too), but the layout is rubbish, so he wanted me to code something that grabs the data from certain pages and puts it in a Tkinter GUI-style table, so it's easier to work out the numbers: how many of this you need to train, how many of that you need to buy, etc.
*back to the site*
The site only shows this 'end of turn' page (and others) if it is still 'connected' to the sidebar on the left-hand side. If it's not, it redirects you to an error page. So I want to know how to trick a webpage into thinking it is being viewed the way it should be (with its main page and the toolbar down the side), instead of being opened on its own or by a program.
It's hard to explain.
And if you want to view the site, you would have to register, etc.
Unless, of course, you're interested in text-based, resource-handling-style web games.

*EDIT: www.aarcsoft.com, then click on 'games', then click on the top game, then choose the server you want to connect to, then register and play. Simple as that, really.
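
A site that refuses to show a page outside its frameset is often checking the `Referer` header or a session cookie set at login. Whether this particular game does either is an assumption, not something established in this thread. The sketch below (Python 3, where `urllib2` has become `urllib.request`) shows how a script could keep cookies between requests and name the frameset page as the referrer. The function names and the example.com URLs are placeholders.

```python
import http.cookiejar
import urllib.request


def build_session_opener():
    """Return (opener, jar): an opener that stores cookies and sends them back,
    roughly the way a browser keeps a login session alive."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def framed_request(page_url, frameset_url):
    """Build a request for page_url that names frameset_url as the referring page."""
    return urllib.request.Request(page_url, headers={"Referer": frameset_url})


opener, jar = build_session_opener()
# Placeholder URLs; substitute the real frameset page and the page inside it.
request = framed_request(
    "http://example.com/game/end_of_turn",
    "http://example.com/game/main",
)
# The actual fetch is left commented out so the sketch runs without a network:
# html = opener.open(request).read().decode("utf-8", errors="replace")
```

If the site relies on JavaScript to verify the frame, headers and cookies alone will not be enough; in that case it helps to watch the browser's real requests and copy what they send.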

#6 shanenin, Oct 10th, 2005
thanks for the nice explanation. I don't have any ideas, but maybe someone else will.

#7 metabo_man, Jul 23rd, 2006
hello all

Originally Posted by shanenin
thanks for the nice explanation. I don't have any ideas, but maybe someone else will.


hi all,

Same thing, same problem. I guess we have the same interests: I also want to grab some data out of an existing site, a forum.

First off, I have to explain something: I have to grab some data out of a phpBB board in order to do some field research. I need the data from a forum that is run by a user community, so that I can analyze the discussions.
Nothing harmful, nothing bad, nothing serious or dangerous. But the issue is how to get the data.
We could think of some automation that runs through the forums with WWW::Mechanize and gets all the data:

http://search.cpan.org/search?query=...FHTTP&mode=all
But no, I only need some individual threads to analyze (about 400 to 600 threads).

Some examples:
http://www.phpbb.com/phpBB/viewtopic.php?t=415990
http://www.phpbb.com/phpBB/viewtopic.php?t=415980
http://www.phpbb.com/phpBB/viewtopic.php?t=415970

By the way, these are only examples, not from the real forum I am interested in.
I need the data in an almost full and complete format. So I need all the data, like:
username
forum
thread
topic
text of the posting, and so on.
How do I do that?
I need some kind of grabbing tool. Can I do it with that kind of tool? And how do I solve the issue of storing the data in a local MySQL database?
Well, you see, that is tricky work, and I am pretty sure that I will get help here. So for any and all help I am very, very thankful.

Many, many thanks in advance

Ethno-researcher



btw: for the automation I suggest looking at WWW::Mechanize, as it encapsulates many of the lower-level web automation tools provided by Perl. By the way, we *will not* find better web automation tools in any language. The LWP/HTTP suite of modules is extremely powerful.
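
The storage side of this request can be sketched in Python, the language used earlier in the thread. This is only an outline under stated assumptions: the standard library's `sqlite3` stands in for MySQL so the example runs without a database server, the `posts` table layout and the function names are invented here, and the sample row is dummy data. The URL pattern is taken from the example links above.

```python
import sqlite3
from urllib.parse import urlencode

# Example board from the post above; the real forum would use its own address.
BASE_URL = "http://www.phpbb.com/phpBB/viewtopic.php"


def thread_url(topic_id):
    """Build the viewtopic URL for one thread id."""
    return BASE_URL + "?" + urlencode({"t": topic_id})


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the posts table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               id INTEGER PRIMARY KEY,
               username TEXT,
               forum TEXT,
               thread TEXT,
               topic TEXT,
               body TEXT
           )"""
    )
    return conn


def save_post(conn, username, forum, thread, topic, body):
    """Insert one posting; placeholders keep quotes in the text from breaking the SQL."""
    conn.execute(
        "INSERT INTO posts (username, forum, thread, topic, body) "
        "VALUES (?, ?, ?, ?, ?)",
        (username, forum, thread, topic, body),
    )
    conn.commit()


store = open_store()
# Dummy row for illustration only; real rows would come from the parsed pages.
save_post(store, "example_user", "Support", "415990", "Example topic", "Example text")
print(thread_url(415990))
print(store.execute("SELECT COUNT(*) FROM posts").fetchone()[0])
```

Moving to MySQL would mostly mean swapping the connection call for a MySQL driver and adjusting the placeholder style that driver expects. Fetching and parsing 400 to 600 thread pages is a separate step, and it is worth pausing between requests so the forum is not overloaded.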

#8 bumsfeld, Jul 24th, 2006
"Beautiful Soup" is an HTML/XML parser for Python that can turn even poorly written markup code into a parse tree, so you can extract information.

Download the free program and documentation from:
http://www.crummy.com/software/BeautifulSoup/
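
To give a feel for parser-based extraction, here is a small sketch that pulls links out of sloppy markup. It deliberately uses the standard library's `html.parser` instead of Beautiful Soup, so it runs without installing anything; Beautiful Soup offers a richer tree and search interface on top of the same idea. The class and function names and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in the given HTML text."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Deliberately untidy sample: unclosed tags and an unquoted attribute value.
sample = (
    '<html><body><p>Threads:'
    '<a href="viewtopic.php?t=415990">one'
    '<a href=viewtopic.php?t=415980>two</body>'
)
print(extract_links(sample))
```

Because the parser works through tags as events, unclosed elements do not stop it, which is what makes this more robust than slicing the page with string methods.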

#9 metabo_man, Jul 24th, 2006
hello many many thanks


Originally Posted by bumsfeld
"Beautiful Soup" is an HTML/XML parser for Python that can turn even poorly written markup code into a parse tree, so you can extract information.

Download the free program and documentation from:
http://www.crummy.com/software/BeautifulSoup/
I'm guessing that this can help me.

Well, I look forward to learning more about it.

Thanks in advance
meta
©2003 - 2009 DaniWeb® LLC