PHP HTTP Screen-Scraping Class with Caching

Post #11 by chocoholic, Jan 4th, 2008
When I used your examples everything worked well, but when I tried a different URL I got this error:

Warning: Missing argument 1 for http::getFromUrl(), called in /home/web/class_http.php on line 90 and defined in /home/web/class_http.php on line 139
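The warning means getFromUrl() was called without its first, required argument, and the file and line numbers show the call is made from inside class_http.php itself, so the class source would need checking (it isn't shown in this thread). As a general illustration only, with hypothetical function names rather than the class's real methods: a parameter with no default must always be passed, while a default value plus a check lets a bad call fail gracefully. PHP 5 reported the omission as the warning above; PHP 7.1 and later throw ArgumentCountError instead.

```php
<?php
// Hypothetical stand-ins, not the real http class methods.

// First parameter has no default, so every caller must pass it.
function get_from_url_strict($url)
{
    return "fetched: " . $url;
}

// A default plus an explicit check lets a bad call fail gracefully.
function get_from_url_safe($url = null)
{
    if ($url === null || $url === '') {
        return false; // nothing to fetch
    }
    return "fetched: " . $url;
}

try {
    get_from_url_strict(); // PHP 7.1+: throws ArgumentCountError
} catch (ArgumentCountError $e) {
    echo "strict version: ", $e->getMessage(), "\n";
}

var_dump(get_from_url_safe());                        // bool(false)
echo get_from_url_safe("http://example.com/"), "\n";  // fetched: http://example.com/
```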

Also, even with your example it didn't appear to be caching: I found no cached data in the current directory, nor in a different directory when I specified one.
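One common reason a file-based cache leaves nothing behind is that the PHP process (often running as the web-server user) cannot write to the target directory. Whether that is the cause here isn't known; a quick check like the one below rules it in or out. The function name is hypothetical and not part of the class.

```php
<?php
// Hypothetical diagnostic: confirm the directory meant to hold cache files
// exists and is writable by the PHP process. If either check fails, a
// file-based cache cannot save anything there.
function cache_dir_problem($dir)
{
    if (!is_dir($dir)) {
        return "missing: $dir";
    }
    if (!is_writable($dir)) {
        return "not writable: $dir";
    }
    return null; // looks usable
}

// Example: check the system temp directory (substitute your cache path).
$dir = sys_get_temp_dir();
$problem = cache_dir_problem($dir);
echo $problem === null ? "cache dir OK: $dir\n" : "$problem\n";
```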

Thanks

Post #12 by savoo4u, Jan 8th, 2008

By using
echo "<pre>";
print_r($msft_stats);
echo "</pre>";
or
/*
Static method table_into_xml()
to parse the elements from allgame.com
*/
function table_into_xml($rawHTML, $needle = "", $needle_within = 0, $allowedTags = "") {
    $aryTable = http::table_into_array($rawHTML, $needle, $needle_within, $allowedTags);
    if (!$aryTable) {
        return false;
    }
    // Single-quoted declaration: the original "\?\>" escapes would have left
    // literal backslashes in the output and produced invalid XML.
    $xml = '<?xml version="1.0" standalone="yes"?>' . "\n";
    $xml .= "<TABLE>\n";
    $rowIdx = 0;
    foreach ($aryTable as $row) {
        $xml .= "\t<ROW id=\"" . $rowIdx . "\">\n";
        $colIdx = 0;
        foreach ($row as $col) {
            $xml .= "\t\t<COL id=\"" . $colIdx . "\">" . trim(utf8_encode(htmlspecialchars($col))) . "</COL>\n";
            $colIdx++;
        }
        $xml .= "\t</ROW>\n";
        $rowIdx++;
    }
    $xml .= "</TABLE>";
    return $xml;
}
} // closes class http (the snippet was copied from the end of the class)
where is the XML file created, or where can we see the array?
Thank you
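As posted, table_into_xml() builds the XML in a string and returns it; nothing in it writes a file, so no XML file appears anywhere unless the caller saves the return value. Likewise, print_r() only prints the array into the page output. A minimal sketch of saving the string, assuming it is held in $xml (here a short hand-written sample, not real scraped data) and that the temp directory is an acceptable place for the file:

```php
<?php
// $xml stands in for the string returned by http::table_into_xml();
// this sample content is made up for illustration.
$xml = '<?xml version="1.0" standalone="yes"?>' . "\n"
     . "<TABLE>\n"
     . "\t<ROW id=\"0\">\n"
     . "\t\t<COL id=\"0\">Avg. Daily Vol.</COL>\n"
     . "\t</ROW>\n"
     . "</TABLE>";

// Choose where the file should live; the system temp dir is just an example.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'table.xml';

if (file_put_contents($path, $xml) === false) {
    echo "could not write $path\n";
} else {
    echo "wrote ", strlen($xml), " bytes to $path\n";

    // Read it back to confirm the saved file is well-formed XML.
    $doc = simplexml_load_file($path);
    echo "first cell: ", (string) $doc->ROW[0]->COL[0], "\n";
}
```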

Post #13 by world_weapon, Jan 20th, 2008

Hey Troy, in case you check this thread: I'm getting a 302 status on one of the pages I'm trying to scrape. I use the URLs exactly as they appear in the browser. The response header also contains a LoginRedir={stuffgoeshere} value, which I haven't been able to find much about on Google. How does your class handle a 302 status? In other words, does it follow the redirect? I assume it doesn't, because the body is not what I expect but something along the lines of "Object moved". Maybe I have something set incorrectly? Since LoginRedir=XXXX is in the Set-Cookie header, the problem may also be in how my script handles the cookie after receiving it. So it is either the data stored in the cookie or the class not following the 302. I will keep experimenting. Let me know what you think.
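Whether this class follows redirects can't be settled from this thread. As a sketch of what a client has to do itself when it does not, the hypothetical helper below reads a raw response header block and pulls out the status code, the Location target, and any Set-Cookie name/value pairs (such as LoginRedir). The example response is invented. A client following the redirect would request the Location URL next (resolving it against the original URL if it is relative) and send the cookies back in a Cookie header.

```php
<?php
// Hypothetical helper, not part of the http class discussed in this thread.
// Splits a raw HTTP response header block into status, Location and cookies.
function inspect_headers($raw)
{
    $lines = preg_split("/\r\n|\n/", trim($raw));
    $result = array('status' => 0, 'location' => null, 'cookies' => array());

    // Status line, e.g. "HTTP/1.1 302 Found".
    $statusLine = array_shift($lines);
    if (preg_match('#^HTTP/[\d.]+\s+(\d{3})#', $statusLine, $m)) {
        $result['status'] = (int) $m[1];
    }

    foreach ($lines as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // not a "Name: value" header line
        }
        $name = strtolower(trim($parts[0]));
        $value = trim($parts[1]);

        if ($name === 'location') {
            $result['location'] = $value;
        } elseif ($name === 'set-cookie') {
            // Keep only "name=value"; drop attributes such as path or expires.
            $pair = explode(';', $value, 2);
            $kv = explode('=', $pair[0], 2);
            if (count($kv) === 2) {
                $result['cookies'][trim($kv[0])] = trim($kv[1]);
            }
        }
    }
    return $result;
}

// Invented example response.
$raw = "HTTP/1.1 302 Found\r\n"
     . "Location: /members/home.asp\r\n"
     . "Set-Cookie: LoginRedir=abc123; path=/\r\n"
     . "Content-Type: text/html\r\n";

$info = inspect_headers($raw);
if ($info['status'] >= 300 && $info['status'] < 400 && $info['location'] !== null) {
    // Build the Cookie header to send with the follow-up request.
    $pairs = array();
    foreach ($info['cookies'] as $k => $v) {
        $pairs[] = "$k=$v";
    }
    echo "redirect to ", $info['location'], "\n";
    echo "Cookie: ", implode('; ', $pairs), "\n";
}
```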
The purpose of my existence is why I am here.

Post #14 by world_weapon, Jan 20th, 2008

Just to let you know, I got it working with a cURL implementation, but I would still like to know how your class handles a 302 status. cURL has a boolean option for following redirects. I'll keep experimenting; I would love to have a non-cURL implementation.
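The cURL option referred to is CURLOPT_FOLLOWLOCATION. For a non-cURL route, PHP's built-in http:// stream wrapper (used by file_get_contents()) accepts follow_location and max_redirects context options. A minimal sketch, assuming allow_url_fopen is enabled; the function names and the cookie value are hypothetical. The stream wrapper has no cookie jar, so a cookie set by the 302 response itself is not picked up automatically, and a login flow that depends on LoginRedir may still need the redirect handled by hand.

```php
<?php
// Hypothetical helpers showing a cURL-free fetch that follows redirects.

// Build a stream context for the http:// wrapper.
function make_redirect_context($maxRedirects = 5, $cookie = null)
{
    $http = array(
        'method'          => 'GET',
        'follow_location' => 1,             // follow 3xx Location headers
        'max_redirects'   => $maxRedirects, // give up after this many hops
        'timeout'         => 10,            // seconds
    );
    if ($cookie !== null) {
        $http['header'] = "Cookie: $cookie\r\n"; // added to the request headers
    }
    return stream_context_create(array('http' => $http));
}

// Fetch a URL; returns the body (false on failure) and the response headers.
function fetch_following_redirects($url, $context)
{
    $body = @file_get_contents($url, false, $context);
    // file_get_contents() fills $http_response_header in this local scope.
    $headers = isset($http_response_header) ? $http_response_header : array();
    return array($body, $headers);
}
```

Usage would look like list($body, $headers) = fetch_following_redirects('http://example.com/', make_redirect_context(5, 'LoginRedir=abc123')); where the URL is a placeholder.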

Post #15 by asadalim1, Dec 12th, 2008

I'm trying to run the example but I get this error:

Warning: Missing argument 1 for http::getFromUrl(), called in C:\wamp\www\scrape\troy\class_http.php on line 88 and defined in C:\wamp\www\scrape\troy\class_http.php on line 137

Any ideas?

Cheers

Post #16 by asadalim1, Dec 15th, 2008

Has this class worked successfully for anybody?

Post #17 by WebSnail, Jan 8th, 2009

Originally posted by asadalim1:
Has this class worked successfully for anybody?
I've just loaded it up on a *nix test site and it works pretty well.

It's literally just a class to grab the content; everything else is down to the coder to parse using regex, etc. (Oh the joy! *sob*)

A couple of things I noticed at first, though:

1. The example script has two sites in it that you need to disable (comment out) before you run it (see the end of the code, from line 130 onwards).

2. The table_into_array() call looks for a label the page no longer uses. Change this:
    $msft_stats = http::table_into_array($h->body, "Avg Daily Volume", 1, null);
to this:
    $msft_stats = http::table_into_array($h->body, "Avg. Daily Vol.", 1, null);
...and you're laughing.
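Because a scraped page can rename its labels at any time (here "Avg Daily Volume" became "Avg. Daily Vol."), one defensive option is to try several candidate needles and use the first one that actually appears in the page. The helper below is hypothetical, not part of the class, and the sample HTML is made up. It uses a case-sensitive strpos() on the untested assumption that table_into_array() matches its needle exactly.

```php
<?php
// Hypothetical helper: return the first candidate label present in the
// page, or null if none of them are found.
function first_present_needle($html, array $candidates)
{
    foreach ($candidates as $needle) {
        if (strpos($html, $needle) !== false) {
            return $needle;
        }
    }
    return null;
}

// Made-up sample page fragment.
$html = '<table><tr><td>Avg. Daily Vol.</td><td>123</td></tr></table>';
$needle = first_present_needle($html, array('Avg Daily Volume', 'Avg. Daily Vol.'));
echo $needle === null ? "no known label found\n" : "using needle: $needle\n";

// With the class loaded, the chosen needle would then be passed on, e.g.
// $msft_stats = http::table_into_array($h->body, $needle, 1, null);
```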


Anyhoo... this is useful for a project I've started looking at, so here's hoping it stays the course.
Last edited by WebSnail; Jan 8th, 2009 at 1:12 pm.

Post #18 by WebSnail, Jan 8th, 2009

I realised there was a syntax error in image_cache.php as well.
Find:
    $h->fetch($_GET['url'], $_GET['ttl'];);

Replace with:
    $h->fetch($_GET['url'], $_GET['ttl']);
(or just delete the extra semicolon)
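The corrected line hands $_GET['url'] and $_GET['ttl'] straight to fetch(). Reading them defensively first avoids undefined-index notices when a parameter is missing and keeps ttl numeric. This is only a sketch: the helper name is hypothetical, and it assumes fetch() expects ttl as a whole number of seconds, which this thread does not confirm.

```php
<?php
// Hypothetical helper for image_cache.php: read url/ttl from the query
// string before passing them to $h->fetch(). Missing keys fall back to
// defaults instead of raising undefined-index notices.
function read_fetch_params(array $query)
{
    $url = (isset($query['url']) && is_string($query['url'])) ? trim($query['url']) : '';
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        $url = null; // empty or malformed URL
    }
    // TTL as a non-negative integer; 0 when absent.
    $ttl = isset($query['ttl']) ? max(0, (int) $query['ttl']) : 0;
    return array($url, $ttl);
}

// In image_cache.php this might be used as:
// list($url, $ttl) = read_fetch_params($_GET);
// if ($url !== null) {
//     $h->fetch($url, $ttl);
// }
```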

©2003 - 2009 DaniWeb® LLC