[Urgent]
Hi,
I am trying to extract text from a website using file_get_contents('url'), and the call itself works without errors.
The problem is that I am not getting the output I expect. When the URL is opened in a browser, the home page flashes up first, and while the page is still loading a second, final set of information is displayed. So the final information only appears once the page has finished loading, while file_get_contents() returns only the initial content of the site.
What can be done so that file_get_contents('url') extracts the data only after the page has stopped loading and is fully loaded? I don't think a function like onpageload exists in PHP, but I'm not sure. Please guide me with tips, code snippets, links, etc. Any help will be greatly appreciated.
In short, I want the content of the website only after it is fully loaded, so file_get_contents('url') should execute only after 'url' is fully loaded.
Thanks a lot. Please let me know if any part of the problem statement remains unclear.

If you put the file_g..... call at the top of your script and store the result in a variable, the content should be loaded before you echo it out.

The issue arises because you are essentially loading two websites.

Maybe this website has a longer response or load time.

The only other thing you could try is adding a sleep statement to give the function time to catch up.

P.S.: I do know the function name, I just don't like typing it out; it gets a little repetitive. :P

Hey Josh,
I think I already did that, but still no luck :(

$html_in_site = file_get_contents('url');
sleep(10);
echo $html_in_site;

I don't think file_get_contents() can run JavaScript, though I may be wrong. I would look into a JavaScript option: once the data has loaded into your HTML container, collect it and send it back to your PHP page via Ajax or a form post. I can't be sure of the specifics until I get into it, but that is where I would start.

If you think about it, fetching all the external JavaScript files, making any external Ajax requests, and running the JavaScript engine itself is a fairly complicated process, and browsers are already designed to do it, so a client-side solution is definitely the way I would go.

If you put the file_g..... call at the top of your script and store the result in a variable, the content should be loaded before you echo it out.

The issue arises because you are essentially loading two websites.

The file_get_contents function halts the execution of the PHP script until after the page it is loading is fully received.
Adding a sleep call after that will only delay the thread a little longer, after the external page has been loaded.

So, no. That is not the issue.

The issue is most likely client-generated content, like Baldy guessed.
file_get_contents only fetches the raw HTML response from the server; it does not execute any of the scripts, and it does not load any external resources referenced by that page.

Thus, when you get a page heavy with client-side scripts, it may look nothing like what you see in a browser.
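To make that concrete, here is a minimal sketch. The HTML below is a made-up stand-in for the real page, and it is served through a data: URI only so the snippet runs without a network connection; both are assumptions for illustration. It shows that PHP receives the placeholder text, while the script that would replace it in a browser arrives as plain text and is never run.

```php
<?php
// Hypothetical page: in a browser, the script swaps the placeholder text.
$html = <<<'HTML'
<html><body>
<div id="route">First Europe wide</div>
<script>document.getElementById('route').textContent = 'Route-Instruction';</script>
</body></html>
HTML;

// Assumption: a data: URI stands in for the remote URL so this runs offline.
$url = 'data://text/html;base64,' . base64_encode($html);

// Blocks until the complete response has been read.
$raw = file_get_contents($url);

// Parse the raw markup; no JavaScript is executed at any point.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($raw);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$routeText = $xpath->query('//div[@id="route"]')->item(0)->textContent;

echo $routeText, PHP_EOL; // prints "First Europe wide"
```

Adding a sleep() anywhere in this script would not change the result, because nothing ever runs the script element.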

In that case, use a URL include. That works because it loads all of the scripts before the document is loaded, INCLUDING client-side scripts.

And BEFORE everyone says they can't do that because their host has barred it: shhhhh.

Hosts do not EVER ban anything except PHP executing commands (or anything that can run a server script, such as a Linux .run file or a Windows .bat file). All of the disallowed and limited PHP values can be altered by simply adding a file called php.ini to the directory the script is running from.

In that case, use a URL include. That works because it loads all of the scripts before the document is loaded, INCLUDING client-side scripts.

What do you mean by that?
If you are talking about the include construct, then no, not really.
It acts pretty much exactly like the file_get_contents function, in that it just gets the raw HTML and puts it into a variable or into the output buffer.

Sure, if the HTML includes JavaScript, then your browser will execute it when it reaches you, but from PHP's perspective, it's just text.
And any external scripts will not be loaded, neither by PHP nor your browser. (Unless the scripts use absolute URIs, of course. Then your browser will load them.)

No command in PHP that I know of (and I know a lot of them ;)) can load a URL and execute the client-side content before returning it to you.
You need a browser for that. (Which you could technically do, but it would be a lot more complex than calling a single function.)

The best method, if you want to display an external page inside your page, is to use an HTML <iframe>.
Not an ideal solution, I know, but by far the simplest one.

I always thought include pulled in all of the documents and everything else before echoing it out. Maybe I was wrong :(

It only takes the raw response (usually HTML) and returns it as text.

Keep in mind that external resources, such as images and JavaScript files, require an additional request to fetch, so when your browser enters a page it is actually doing multiple requests to the server, not just a single one, like include does.

You could of course parse the response and do additional requests for these resources manually. Although you would have to pretty much write a new browser to display it all correctly :)
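As a rough sketch of the first step of that idea, the snippet below parses an HTML string and lists the script and image URLs a browser would have to request separately. The function name listExternalResources and the sample markup are hypothetical, made up for illustration; resolving relative URLs and actually fetching each resource is left out.

```php
<?php
/**
 * Collect the src URLs of external scripts and images in an HTML string.
 * Hypothetical helper for illustration; it does not fetch anything.
 */
function listExternalResources(string $html): array
{
    libxml_use_internal_errors(true); // tolerate imperfect real-world markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $urls = [];
    foreach (['script', 'img'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            // Inline scripts have no src attribute and need no extra request.
            if ($node->hasAttribute('src')) {
                $urls[] = $node->getAttribute('src');
            }
        }
    }
    return $urls;
}

// Made-up sample page with one external script, one inline script, one image.
$sample = '<html><head><script src="js/app.js"></script></head>'
        . '<body><img src="/img/map.png"><script>var inline = 1;</script></body></html>';

$resources = listExternalResources($sample);
print_r($resources);
```

Each entry in the returned list is one more request the browser makes on top of the initial page request.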

Thanks for the discussion, guys. Please see this URL, where I need help: http://data.giub.uni-bonn.de/openrouteservice/index.php?start=7.0892567,50.7265543&end=7.0986258,50.7323634&pref=Fastest&lang=de
You will notice that the lower half of the left pane initially shows the text "First Europe wide", and after a second or two a routing description appears, starting with "Route-Instruction". I want to grab this Route-Instruction heading and the text below it.
Is this Ajax or something similar? Please advise on how I can grab this piece of text.
Thanks a lot in advance.
Regards,
G

After looking at the site's code, I can see that this is an Ajax call. Having worked with it a little, I found that you cannot retrieve this information client-side, because the request goes to another domain and is blocked as a security violation. To do it server-side you would need a JavaScript engine to process the page, and again it is an Ajax call, which would also be a cross-domain security violation. If you absolutely need this information, you can find the name of the server-side file that the Ajax call references, along with all the required POST variables, and post to it using cURL, provided the server and the script allow external posts.

My recommendation: look into Google geocoding. It will be much easier, more robust and more reliable.
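A minimal sketch of that cURL approach might look like the following. The endpoint URL and the field names (start, end, pref, lang) are assumptions, borrowed from the query string of the page for illustration; the real script name and POST variables have to be read from the site's own Ajax request, and the server may reject external posts.

```php
<?php
// Hypothetical endpoint; the real one must be taken from the site's Ajax call.
const ROUTE_ENDPOINT = 'http://example.com/route.php';

/** Build the URL-encoded POST body (field names are assumed, not confirmed). */
function buildRouteRequestBody(string $start, string $end, string $pref, string $lang): string
{
    return http_build_query(
        ['start' => $start, 'end' => $end, 'pref' => $pref, 'lang' => $lang],
        '',
        '&'
    );
}

/** Send the body with cURL; returns the response text, or null on failure. */
function postRouteRequest(string $endpoint, string $body): ?string
{
    $ch = curl_init($endpoint);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $body,
        CURLOPT_RETURNTRANSFER => true, // return the response instead of printing it
        CURLOPT_TIMEOUT        => 15,
    ]);
    $response = curl_exec($ch);
    return $response === false ? null : $response;
}

$body = buildRouteRequestBody('7.0892567,50.7265543', '7.0986258,50.7323634', 'Fastest', 'de');
echo $body, PHP_EOL;

// Uncomment to actually send the request (requires the cURL extension):
// $result = postRouteRequest(ROUTE_ENDPOINT, $body);
```

The request is split into a body builder and a sender so the encoded body can be inspected before anything is sent.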

Agree with Baldy there.

You could technically mimic the AJAX call made by the actual site using cURL, provided the host doesn't block it.
But you would be relying on them to keep the code on their site as it is. A minor change made by them to their site might very well break your site.

Not to mention whatever copyright laws you might be violating, using their content in such a way.

You will be much better off implementing this yourself.
