
Is there a way to let a page load for a minute and then fetch its content?

I am using

new DOMDocument

to fetch content from a page, which works fine. The problem is that the page I am fetching data from only shows its result after about 30 seconds, and PHP's DOMDocument fetches the data before the whole page has loaded, so I want to let the page load for a minute first.
So is there a way to do this?


But when I check the source code, the data is accessible to me and I can fetch it. All I need is to load the page for a minute and then fetch it. Is there a way?


But when I check the source code, the data is accessible to me and I can fetch it

If you already have the source code and the data, what do you need changed? I'm confused.


I mean the source code as seen in the browser, not the real source code. But anyway, just tell me how I can run file_get_contents() for a minute. Forget everything else and tell me this.


My original reply still stands. If that data is being retrieved via JavaScript, you can't get at it, even if you wait for a minute: PHP receives only the HTML the server sends and never runs the page's scripts.
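To illustrate why waiting does not help, here is a minimal sketch. The helper name fetchAfterDelay() is hypothetical, and a local temp file stands in for the remote page so the example runs offline. The only point is that sleep() pauses the PHP script itself; nothing executes the page's JavaScript in the meantime, so the markup that comes back is the same.

```php
<?php
// Hypothetical helper: pause, then fetch. The pause happens inside PHP only.
function fetchAfterDelay(string $source, int $seconds): string
{
    sleep($seconds);                     // delays this script, nothing else
    $html = file_get_contents($source);  // one read of the raw, unexecuted markup
    return $html === false ? '' : $html;
}

// Assumption: a temp file stands in for the HTTP response of the remote page.
$tmp = tempnam(sys_get_temp_dir(), 'scrape');
file_put_contents(
    $tmp,
    "<p>Original content</p><script>/* a browser would replace the text here */</script>"
);

$immediate = fetchAfterDelay($tmp, 0);
$delayed   = fetchAfterDelay($tmp, 1);

// Both fetches return the same markup; the script inside it never ran.
var_dump($immediate === $delayed);
unlink($tmp);
```

Against a real URL the server could of course return something different a minute later, but that would be a server-side change, not the result of the page's JavaScript finishing.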


OK, here's a demo. Content replaced by vanilla JS appears to work if the fetched page is echoed to a browser, but, as pritaeas states, ajaxed content does not. Either way, DOMDocument will not pick up on the JavaScript replacement.

demos.diafol.org/scrape/... files

jsreplace.php

<!DOCTYPE HTML>
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<p>Original content</p>
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
<script>$('p').html('JS content');</script>
</body>
</html>

ajaxreplace.php

<!DOCTYPE HTML>
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<p>Original content</p>

<script src="//ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
<script>
    var ret = $.get('ajax.php',function(data)
    {
        $('p').html(data);  
    });

</script>
</body>
</html>

ajax.php

Ajaxed content

So you can try to scrape thus...

<?php echo file_get_contents('http://demos.diafol.org/scrape/jsreplace.php');?>

You should see 'JS content'. You may even see the replacement taking place, with a quick flash of the 'Original content' text.

<?php echo file_get_contents('http://demos.diafol.org/scrape/ajaxreplace.php');?>

You should see 'Original content'.

But the apparent 'success' from the JS replacement is just the JS running in your browser after the fact. What PHP actually fetched still contains 'Original content'.

DOMDocument cannot foresee this replacement.

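<?php
// Page whose <p> text is replaced by JavaScript when viewed in a browser
$baseUrl = 'http://demos.diafol.org/scrape/jsreplace.php';

$domDoc = new DOMDocument();
$domDoc->strictErrorChecking = false;
$domDoc->recover = true;
// @ suppresses the warnings libxml raises while parsing the page
@$domDoc->loadHTMLFile($baseUrl);

// DOMDocument parses the raw HTML only; the <script> is never executed
$ps = $domDoc->getElementsByTagName('p');
foreach ($ps as $p) {
    echo $p->nodeValue;
}
?>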

You should see 'Original content'.
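If you want to check this without hitting the demo server, the same behaviour can be reproduced offline. This is only a sketch: the function name firstParagraphText() is made up for illustration, and the markup is a trimmed copy of jsreplace.php held in a string. It uses libxml_use_internal_errors() instead of @ to keep parser warnings quiet.

```php
<?php
// Hypothetical helper: parse an HTML string and return the text of its first <p>.
function firstParagraphText(string $html): string
{
    $previous = libxml_use_internal_errors(true); // collect parser warnings silently
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting

    $p = $doc->getElementsByTagName('p')->item(0);
    return $p === null ? '' : $p->textContent;
}

// Trimmed copy of jsreplace.php. Nowdoc syntax keeps the $ in the script literal.
$jsReplacePage = <<<'HTML'
<!DOCTYPE HTML>
<html>
<head><meta charset="utf-8"></head>
<body>
<p>Original content</p>
<script>$('p').html('JS content');</script>
</body>
</html>
HTML;

// The <script> is parsed as an ordinary element and never executed.
echo firstParagraphText($jsReplacePage); // prints "Original content"
```

The script tag is just another node in the tree as far as DOMDocument is concerned, which is why the paragraph keeps its original text.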

Hope that clears it up.

Edited by diafol


In my experience, it's more effective to scrape content (especially when it's complex) with a local program rather than PHP. This gives you the same vantage point as a user sitting in front of a browser, so you see everything regardless of what technology was used to put it there. I have used AutoIt for this because it has a good (COM) interface to the IE services built into Windows. When I need to use that data online, I have the local program upload the data to a custom PHP program that stores it in a DB. If this needs to be done on a scheduled basis, you can set it up in the Windows scheduler so that, as long as your machine is on, it runs hands-off and the data is available for use online. It's not a great solution if it needs to be done on demand from multiple end points.


But local software saves the data once, or we have to run it again and again. I have a site that fetches data live from another site. I just want to load a page for a couple of seconds and then fetch its data. file_get_contents() doesn't wait; it just fetches the data from the page, but I want to make it wait and fetch the data after the AJAX content has also loaded.


So what's the solution?

There isn't one.

You can only do this when building a desktop application which uses a webbrowser control.
