Hello Everyone,

I am using a JavaScript library called Turn.js to build an ebook reader. The problem is that I need to load external pages, and each page has its own animation and CSS layout. The pages were also designed for an iPad, so on a normal PC screen they are vertically huge. I have access to these pages, but it's PHP that reads them. The only way I have been able to keep the animation and CSS working is to use an iframe, and since I cannot zoom out of the iframe, it has to be massive on the page. So my question is: how can I display the contents of the file without using an iframe, and without file_get_contents messing up the page it is displayed on?

Thanks, much appreciated.

DJ


diafol:

I'm assuming that these pages have full HTML markup ('html', 'head', 'body' tags) in them. If so, that's a problem for embedding them directly. Alternative methods include fetching the content with file_get_contents or cURL.
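A minimal sketch of the file_get_contents route, assuming allow_url_fopen is enabled on the server; the URL is just a placeholder for one of the ebook's XHTML pages:

<?php
// Fetch the raw markup of a remote page as a string.
// The URL below is a placeholder, not an actual path from this thread.
$html = file_get_contents('http://www.example.com/book/page1.xhtml');

if ($html === false) {
    die('Could not fetch the remote page.');
}

// $html now holds the full markup, ready to be parsed or processed.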

Hi diafol, that's exactly the problem, thanks for your reply. I'm glad there may be alternatives; could you possibly guide me to an example or give me some help?

Thanks a lot

DJ

diafol:

My screen scraping days are behind me - haven't done it for years, so I'm not sure how to advise. Do all these remote pages have different css files etc or do they all have the same one?

They all have different ones, as well as different JavaScript files.

Thanks.

If you want to keep it all working, you'd have to download everything (be careful with legal issues).

Well, all the files are on the same server, but I don't get what you mean by downloading, and why would I do that? Each page on its own works and looks great apart from the size; obviously, though, I want those XHTML pages inside the turn.js control...?

Thanks

The problem is that if you want to show an HTML file from a different location in a div, you cannot just copy the content and drop it in. The headers would have to be removed, and any linked content would have to be pulled in too. That requires a lot of parsing, with no guarantee you'll get it right.
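A rough sketch of that idea, assuming the markup has already been fetched into $html (for example with file_get_contents): pull out only what sits inside the body, so the parent page keeps a single html/head of its own.

<?php
// $html is assumed to hold the remote page's markup.
$doc = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate non-strict, real-world markup
$doc->loadHTML($html);
libxml_clear_errors();

$body  = $doc->getElementsByTagName('body')->item(0);
$inner = '';
if ($body) {
    foreach ($body->childNodes as $child) {
        $inner .= $doc->saveHTML($child);   // keep only the body's children
    }
}

// $inner can now be echoed into a <div> on the turn.js page.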

Yes, I was wondering if there was a way to extract just the text and images from a file with PHP?

Thanks

That's possible. If the file is valid XHTML, then using an XML parser is the best option, assuming all image references are fully qualified URLs.
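A minimal sketch of the XML-parser approach, assuming the page is valid XHTML in the standard XHTML namespace and that $xhtml already holds the fetched markup; it pulls out the image URLs and the body text.

<?php
// Strict XML parse: this only works if the file really is valid XHTML.
$doc = new DOMDocument();
$doc->loadXML($xhtml);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');

// Collect every image URL (assumed to be fully qualified already).
$images = array();
foreach ($xpath->query('//x:img/@src') as $src) {
    $images[] = $src->nodeValue;
}

// Collect the visible text of the page.
$bodyNode = $xpath->query('//x:body')->item(0);
$text = $bodyNode ? trim($bodyNode->textContent) : '';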

diafol:

If you can't get iframes to work as you want, then use a cURL / file_get_contents / XML load script to get the content.
Your parsing routine will need to extract ALL script/css filenames and script/style contents from the remote file's head area. You should also remember to search for script or style tags outside the head area: placing script tags at the bottom of the page has become de rigueur to avoid document.ready workarounds, so you must check for these too.
Following on from this, you may find that inserting remote code into your 'parent page' creates class/id/name conflicts with your own JS/CSS files and contents.
You can hotlink to the JavaScript, CSS and image files (whether referenced in src/href attributes or in inline style rules) by converting the paths to carry the remote URL prefix, e.g. change "/images/img1.gif" to "http://www.example.com/images/img1.gif" - a rough sketch of this follows below.
A problem may arise if you have references to external files inside the JS or CSS files themselves - these are very difficult to change dynamically. Perhaps, as Pritaeas suggests, you need to download the files (beware copyright). That is not ideal, as the local copies may become obsolete if the remote page, its scripts or its CSS are updated.
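Here is a rough sketch of the hotlinking/prefixing idea, assuming DOMDocument is available; $remoteHtml and the base URL are placeholders, not paths from this thread.

<?php
$base = 'http://www.example.com';      // the remote site's root
$doc  = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($remoteHtml);           // markup fetched with cURL/file_get_contents
libxml_clear_errors();

// Rewrite site-relative href/src attributes so they point back at the remote host.
$assets = array('link' => 'href', 'script' => 'src', 'img' => 'src');
foreach ($assets as $tag => $attr) {
    foreach ($doc->getElementsByTagName($tag) as $node) {
        $url = $node->getAttribute($attr);
        if ($url !== '' && $url[0] === '/') {    // e.g. "/images/img1.gif"
            $node->setAttribute($attr, $base . $url);
        }
    }
}

// Inline <style> blocks and <script> tags outside <head> still have to be
// copied across (or rewritten) by hand, as noted above.
$rewritten = $doc->saveHTML();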

I can envisage this being a bit of a nightmare if all your remote files are different.
Maybe use the iframe after all and use js to resize width/height?

Thanks Diafol,

I have already done what you said: I used a PHP page to get the contents of the XHTML pages and then an iframe to show the end result. The reason was so I could change the paths to the images, JavaScript and CSS files.

I could use an iframe IF I could find a way of using some sort of zoom. I have tried the CSS zoom property, and that kind of does the job in Chrome: it zooms out the full page as well as the JavaScript animation. However, in IE the JavaScript animation images are not affected and remain too big.

I have also tried a JavaScript library which shrank down the size of the content in the iframe, but again, IE didn't like it and it messed up the CSS and JavaScript animations on the XHTML pages.

I'm really in a pickle with this at the minute, and I appreciate all the help you guys have given, but I'm still looking for a solution... :/

Hi,

Try this: copy the line below, save it as anyNameYouWant.php, upload it to your server, and point your browser at the file.

<?php phpinfo(); ?>

On that page, look for the Server API value. Like most servers, I am hoping yours shows FastCGI or anything bearing the acronym CGI. If your Server API says Apache module, you will need to ask your host to put the entry below in the loaded configuration file, i.e. the php.ini file.

Option One: the php.ini file trick (easy but risky)
Assuming that it is indeed FastCGI, we can simply add a new php.ini file in the directory that needs it. For example, if your script runs in YourDomainDotCom/YourScriptDirectory, then the new php.ini file should be uploaded into YourScriptDirectory and NOT into the root directory. However, if the scraping script is included by other files, the php.ini file should sit in the directory of the file that calls the scraping script.

Risks?
By setting allow_url_include = On, I want you to understand the security risks: a remote file inclusion attack such as yourDomainDotCom?file=someBadDomainDotCom/someBadScript.php?exe=defaceThisSite becomes possible. This type of attack takes an advanced user to pull off, and as long as you keep a low profile, your site is less likely to become a target.

Here we go. If you are sure that the Server API is CGI or one of its derivatives, copy the line below, save it as a php.ini file, and upload it to the directory as explained above.

allow_url_include = On
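A minimal sketch of what that setting permits, assuming allow_url_fopen is also On; the URL is a placeholder for one of the remote pages.

<?php
// With allow_url_include = On, include() accepts a full URL and dumps
// whatever the remote server returns into the current page. This is
// exactly why the setting is considered a security risk.
include 'http://www.example.com/book/page1.xhtml';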

Option Two: harder but safer.
Say your Server API is not a CGI derivative, your host does not like the security risk involved, and they flatly deny your request for php.ini adjustments. You can then ask them whether the cURL extension is installed on your server. Assuming your server has cURL enabled, we can use this script instead.

function useCurl($remoteFileUrl) {
    $ch = curl_init();

    // Pretend to be a regular browser; some hosts block the default cURL agent.
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)');
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_URL, $remoteFileUrl);

    // Return the response as a string instead of printing it straight out.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $output = curl_exec($ch);
    curl_close($ch);    // close the handle BEFORE returning, otherwise this line never runs

    return $output;
}

## We can use it like this. You can shorten the code below; I wrote it this way for clarity only.

$fileUrl = "SomeDomainDotCom/theFile.php";
$grabThisFile = useCurl($fileUrl);

## Use your scraper function.

$scrapeThis = yourScraperFunction($grabThisFile);

foreach ($scrapeThis as $item) {

    ## Do anything with the items here.

}

## To clean up a little, just unset $item.
unset($item);
I hope this helps, or at least gives you some ideas. You can also use var_dump($output) to check that cURL is actually grabbing the page you want to scrape.

good luck..
