Hi.
Using file_get_contents() I can get all the HTML tags and text of a web page. But is there any way to get that web page's CSS code too?


diafol

Use DOMDocument and loop through the <link> tags. With the fragmentation of styling rules (external CSS files, inline style attributes, <style> tags, JS-delivered styling) it may be more difficult than you think.

Can you give me an example please, if it is possible?


diafol

Have a look at the php manual: DOMDocument
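
A minimal sketch of what the manual describes (the sample HTML string here is made up for illustration; in practice you would pass in the result of file_get_contents()). It collects both the external stylesheet URLs from <link rel="stylesheet"> tags and the raw CSS inside <style> tags:

```php
/* sample page markup, hard-coded for illustration */
$html = '<html><head>
    <link rel="stylesheet" href="css/main.css">
    <link rel="icon" href="favicon.ico">
    <style>body { color: red; }</style>
</head><body></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   /* real-world HTML is rarely valid XML */
$doc->loadHTML($html);
libxml_clear_errors();

/* external stylesheet locations */
$stylesheets = array();
foreach ($doc->getElementsByTagName('link') as $link) {
    if (strtolower($link->getAttribute('rel')) === 'stylesheet') {
        $stylesheets[] = $link->getAttribute('href');
    }
}

/* raw CSS from <style> tags */
$inlineCss = array();
foreach ($doc->getElementsByTagName('style') as $style) {
    $inlineCss[] = trim($style->textContent);
}

print_r($stylesheets);
print_r($inlineCss);
```

Each entry in $stylesheets is only a location; to get the CSS code itself you still have to fetch that URL.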

Another option, which gives you total control, is the PHP HTML DOM parser API (Simple HTML DOM).

For example:

$scrape = file_get_html('somesite_to_scrape.com');

/* get the stylesheet links */
foreach ($scrape->find('link[rel=stylesheet]') as $css_file) {
    /* this is the css file location */
    echo $css_file->href;
}
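
Note that the href you collect is often relative to the page, so before fetching each CSS file with file_get_contents() you may need to make it absolute. absolute_css_url() below is a hypothetical helper, not part of the parser API, and it only handles the simple cases:

```php
/* hypothetical helper: turn a (possibly relative) stylesheet href into an
   absolute URL against the page's base URL */
function absolute_css_url($base, $href) {
    /* already absolute? */
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    /* root-relative, e.g. /css/main.css */
    if ($href[0] === '/') {
        $parts = parse_url($base);
        return $parts['scheme'] . '://' . $parts['host'] . $href;
    }
    /* document-relative, e.g. css/main.css */
    return rtrim($base, '/') . '/' . $href;
}

/* then the CSS code itself is one more fetch away:
   $css_code = file_get_contents(absolute_css_url($base, $css_file->href)); */
```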

You can also do it with JavaScript.

I forgot to give you the download link.

Also, I would like to remind you about the security implications of file_get_contents() as described by the PHP Security Consortium. So, instead of using that function, it is highly recommended to use cURL and then the DOM HTML parser API.

Example of the cURL option:

function getContents($site_url) {

    /* first off, the cURL initialization */
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $site_url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);

    /* execute the request */
    $site_data = curl_exec($ch);

    /* anything you open must be closed when you are done with it */
    curl_close($ch);

    return $site_data;

}

Assuming you have downloaded the DOM parser above, we can use it like this:

include_once('simple_html_dom.php');

$scrape = str_get_html(getContents('some_remote_site.com'));

/* do as my first response example above */

If you don't want to selectively parse the HTML tags, then it is as simple as this:

echo (getContents('some_remote_site.com'));

diafol

BTW, DO make sure that you're using this in a legit way. Scraping has had a bad name due to people just ripping off other people's work.

If you don't want to use a DOM API, that is the best choice: your content is a string, so parse it to find the CSS href values and then load each one again with file_get_contents().
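
A rough sketch of that string-parsing approach. A regular expression is fragile against unusual markup and this one does not even check rel="stylesheet", so the DOM-based options above are generally safer:

```php
/* raw HTML as a plain string (sample made up for illustration) */
$content = '<head><link rel="stylesheet" href="style.css">'
         . '<link href="print.css" rel="stylesheet" media="print"></head>';

/* pull the href attribute out of every <link> tag */
preg_match_all('/<link\b[^>]*href=["\']([^"\']+)["\'][^>]*>/i', $content, $m);
$hrefs = $m[1];

print_r($hrefs);
/* each entry could now be fetched with file_get_contents() again */
```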

Any idea why file_get_html() doesn't work for me?

@lorenzoDAlipio, I have downloaded the PHP HTML DOM parser API zip folder; what should I do now with it? Is there any special file that needs to be placed in the localhost? (Sorry, I need help with that, I don't know how to use it. Thank you.)


diafol

I have downloaded the PHP HTML DOM parser API zip folder

Have you installed it into a folder of your choice and included it in the file where you wish to use it?

Extract the file simple_html_dom.php to wherever you want, e.g. 'vendors'.
Include it in your file using a relative reference (or an absolute one if you really must):

 require '../vendors/simple_html_dom.php';

That would be right if you had the following structure:

vendors
    - simple_html_dom.php
public_html
    - index.php (in this file)

I am sure the next thing will be "I need the images and downloads in all links too". Why not use a tool like wget? You can call it if you have execute rights on your system, and it will download anything it can find; you specify how deep it should go.

Thank you @diafol, it works now. And thank you @lorenzoDAlipio for the link.
And thank you all, friends.

But I still can't get the CSS tags from a web page.
Could you help me with a clearer explanation, please?


diafol

require '../vendors/simple_html_dom.php';
$html = file_get_html($theUrlOfYourChoice);

foreach ($html->find('link') as $element) {
    echo $element->href . '<br />';
}

I think ($element is an object, not a string, so echo its href attribute rather than the element itself); I don't use this library myself.