0

Use DOMDocument and loop through the <link> tags. Because styling rules are fragmented (CSS files, inline styling, style tags, JS-delivered styling), this may be more difficult than you think.

0

Can you give me an example, please, if possible?

Edited by Niloofar24

0

Have a look at the php manual: DOMDocument
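
For anyone who wants a starting point, here is a minimal sketch of the DOMDocument approach. The sample markup and the function name findStylesheetLinks() are made up for illustration; in practice $html would hold the source of the page you fetched.

```php
<?php
// Hypothetical sample page; in practice $html would hold the fetched page source.
$html = <<<HTML
<html>
<head>
  <link rel="stylesheet" href="/css/main.css">
  <link rel="icon" href="/favicon.ico">
  <style>body { margin: 0; }</style>
</head>
<body><p style="color: red">Hello</p></body>
</html>
HTML;

/**
 * Collect the href of every <link rel="stylesheet"> in an HTML string.
 */
function findStylesheetLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('link') as $link) {
        // rel may hold several space-separated tokens, e.g. "alternate stylesheet".
        $rel = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        if (in_array('stylesheet', $rel, true) && $link->hasAttribute('href')) {
            $hrefs[] = $link->getAttribute('href');
        }
    }
    return $hrefs;
}

print_r(findStylesheetLinks($html));
```

This only covers external stylesheets. Embedded <style> blocks and inline style attributes would need their own loops (for example over getElementsByTagName('style')), and styling injected by JavaScript will not show up in the static source at all.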

1

Another option that gives you total control is the PHP HTML DOM parser API.

For example:

include_once 'simple_html_dom.php';

$scrape = file_get_html('http://somesite_to_scrape.com');

/* each <link rel="stylesheet"> element points to a css file */
foreach ($scrape->find('link[rel=stylesheet]') as $css_file) {
        /* this is the css file location */
        echo $css_file->href, "\n";
}

You can also do it with JavaScript.

0

I forgot to give you the download link.

Also, I would like to remind you about the security implications of file_get_contents as described by the PHP Security Consortium. So, instead of using this function, it is highly recommended to fetch the page with cURL and then parse it with the DOM HTML parser API.

Example of cURL option

function getContents($site_url) {

    /* first off, the curl initialization */
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $site_url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);       

    /* execute the curl */
    $site_data = curl_exec($ch);

    /* anything that you open must be closed as soon as you are done with it */
    curl_close($ch);

    return $site_data;

}

Assuming you have downloaded the DOM parser above, we can use it like this:

include_once('simple_html_dom.php');

/* getContents() returns the HTML as a string, so parse it with str_get_html() */
$scrape = str_get_html(getContents('some_remote_site.com'));

/* do as my first response example above */

If you don't want to selectively parse the HTML tags, then it is as simple as this:

echo (getContents('some_remote_site.com'));

Edited by lorenzoDAlipio: edited indentations

0

BTW. DO make sure that you're using this in a legit way. Scraping has had a bad name due to people just ripping off other people's work.

0

If you don't want to use the DOM API, that is the best choice.
Your content is a string: parse it to find the CSS href, and then load that file with file_get_contents.
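
A rough sketch of that string-parsing idea, assuming quoted attribute values and reasonably tidy markup. The function name extractCssHrefs() and the sample $page string are invented for illustration, and a regular expression like this will miss unusual markup that a real parser would handle.

```php
<?php
/**
 * Pull stylesheet URLs out of raw HTML with plain string matching.
 * Assumes double- or single-quoted attribute values.
 */
function extractCssHrefs(string $html): array
{
    $hrefs = [];
    // Grab every <link ...> tag first, then inspect its attributes.
    preg_match_all('/<link\b[^>]*>/i', $html, $tags);
    foreach ($tags[0] as $tag) {
        if (!preg_match('/\brel\s*=\s*["\'][^"\']*\bstylesheet\b[^"\']*["\']/i', $tag)) {
            continue; // not a stylesheet link (icon, canonical, ...)
        }
        if (preg_match('/\bhref\s*=\s*(["\'])(.*?)\1/i', $tag, $m)) {
            // Attribute values may contain entities such as &amp;
            $hrefs[] = html_entity_decode($m[2]);
        }
    }
    return $hrefs;
}

// Hypothetical sample markup standing in for a fetched page.
$page = '<link rel="stylesheet" href="style.css"><link rel="icon" href="favicon.ico">'
      . "<LINK href='print.css?v=1&amp;m=p' rel='stylesheet'>";

print_r(extractCssHrefs($page));
```

The hrefs come back exactly as written in the page, so relative ones still have to be resolved against the page URL before you fetch them.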

0

Does anyone know why file_get_html() doesn't work for me? Any ideas?

@lorenzoDAlipio, I have downloaded the PHP HTML DOM parser API zip folder. What should I do now with it? Is there any special file that needs to be placed in the localhost? (Sorry, I need help with that. I don't know how to use it. Thank you.)

1

> i have downloaded php html DOM parser API zip folder

Have you installed it into a folder of your choice and included it in the file where you wish to use it?

Extract the file: simple_html_dom.php to where you want e.g. 'vendors'
Include it in your file using a relative reference (or absolute if you really must):

 require '../vendors/simple_html_dom.php';

That would be right if you had the following structure:

vendors
    - simple_html_dom.php
public_html
    - index.php (in this file)
1

I'm sure the next thing will be "I need the images and downloads in all links too". Why not use a tool like wget? You can call it if you have execute rights on your system, and it will download anything it can find; you specify how deep it should go.
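
As a rough sketch of calling wget from PHP: the helper name buildWgetCommand(), the example URL, and the target directory are all assumptions for illustration, and the flags shown are just one reasonable combination. The script only builds and prints the command; the exec() call is left commented out because it needs wget installed and exec() enabled.

```php
<?php
/**
 * Build a wget command line that fetches a page plus what it links to.
 * $depth limits how many link levels wget follows.
 */
function buildWgetCommand(string $url, string $targetDir, int $depth = 1): string
{
    return sprintf(
        'wget --recursive --level=%d --page-requisites --no-parent --directory-prefix=%s %s',
        $depth,
        escapeshellarg($targetDir), // quote arguments so they cannot inject shell syntax
        escapeshellarg($url)
    );
}

$command = buildWgetCommand('http://example.com/', 'mirror', 2);
echo $command, "\n";

// To actually run it (needs wget installed and exec() enabled):
// exec($command, $outputLines, $exitCode);
```

Here --page-requisites asks wget for the images and stylesheets a page needs, and --no-parent keeps it from climbing above the starting directory.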

0

Thank you @diafol, it works now. And thank you @lorenzoDAlipio for the link.
And thank you all friends.

But I still can't get the CSS tags from a web page.
Could you help me with a clearer explanation, please?

0
require '../vendors/simple_html_dom.php';
$html = file_get_html($theUrlOfYourChoice);

/* keep only the <link> tags that point to stylesheets */
foreach ($html->find('link[rel=stylesheet]') as $element)
       echo $element->href . '<br />';

I think (I don't use it). $element is an object rather than a string, so read its href attribute instead of echoing it directly.

Edited by diafol
