Folks,

This is absurd!
As you know, there are crawler scripts on the internet that you point at a page and that extract all of its HTML links (hrefs).
Code such as this one:

//Sitemap Protocol: https://www.sitemaps.org/protocol.html

include_once('simplehtmldom_1_9_1/simple_html_dom.php');

//WORKS.
//$sitemap = 'https://www.rocktherankings.com/post-sitemap.xml';
//$sitemap = "https://www.rocktherankings.com/sitemap_index.xml"; //Has more xml files.

//FAILS. Shows blank page.
$sitemap = "https://bytenota.com/sitemap.xml";

$html = new simple_html_dom();
$html->load_file($sitemap);

foreach($html->find("loc") as $link)
{
    echo $link->innertext."<br>";
}

And there are those that extract links from XML files,
like this one:

//Sitemap Crawler: If the starting URL is an XML file listing further XML files, it shows a blank page and does not visit those XML files to extract links from them.
//Sitemap Protocol: https://www.sitemaps.org/protocol.html

// sitemap url or sitemap file
//FAILS.
//$sitemap = "https://www.rocktherankings.com/sitemap_index.xml"; //Has more xml files.
//WORKS
$sitemap = "https://bytenota.com/sitemap.xml";
//$sitemap = 'https://www.rocktherankings.com/post-sitemap.xml';

// get sitemap content
$content = file_get_contents($sitemap);

// parse the sitemap content to object
$xml = simplexml_load_string($content);

// retrieve properties from the sitemap object
foreach ($xml->url as $urlElement) 
{
    // get properties
    $url = $urlElement->loc;
    $lastmod = $urlElement->lastmod;
    $changefreq = $urlElement->changefreq;
    $priority = $urlElement->priority;

    // print out the properties
    echo 'url: '. $url . '<br>';
    echo 'lastmod: '. $lastmod . '<br>';
    echo 'changefreq: '. $changefreq . '<br>';
    echo 'priority: '. $priority . '<br>';

    echo '<br>---<br>';
}
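The second script loops over `$xml->url`, which only exists in a regular `<urlset>` sitemap. A sitemap index (such as `sitemap_index.xml`) has a `<sitemapindex>` root with `<sitemap>` children instead, so the loop never runs and the page stays blank. A minimal sketch, assuming the sitemaps.org format, that checks the root element before looping; the helper name `list_sitemap_locs()` is hypothetical:

```php
<?php
// Sketch: decide whether a sitemap document is a <urlset> (page links)
// or a <sitemapindex> (links to further sitemaps) before looping over it.
// list_sitemap_locs() is a hypothetical helper name.

function list_sitemap_locs(string $content): array
{
    $xml = simplexml_load_string($content);
    if ($xml === false) {
        return ['type' => 'invalid', 'locs' => []]; // malformed XML
    }

    // Per sitemaps.org the root element is either "urlset" or "sitemapindex".
    $type = $xml->getName();

    // <urlset> holds <url> children; <sitemapindex> holds <sitemap> children.
    $children = ($type === 'sitemapindex') ? $xml->sitemap : $xml->url;

    $locs = [];
    foreach ($children as $child) {
        $locs[] = trim((string)$child->loc); // cast to string, drop stray whitespace
    }
    return ['type' => $type, 'locs' => $locs];
}
```

In real use you would pass it `file_get_contents($sitemap)`; when `type` comes back as `sitemapindex`, every entry in `locs` is another sitemap to fetch rather than a page.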

But guess what?
Neither of these works when the crawler is pointed at an XML sitemap that lists further XML sitemaps.
So I am trying to build my own crawler: when I point it at an XML sitemap, it should check whether each listed link is a page link (href) or a link to another XML sitemap.
First, I got my crawler to navigate to an XML file.
Now I want it to extract all the links it finds and check whether they are hrefs or further XML links.
If a link is an href, add it to the $extracted_urls array.
Otherwise, add it to the $crawl_xml_files array.
That way, the crawler can later crawl those extracted href and XML links.
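The plan above can be sketched as a work queue, assuming the sitemaps.org format: entries from a sitemap index are pushed back onto `$crawl_xml_files`, and entries from a `<urlset>` are collected in `$extracted_urls`. The function name `crawl_sitemaps()`, the `$fetch` callback and the `$max_files` cap are hypothetical choices for this sketch; `$fetch` is there only so the logic can be tested offline, and in real use you could pass `'file_get_contents'`.

```php
<?php
// Sketch of a queue-based sitemap crawler (hypothetical helper).
// $fetch(string $url) must return the XML text, or false on failure.

function crawl_sitemaps(string $start, callable $fetch, int $max_files = 100): array
{
    $crawl_xml_files = [$start]; // sitemaps still to visit
    $visited = [];               // sitemaps already fetched
    $extracted_urls = [];        // page URLs found in <urlset> files

    while ($crawl_xml_files && count($visited) < $max_files) {
        $sitemap = array_shift($crawl_xml_files);
        if (isset($visited[$sitemap])) {
            continue; // skip sitemaps listed more than once
        }
        $visited[$sitemap] = true;

        $content = $fetch($sitemap);
        $xml = ($content === false) ? false : simplexml_load_string($content);
        if ($xml === false) {
            continue; // unreachable or malformed sitemap
        }

        if ($xml->getName() === 'sitemapindex') {
            // Index file: every <loc> points to another sitemap.
            foreach ($xml->sitemap as $entry) {
                $crawl_xml_files[] = trim((string)$entry->loc);
            }
        } else {
            // Regular sitemap: every <loc> is a page.
            foreach ($xml->url as $entry) {
                $extracted_urls[] = trim((string)$entry->loc);
            }
        }
    }
    return $extracted_urls;
}
```

Deciding by the root element rather than by file extension sidesteps the extension problem, since page URLs such as `https://bytenota.com/contact/` have no extension at all. The `$visited` set and `$max_files` cap guard against index files that list each other.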
Now I am stuck: the code fails to echo the extensions of the links found on the initially navigated page,
and it does not add any links to the respective arrays.
Here is the code. Test it and see for yourself where I am going wrong. I am scratching my head.

My NON-WORKING CODE

//Sitemap Crawler: If the starting URL is an XML file listing further XML files, it shows a blank page and does not visit those XML files to extract links from them.
//Sitemap Protocol: https://www.sitemaps.org/protocol.html

    //$sitemap = 'https://www.rocktherankings.com/post-sitemap.xml';
    //$sitemap = "https://www.rocktherankings.com/sitemap_index.xml"; //Has more xml files.
    $sitemap = 'https://bytenota.com/sitemap.xml';
    //$sitemap = 'https://www.daniweb.com/home-sitemap.xml';
    // get sitemap content
    //$sitemap = 'sitemap.xml';
    // get sitemap content
    $content = file_get_contents($sitemap);

    // parse the sitemap content to object
    $xml = simplexml_load_string($content);
    //var_dump($xml);
    // Init arrays
    $crawl_xml_files = [];
    $extracted_urls = [];
    $extracted_last_mods = [];
    $extracted_changefreqs = [];
    $extracted_priorities = [];
    // retrieve properties from the sitemap object
    foreach ($xml->url as $urlElement) {
        // provide path of current xml/html file
        $path = (string)$urlElement->loc;
        // get pathinfo
        $ext = pathinfo($path, PATHINFO_EXTENSION);
        echo 'The extension is: ' . $ext;
        echo '<br>'; //DELETE IN DEV MODE

        echo $urlElement; //DELETE IN DEV MODE

        if ($ext == 'xml') //This means, the links found on the current page are not links to the site's webpages but links to further xml sitemaps. And so need the crawler to go another level deep to hunt for the site's html pages.
        {
            echo __LINE__;
            echo '<br>'; //DELETE IN DEV MODE

            //Add Xml Links to array.
            $crawl_xml_files[] = $path;
        } elseif ($ext == 'html' || $ext == 'htm' || $ext == 'shtml' || $ext == 'shtm' || $ext == 'php' || $ext == 'py') //This means, the links found on the current page are the site's html pages and are not links to further xml sitemaps.
        {
            echo __LINE__;
            echo '<br>'; //DELETE IN DEV MODE

            //Add hrefs to array.
            //$extracted_urls[] = $path;

            // get properties

            $extracted_urls[] = $extracted_url = $urlElement->loc; //Add hrefs to array.
            $extracted_last_mods[] = $extracted_lastmod = $urlElement->lastmod; //Add lastmod to array.
            $extracted_changefreqs[] = $extracted_changefreq = $urlElement->changefreq; //Add changefreq to array.
            $extracted_priorities[] = $extracted_priority = $urlElement->priority; //Add priority to array.
        }
    }

    var_dump($crawl_xml_files); //Print all extracted Xml Links.
    var_dump($extracted_urls); //Print all extracted hrefs.
    var_dump($extracted_last_mods); //Print all extracted last mods.
    var_dump($extracted_changefreqs); //Print all extracted changefreqs.
    var_dump($extracted_priorities); //Print all extracted priorities.

    foreach($crawl_xml_files as $crawl_xml_file)
    {
        echo 'Xml File to crawl: ' .$crawl_xml_file; //Print all extracted Xml Links.
    }

    echo __LINE__; 
    echo '<br>'; //DELETE IN DEV MODE

    foreach($extracted_urls as $extracted_url)
    {
        echo 'Extracted Url: ' .$extracted_url; //Print all extracted hrefs.
    }

    echo __LINE__; 
    echo '<br>'; //DELETE IN DEV MODE

    foreach($extracted_last_mods as $extracted_last_mod)
    {
        echo 'Extracted last Mod: ' .$extracted_last_mod; //Print all extracted last mods.
    }

    echo __LINE__; 
    echo '<br>'; //DELETE IN DEV MODE

    foreach($extracted_changefreqs as $extracted_changefreq)
    {
        echo 'Extracted Change Frequency: ' .$extracted_changefreq; //Print all extracted changefreqs.
    }

    echo __LINE__; 
    echo '<br>'; //DELETE IN DEV MODE

    foreach($extracted_priorities as $extracted_priority)
    {
        echo 'Extracted Priority: ' .$extracted_priority; //Print all extracted priorities.
    }

    echo __LINE__; 
    echo '<br>'; //DELETE IN DEV MODE

How do I fix this?

This is what gets echoed:

The extension is:
The extension is:
The extension is:
The extension is:
The extension is:
The extension is:
C:\wamp64\www\Work\buzz\Templates\crawler_Test.php:66:
array (size=0)
empty
C:\wamp64\www\Work\buzz\Templates\crawler_Test.php:67:
array (size=0)
empty
C:\wamp64\www\Work\buzz\Templates\crawler_Test.php:68:
array (size=0)
empty
C:\wamp64\www\Work\buzz\Templates\crawler_Test.php:69:
array (size=0)
empty
C:\wamp64\www\Work\buzz\Templates\crawler_Test.php:70:
array (size=0)
empty
77
85
93
101
109

Obviously, I get tonnes of lines that just say:
The extension is:


If you echo the value of $path, what does it show?

Is it perhaps a URL instead of a path? Have a look at the parse_url function. You can use it to extract the path from a URL before you try to get the extension.
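Following that suggestion, a minimal sketch: `<loc>` holds a full URL, so `pathinfo()` on the whole string treats the domain as the file name (hence `com` for `https://bytenota.com/`) and finds no extension at all for paths like `/contact/`. Extracting the path with `parse_url()` first fixes the first problem; for the second, this sketch assumes that anything not ending in `.xml` is a page. The helper name `classify_loc()` is hypothetical.

```php
<?php
// Sketch: classify a <loc> URL as another sitemap ("xml") or a page ("page").
// Assumption: extension-less paths such as "/contact/" are pages
// (WordPress-style permalinks); the sitemap protocol does not guarantee this.

function classify_loc(string $url): string
{
    // Only the path part: "/contact/", or null for "https://bytenota.com".
    $path = parse_url($url, PHP_URL_PATH);
    $ext  = strtolower(pathinfo((string)$path, PATHINFO_EXTENSION));

    if ($ext === 'xml') {
        return 'xml';  // another sitemap: add to $crawl_xml_files
    }
    return 'page';     // html, php or no extension: add to $extracted_urls
}

// Why the original code printed "com": pathinfo() sees the whole URL.
echo pathinfo('https://bytenota.com/', PATHINFO_EXTENSION), "\n"; // com
```

Gzipped sitemaps (`.xml.gz`) would be classified as pages by this sketch, so they would need an extra check.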

@pritaeas

Sorry! I missed your reply.
You are asking what line 34 echoes:

ECHO $path = (string)$urlElement->loc;          //LINE 34

Line 34 echoes this:

https://bytenota.com

This is what the full page echoes:

https://bytenota.com/
36
The extension is: com

https://bytenota.com/codeigniter-create-your-first-controller/
36
The extension is:

https://bytenota.com/learn-codeigniter-tutorials/
36
The extension is:

https://bytenota.com/codeigniter-creating-a-hello-world-application/
36
The extension is:

https://bytenota.com/codeigniter-4-how-to-remove-public-from-url/
36
The extension is:

https://bytenota.com/apache-ant-delete-all-files-in-a-directory-but-not-in-subdirectories/
36
The extension is:

https://bytenota.com/ruby-how-to-convert-all-folder-subfolders-files-to-lowercase/
36
The extension is:

https://bytenota.com/solved-typescript-error-property-x-has-no-initializer-and-is-not-definitely-assigned-in-the-constructor/
36
The extension is:

https://bytenota.com/sovled-typescript-error-object-is-possibly-null-or-undefined/
36
The extension is:

https://bytenota.com/php-get-different-days-between-two-days/
36
The extension is:

https://bytenota.com/php-getting-creation-date-last-modified-date-of-a-file/
36
The extension is:

https://bytenota.com/java-get-different-days-between-two-days/
36
The extension is:

https://bytenota.com/angular-creating-a-hello-world-application/
36
The extension is:

https://bytenota.com/upgrade-your-angular-cli-to-latest-version/
36
The extension is:

https://bytenota.com/how-to-access-localhost-asp-net-webapp-from-mobile-tablet-device/
36
The extension is:

https://bytenota.com/solved-invalidoperationexception-session-has-not-been-configured-for-this-application-or-request/
36
The extension is:

https://bytenota.com/bytenota-logo/
36
The extension is:

https://bytenota.com/angular-creating-a-hello-world-application/angular-welcome-page/#main
36
The extension is:

https://bytenota.com/contact/
36
The extension is:

https://bytenota.com/windows-enable-fips-140-compliant-algorithms/
36
The extension is:

https://bytenota.com/windows-enable-fips-140-compliant-algorithms/enablefips140/#main
36
The extension is:

https://bytenota.com/windows-enable-fips-140-compliant-algorithms/registryeditor1/#main
36
The extension is:

https://bytenota.com/ruby-how-to-convert-all-folder-subfolders-files-to-lowercase/ruby-icon/#main
36
The extension is:

https://bytenota.com/check-the-installed-version-of-a-npm-package/
36
The extension is:

https://bytenota.com/list-all-versions-of-an-npm-package/
36
The extension is:

https://bytenota.com/codeigniter-switching-between-development-and-production-mode/
36
The extension is:

https://bytenota.com/solved-composer-error-allowed-memory-size-of-1610612736-bytes-exhausted/
36
The extension is:

https://bytenota.com/solved-jshint-error-use-esversion-6-in-vs-code/
36
The extension is:

https://bytenota.com/sourcetree-refresh-remotes-branches-status/
36
The extension is:

https://bytenota.com/virtualbox-how-to-increase-decrease-size-of-vdi-disk-on-windows/
36
The extension is:

https://bytenota.com/javascript-detect-and-get-the-current-version-of-microsoft-edge/
36
The extension is:

https://bytenota.com/asp-net-core-getting-project-root-directory-path/
36
The extension is:

https://bytenota.com/asp-net-core-get-the-current-version-of-asp-net-core-mvc-using-reflection-approach/
36
The extension is:

https://bytenota.com/php-how-to-check-whether-or-not-a-string-contains-nonascii-chars/
36
The extension is:

https://bytenota.com/solved-renaming-failed-git-mv-permission-denied/
36
The extension is:

https://bytenota.com/symfony-4-creating-a-simple-hello-world-step-by-step/
36
The extension is:

https://bytenota.com/symfony-4-fosuserbundle-the-service-fos_user-resetting-controller-has-a-dependency-on-a-non-existent-service-templating/
36
The extension is:

https://bytenota.com/how-to-solve-netbeans-ides-projects-window-not-displaying/
36
The extension is:

https://bytenota.com/git-how-to-discard-all-local-changes-commits/
36
The extension is:

https://bytenota.com/solved-filename-too-long-git-error-in-windows/
36
The extension is:

https://bytenota.com/how-to-show-older-previous-revisions-in-tortoisesvn/
36
The extension is:

https://bytenota.com/solve-typeerror-cannot-read-property-get-of-undefined-error-handler-js/
36
The extension is:

https://bytenota.com/javascript-how-to-get-all-parameters-from-url/
36
The extension is:

https://bytenota.com/how-to-enable-sqlite3-on-windows-linux-mac-os-x/
36
The extension is:

https://bytenota.com/java-how-to-invoke-parent-class-methods-using-java-reflection/
36
The extension is:

https://bytenota.com/linux-list-of-most-useful-commands/
36
The extension is:

https://bytenota.com/windows-cmd-set-java_home-variable-using-command-prompt/
36
The extension is:

https://bytenota.com/java-check-if-url-contains-query-string-or-not/
36
The extension is:

https://bytenota.com/java-how-to-get-all-parameters-from-url-in-servlet/
36
The extension is:

https://bytenota.com/php-check-if-a-string-can-be-unserialized-or-not/
36
The extension is:

https://bytenota.com/serialize-and-unserialize-object-in-php-with-example/
36
The extension is:

https://bytenota.com/convert-php-object-to-from-json-string/
36
The extension is:

https://bytenota.com/jsp-how-to-get-httpservletrequest-in-jsp-custom-tag/
36
The extension is:

https://bytenota.com/jsf-how-to-get-httpservletrequest-in-jsf-component/
36
The extension is:

https://bytenota.com/wordpress-how-to-get-posts-by-category-id/
36
The extension is:

https://bytenota.com/wordpress-get-category-name-of-current-post/
36
The extension is:

https://bytenota.com/jquery-select-an-html-element-inside-an-iframe/
36
The extension is:

https://bytenota.com/php-save-an-image-file-from-base64-string/
36
The extension is:

https://bytenota.com/servlet-get-url-parameters-in-doget/
36
The extension is:

https://bytenota.com/display-base64-image-in-html-using-javascript-and-jquery/
36
The extension is:

https://bytenota.com/java-cloning-a-bufferedimage-object/
36
The extension is:

https://bytenota.com/java-set-default-value-for-enum-fields/
36
The extension is:

https://bytenota.com/python-replace-last-occurrence-of-a-string/
36
The extension is:

https://bytenota.com/wordpress-add-shortcode-in-a-php-template-page/
36
The extension is:

https://bytenota.com/java-get-url-parameters-in-jsp-page/
36
The extension is:

https://bytenota.com/python-creating-a-hello-world-program/
36
The extension is:

https://bytenota.com/html-css-make-a-link-element-disabled-with-css/
36
The extension is:

https://bytenota.com/gitlab-solving-the-project-you-were-looking-for-could-not-be-found/
36
The extension is:

https://bytenota.com/angular-cli-fixing-unknown-option-environment/
36
The extension is:

https://bytenota.com/make-div-element-readonly-with-css/
36
The extension is:

https://bytenota.com/php-convert-image-to-base64-string/
36
The extension is:

https://bytenota.com/javascript-get-local-timezone-of-client/
36
The extension is:

https://bytenota.com/php-get-list-of-all-timezone-ids/
36
The extension is:

https://bytenota.com/java-get-firefox-browser-version/
36
The extension is:

https://bytenota.com/java-detecting-firefox-browser/
36
The extension is:

https://bytenota.com/php-implement-startswith-and-endswith-functions/
36
The extension is:

https://bytenota.com/php-get-http-response-status-code-from-a-url/
36
The extension is:

https://bytenota.com/php-get-firefox-browser-version/
36
The extension is:

https://bytenota.com/javascript-check-if-url-contains-query-string/
36
The extension is:

https://bytenota.com/php-how-to-check-if-an-ip-address-is-private-or-not/
36
The extension is:

https://bytenota.com/parsing-an-xml-file-in-javascript/
36
The extension is:

https://bytenota.com/javascript-get-sha-256-hash-of-a-string/
36
The extension is:

https://bytenota.com/php-detecting-firefox-browser/
36
The extension is:

https://bytenota.com/javascript-get-firefox-browser-version/
36
The extension is:

https://bytenota.com/javascript-detecting-firefox-browser/
36
The extension is:

https://bytenota.com/angular-cli-build-application-in-production-mode/
36
The extension is:

https://bytenota.com/javascript-get-sha-1-hash-of-a-string/
36
The extension is:

https://bytenota.com/detect-ssl-https-using-php/
36
The extension is:

https://bytenota.com/detect-ssl-https-using-javascript/
36
The extension is:

https://bytenota.com/parsing-an-xml-sitemap-in-javascript/
36
The extension is:

https://bytenota.com/jquery-detect-if-a-checkbox-is-checked-or-unchecked/
36
The extension is:

https://bytenota.com/parsing-an-xml-sitemap-in-php/
36
The extension is:

https://bytenota.com/aspnet-get-ie-browser-version-using-csharp/
36
The extension is:

https://bytenota.com/javascript-save-objects-in-html5-session-storage/
36
The extension is:

https://bytenota.com/javascript-save-objects-in-html5-local-storage/
36
The extension is:

https://bytenota.com/javascript-how-to-store-data-in-html5-session-storage/
36
The extension is:

https://bytenota.com/javascript-how-to-store-data-in-html5-local-storage/
36
The extension is:

https://bytenota.com/how-to-write-debugging-output-to-log-file-in-php/
36
The extension is:

https://bytenota.com/fixing-cant-resolve-rxjs-add-operator-map-error/
36
The extension is:

https://bytenota.com/wordpress-get-post-thumbnail-alt/
36
The extension is:

https://bytenota.com/wordpress-get-post-thumbnail-caption/
36
The extension is:

https://bytenota.com/jquery-get-all-selected-checkboxes/
36
The extension is:

https://bytenota.com/java-how-to-check-if-an-ip-address-is-private-or-not/
36
The extension is:

https://bytenota.com/java-how-to-check-if-an-ip-address-is-ipv4-or-ipv6/
36
The extension is:

https://bytenota.com/javascript-deleting-a-property-from-an-object/
36
The extension is:

https://bytenota.com/javascript-redirect-to-another-web-page/
36
The extension is:

https://bytenota.com/apache-redirect-http-requests-to-https/
36
The extension is:

https://bytenota.com/php-delete-a-directory-recursively/
36
The extension is:

https://bytenota.com/apache-ant-writing-a-custom-ant-task/
36
The extension is:

https://bytenota.com/java-how-to-remove-duplicate-values-in-array/
36
The extension is:

https://bytenota.com/java-find-the-array-index-of-an-element/
36
The extension is:

https://bytenota.com/csharp-how-to-print-a-dictionary/
36
The extension is:

https://bytenota.com/javascript-how-to-get-child-elements-inside-a-div/
36
The extension is:

https://bytenota.com/jetty-how-to-solve-java-net-bindexception-address-already-in-use-bind/
36
The extension is:

https://bytenota.com/asp-net-http-handlers-not-working-on-azure/
36
The extension is:

https://bytenota.com/how-to-checkout-svn-in-php/
36
The extension is:

https://bytenota.com/java-create-a-custom-cors-filter/
36
The extension is:

https://bytenota.com/git-clone-a-specific-branch/
36
The extension is:

https://bytenota.com/java-register-a-servlet-in-spring-boot/
36
The extension is:

https://bytenota.com/java-set-a-context-param-in-spring-boot/
36
The extension is:

https://bytenota.com/how-to-execute-command-in-php/
36
The extension is:

https://bytenota.com/list-all-file-in-directory-folder-in-php/
36
The extension is:

https://bytenota.com/how-to-solve-starting-of-tomcat-failed-the-server-port-8080-is-already-in-use/
36
The extension is:

https://bytenota.com/csharp-replace-last-occurrence-of-a-string/
36
The extension is:

https://bytenota.com/how-to-block-google-analytics/
36
The extension is:

https://bytenota.com/how-to-enable-iis-manager-in-windows-10/
36
The extension is:

https://bytenota.com/how-to-solve-appcmd-is-not-recognized-as-the-name-of-a-cmdlet/
36
The extension is:

https://bytenota.com/git-create-a-new-empty-branch/
36
The extension is:

https://bytenota.com/how-to-run-a-sh-file-using-php/
36
The extension is:

https://bytenota.com/php-check-if-a-web-server-is-running-on-windows-or-linux/
36
The extension is:

https://bytenota.com/git-how-to-clone-a-specific-directory-from-a-git-repository/
36
The extension is:

https://bytenota.com/javascript-how-to-use-getelementsbyclassname-in-ie8-or-below/
36
The extension is:

https://bytenota.com/javascript-get-style-attribute-of-html-elements-as-string/
36
The extension is:

https://bytenota.com/javascript-check-if-a-browser-supports-html5-canvas-or-not/
36
The extension is:

https://bytenota.com/javascript-convert-image-to-base64-string/
36
The extension is:

https://bytenota.com/how-to-get-url-parameters-using-javascript/
36
The extension is:

https://bytenota.com/java-parse-float-from-string-containing-letters/
36
The extension is:

https://bytenota.com/java-parse-integer-from-string-containing-letters/
36
The extension is:

https://bytenota.com/get-ie-browser-version-using-java/
36
The extension is:

https://bytenota.com/get-ie-browser-version-using-javascript/
36
The extension is:

https://bytenota.com/how-to-disable-or-enable-button-using-jquery/
36
The extension is:

https://bytenota.com/jquery-detect-if-an-element-is-scrolled-till-the-end/
36
The extension is:

https://bytenota.com/get-ie-browser-version-using-php/
36
The extension is:

https://bytenota.com/jquery-check-if-an-element-is-visible-or-not/
36
The extension is:

https://bytenota.com/how-to-uninstall-global-npm-package/
36
The extension is:

https://bytenota.com/php-generate-a-random-string-with-a-z-and-0-9/
36
The extension is:

https://bytenota.com/php-replace-last-occurrence-of-a-string/
36
The extension is:

https://bytenota.com/javascript-replace-last-occurrence-of-a-string/
36
The extension is:

https://bytenota.com/java-replace-last-occurrence-of-a-string/
36
The extension is:

https://bytenota.com/how-to-get-an-hour-ago-in-java/
36
The extension is:

https://bytenota.com/how-to-get-an-hour-ago-in-php/
36
The extension is:

https://bytenota.com/how-to-get-current-unix-timestamp-in-php/
36
The extension is:

https://bytenota.com/how-to-get-current-unix-timestamp-in-java/
36
The extension is:

https://bytenota.com/how-to-convert-an-object-to-an-array-in-php/
36
The extension is:

https://bytenota.com/how-to-get-first-key-of-an-array-in-php/
36
The extension is:

https://bytenota.com/how-to-solve-proguard-errors-cant-find-referenced-class-javax-crypto-cipher/
36
The extension is:

https://bytenota.com/remove-index-php-from-codeigniter-url-redirect/
36
The extension is:

https://bytenota.com/sort-an-array-of-strings-alphabetically-in-php/
36
The extension is:

https://bytenota.com/check-if-url-contains-query-string-with-php/
36
The extension is:

https://bytenota.com/javascript-how-to-save-an-object-in-cookie/
36
The extension is:

https://bytenota.com/php-display-a-base64-image-from-database-in-html/
36
The extension is:

https://bytenota.com/how-to-convert-a-stacktrace-to-a-string-in-java/
36
The extension is:

https://bytenota.com/convert-a-string-to-a-number-in-javascript/
36
The extension is:

https://bytenota.com/convert-a-number-to-a-string-in-javascript/
36
The extension is:

https://bytenota.com/how-to-use-google-closure-compiler-to-compress-javascript-code/
36
The extension is:

https://bytenota.com/how-to-get-context-param-value-in-web-xml/
36
The extension is:

https://bytenota.com/how-to-use-http-authentication-with-php/
36
The extension is:

https://bytenota.com/how-to-get-the-current-date-time-of-your-time-zone-in-java/
36
The extension is:

https://bytenota.com/get-list-of-all-timezone-ids-in-java/
36
The extension is:

https://bytenota.com/how-to-install-a-jar-file-in-the-maven-local-repository/
36
The extension is:

https://bytenota.com/how-to-check-the-java-compiler-version-from-a-class-file/
36
The extension is:

https://bytenota.com/angular-creating-a-hello-world-application/angular-add-routing/#main
36
The extension is:

https://bytenota.com/angular-creating-a-hello-world-application/angular-cmd-launch-app/#main
36
The extension is:

https://bytenota.com/sourcetree-refresh-remotes-branches-status/sourcetree-refresh-remotes/#main
36
The extension is:

https://bytenota.com/sourcetree-refresh-remotes-branches-status/sourcetree-remotes/#main
36
The extension is:

https://bytenota.com/solved-invalidoperationexception-session-has-not-been-configured-for-this-application-or-request/incorrect-usesession-netcore/#main
36
The extension is:

https://bytenota.com/how-to-access-localhost-asp-net-webapp-from-mobile-tablet-device/iismanager-binding/#main
36
The extension is:

https://bytenota.com/how-to-access-localhost-asp-net-webapp-from-mobile-tablet-device/addnew-inbound-rules-name/#main
36
The extension is:

https://bytenota.com/how-to-access-localhost-asp-net-webapp-from-mobile-tablet-device/addnew-inbound-rules-profile/#main
36
The extension is:

https://bytenota.com/how-to-access-localhost-asp-net-webapp-from-mobile-tablet-device/addnew-inbound-rules-allowconnection/#main
36
The extension is:

https://bytenota.com/how-to-access-localhost-asp-net-webapp-from-mobile-tablet-device/addnew-inbound-rules-portset/#main
36
The extension is:

https://bytenota.com/how-to-access-localhost-asp-net-webapp-from-mobile-tablet-device/addnew-inbound-rules-portoption/#main
36
The extension is:

https://bytenota.com/how-to-access-localhost-asp-net-webapp-from-mobile-tablet-device/addnew-inbound-rules/#main
36
The extension is:

https://bytenota.com/how-to-access-localhost-asp-net-webapp-from-mobile-tablet-device/lan-ipaddress/#main
36
The extension is:

https://bytenota.com/how-to-access-localhost-asp-net-webapp-from-mobile-tablet-device/turn-off-windows-firewall/#main
36
The extension is:

https://bytenota.com/installing-and-starting-jetty-application-server/
36
The extension is:

https://bytenota.com/visual-studio-solved-using-directive-is-unnecessary/
36
The extension is:

https://bytenota.com/python-creating-a-hello-world-program/python-icon/#main
36
The extension is:

https://bytenota.com/html-css-icon/
36
The extension is:

https://bytenota.com/javascript-create-multiline-strings-in-es5-es6/
36
The extension is:

https://bytenota.com/expression-language-in-jsp-custom-tag-attributes/
36
The extension is:

https://bytenota.com/csharp-write-text-to-debug-output-window-of-visual-studio/
36
The extension is:

https://bytenota.com/typescript-programming/
36
The extension is:

https://bytenota.com/terms-of-use/
36
The extension is:

https://bytenota.com/privacy-policy/
36
The extension is:

https://bytenota.com/how-to-use-http-authentication-with-php/php-programming/#main
36
The extension is:

https://bytenota.com/javascript-how-to-save-an-object-in-cookie/javascript-programming-v1/#main
36
The extension is:

https://bytenota.com/java-programming-v1/
36
The extension is:

https://bytenota.com/devops-icon/
36
The extension is:

https://bytenota.com/csharp-replace-last-occurrence-of-a-string/csharp-icon/#main
36
The extension is:

https://bytenota.com/tips-red/
36
The extension is:

https://bytenota.com/php-how-to-delete-a-file/
36
The extension is:

https://bytenota.com/php-redirect-to-another-web-page/
36
The extension is:

https://bytenota.com/visual-studio-shortcut-key-to-duplicate-a-line/
36
The extension is:

https://bytenota.com/svn-list-all-svnexternals-in-a-directory-structure/
36
The extension is:

https://bytenota.com/about/
36
The extension is:

https://bytenota.com/tag/codeigniter/
36
The extension is:

https://bytenota.com/tag/php/
36
The extension is:

https://bytenota.com/tag/codeigniter4/
36
The extension is:

https://bytenota.com/tag/ci/
36
The extension is:

https://bytenota.com/tag/ci4/
36
The extension is:

https://bytenota.com/tag/composer/
36
The extension is:

https://bytenota.com/tag/hello-world/
36
The extension is:

https://bytenota.com/tag/apache-ant/
36
The extension is:

https://bytenota.com/tag/ruby/
36
The extension is:

https://bytenota.com/tag/typescript/
36
The extension is:

https://bytenota.com/tag/date_diff/
36
The extension is:

https://bytenota.com/tag/java/
36
The extension is:

https://bytenota.com/tag/helloworld-program/
36
The extension is:

https://bytenota.com/tag/angular/
36
The extension is:

https://bytenota.com/tag/npm/
36
The extension is:

https://bytenota.com/tag/angular-cli/
36
The extension is:

https://bytenota.com/tag/localhost/
36
The extension is:

https://bytenota.com/tag/csharp/
36
The extension is:

https://bytenota.com/tag/netcore/
36
The extension is:

https://bytenota.com/tag/fips140/
36
The extension is:

https://bytenota.com/tag/fips/
36
The extension is:

https://bytenota.com/tag/win10/
36
The extension is:

https://bytenota.com/tag/windows/
36
The extension is:

https://bytenota.com/tag/javascript/
36
The extension is:

https://bytenota.com/tag/nodejs/
36
The extension is:

https://bytenota.com/tag/es6/
36
The extension is:

https://bytenota.com/tag/vscode/
36
The extension is:

https://bytenota.com/tag/tips/
36
The extension is:

https://bytenota.com/tag/sourcetree/
36
The extension is:

https://bytenota.com/tag/virtualbox/
36
The extension is:

https://bytenota.com/tag/parse-useragent/
36
The extension is:

https://bytenota.com/tag/microsoft-edge/
36
The extension is:

https://bytenota.com/tag/asp-net-core/
36
The extension is:

https://bytenota.com/tag/csharp-reflection/
36
The extension is:

https://bytenota.com/tag/ascii/
36
The extension is:

https://bytenota.com/tag/mb_detect_encoding/
36
The extension is:

https://bytenota.com/tag/git-mv/
36
The extension is:

https://bytenota.com/tag/git-command/
36
The extension is:

https://bytenota.com/tag/symfony/
36
The extension is:

https://bytenota.com/tag/symfony4/
36
The extension is:

https://bytenota.com/tag/fosuserbundle/
36
The extension is:

https://bytenota.com/tag/jdk_home/
36
The extension is:

https://bytenota.com/tag/netbeans/
36
The extension is:

https://bytenota.com/tag/git/
36
The extension is:

https://bytenota.com/tag/list-all-revisions/
36
The extension is:

https://bytenota.com/tag/list-older-revisions/
36
The extension is:

https://bytenota.com/tag/list-previous-revisions/
36
The extension is:

https://bytenota.com/tag/show-all-svn-logs/
36
The extension is:

https://bytenota.com/tag/show-previous-svn-logs/
36
The extension is:

https://bytenota.com/tag/svn/
36
The extension is:

https://bytenota.com/tag/subversion/
36
The extension is:

https://bytenota.com/tag/node-js/
36
The extension is:

https://bytenota.com/tag/command/
36
The extension is:

https://bytenota.com/tag/extract-query-string/
36
The extension is:

https://bytenota.com/tag/url-parameter/
36
The extension is:

https://bytenota.com/tag/querystring/
36
The extension is:

https://bytenota.com/tag/sqlite3-on-macosx/
36
The extension is:

https://bytenota.com/tag/linux-command/
36
The extension is:

https://bytenota.com/tag/sqlite3-on-linux/
36
The extension is:

https://bytenota.com/tag/install-sqlite3/
36
The extension is:

https://bytenota.com/tag/sqlite3-on-windows/
36
The extension is:

https://bytenota.com/tag/enable-sqlite3/
36
The extension is:

https://bytenota.com/tag/java-reflection/
36
The extension is:

https://bytenota.com/tag/reflection/
36
The extension is:

https://bytenota.com/tag/invoke-method/
36
The extension is:

https://bytenota.com/tag/linux/
36
The extension is:

https://bytenota.com/tag/set-java_home/
36
The extension is:

https://bytenota.com/tag/command-prompt/
36
The extension is:

https://bytenota.com/tag/cmd/
36
The extension is:

https://bytenota.com/tag/detect-querystring/
36
The extension is:

https://bytenota.com/tag/serlvet/
36
The extension is:

https://bytenota.com/tag/httpservletrequest/
36
The extension is:

https://bytenota.com/tag/get-parameter-from-url/
36
The extension is:

https://bytenota.com/tag/getparameter/
36
The extension is:

https://bytenota.com/tag/prameters/
36
The extension is:

https://bytenota.com/tag/unserialize/
36
The extension is:

https://bytenota.com/tag/serialize/
36
The extension is:

https://bytenota.com/tag/serialize-object/
36
The extension is:

https://bytenota.com/tag/convert-json-to-object/
36
The extension is:

https://bytenota.com/tag/convert-object-to-json/
36
The extension is:

https://bytenota.com/tag/json_decode/
36
The extension is:

https://bytenota.com/tag/json_encode/
36
The extension is:

https://bytenota.com/tag/tagsupport/
36
The extension is:

https://bytenota.com/tag/pagecontext/
36
The extension is:

https://bytenota.com/tag/custom-jsp-tag/
36
The extension is:

https://bytenota.com/tag/jsf/
36
The extension is:

https://bytenota.com/tag/jsf-component/
36
The extension is:

https://bytenota.com/tag/wordpress/
36
The extension is:

https://bytenota.com/tag/get-posts-by-category/
36
The extension is:

https://bytenota.com/tag/get_posts/
36
The extension is:

https://bytenota.com/tag/wordpress-post/
36
The extension is:

https://bytenota.com/tag/wordpress-category/
36
The extension is:

https://bytenota.com/tag/jquery/
36
The extension is:

https://bytenota.com/tag/base64/
36
The extension is:

https://bytenota.com/tag/base64-image/
36
The extension is:

https://bytenota.com/tag/html-iframe/
36
The extension is:

https://bytenota.com/tag/base64-string-to-file/
36
The extension is:

https://bytenota.com/tag/save-base64-image-to-file/
36
The extension is:

https://bytenota.com/tag/save-base64-image/
36
The extension is:

https://bytenota.com/tag/select-element-in-iframe/
36
The extension is:

https://bytenota.com/tag/base64-string/
36
The extension is:

https://bytenota.com/tag/display-base64-encoded-image/
36
The extension is:

https://bytenota.com/tag/display-base64-string/
36
The extension is:

https://bytenota.com/tag/display-base64-image/
36
The extension is:

https://bytenota.com/tag/bufferedimage/
36
The extension is:

https://bytenota.com/tag/copy-object/
36
The extension is:

https://bytenota.com/tag/clone-bufferedimage/
36
The extension is:

https://bytenota.com/tag/copy-bufferedimage/
36
The extension is:

https://bytenota.com/tag/clone-java-object/
36
The extension is:

https://bytenota.com/tag/enum-field/
36
The extension is:

https://bytenota.com/tag/enum-field-value/
36
The extension is:

https://bytenota.com/tag/java-enum/
36
The extension is:

https://bytenota.com/tag/replace-last/
36
The extension is:

https://bytenota.com/tag/python/
36
The extension is:

https://bytenota.com/tag/replace-last-string/
36
The extension is:

https://bytenota.com/tag/replace-string/
36
The extension is:

https://bytenota.com/tag/wp-add-shortcodes/
36
The extension is:

https://bytenota.com/tag/wordpress-plugin/
36
The extension is:

https://bytenota.com/tag/wordpress-theme/
36
The extension is:

https://bytenota.com/tag/do_shortcode/
36
The extension is:

https://bytenota.com/tag/jsp/
36
The extension is:

https://bytenota.com/tag/parameters/
36
The extension is:

https://bytenota.com/tag/disable-element/
36
The extension is:

https://bytenota.com/tag/disable-a-link/
36
The extension is:

https://bytenota.com/tag/gitlab/
36
The extension is:

https://bytenota.com/tag/build-angular/
36
The extension is:

https://bytenota.com/tag/user-select-property/
36
The extension is:

https://bytenota.com/tag/div-readonly/
36
The extension is:

https://bytenota.com/tag/local-timezone/
36
The extension is:

https://bytenota.com/tag/timezone/
36
The extension is:

https://bytenota.com/tag/datetime/
36
The extension is:

https://bytenota.com/tag/timezone-ids/
36
The extension is:

https://bytenota.com/tag/list-timezone/
36
The extension is:

https://bytenota.com/tag/extract-firefox-version/
36
The extension is:

https://bytenota.com/tag/get-firefox-version/
36
The extension is:

https://bytenota.com/tag/firefox-version/
36
The extension is:

https://bytenota.com/tag/detect-firefox/
36
The extension is:

https://bytenota.com/tag/firefox/
36
The extension is:

https://bytenota.com/tag/endswith/
36
The extension is:

https://bytenota.com/tag/startswith/
36
The extension is:

https://bytenota.com/tag/php-endswith/
36
The extension is:

https://bytenota.com/tag/php-startswidth/
36
The extension is:

https://bytenota.com/tag/status-code/
36
The extension is:

https://bytenota.com/tag/detect-status-code/
36
The extension is:

https://bytenota.com/tag/response-status-code/
36
The extension is:

https://bytenota.com/tag/http-response/
36
The extension is:

https://bytenota.com/tag/get_headers/
36
The extension is:

https://bytenota.com/tag/detect-response-code/
36
The extension is:

https://bytenota.com/tag/parse-url/
36
The extension is:

https://bytenota.com/tag/parse-querystring/
36
The extension is:

https://bytenota.com/tag/private-ipv6/
36
The extension is:

https://bytenota.com/tag/ipv6/
36
The extension is:

https://bytenota.com/tag/private-ipv4/
36
The extension is:

https://bytenota.com/tag/ipv4/
36
The extension is:

https://bytenota.com/tag/parse-xml/
36
The extension is:

https://bytenota.com/tag/xml-dom/
36
The extension is:

https://bytenota.com/tag/xml-parser/
36
The extension is:

https://bytenota.com/tag/js-sha256/
36
The extension is:

https://bytenota.com/tag/hash-string/
36
The extension is:

https://bytenota.com/tag/amd/
36
The extension is:

https://bytenota.com/tag/js-sha1/
36
The extension is:

https://bytenota.com/tag/detect-https/
36
The extension is:

https://bytenota.com/tag/detect-ssl/
36
The extension is:

https://bytenota.com/tag/parse-sitemap/
36
The extension is:

https://bytenota.com/tag/js/
36
The extension is:

https://bytenota.com/tag/sitemap/
36
The extension is:

https://bytenota.com/tag/xml/
36
The extension is:

https://bytenota.com/tag/click-event/
36
The extension is:

https://bytenota.com/tag/checkbox/
36
The extension is:

https://bytenota.com/tag/asp-net/
36
The extension is:

https://bytenota.com/tag/useragent/
36
The extension is:

https://bytenota.com/tag/detect-ie/
36
The extension is:

https://bytenota.com/tag/ie-browser-version/
36
The extension is:

https://bytenota.com/tag/ie/
36
The extension is:

https://bytenota.com/tag/ie-version/
36
The extension is:

https://bytenota.com/tag/save-object/
36
The extension is:

https://bytenota.com/tag/javascript-object/
36
The extension is:

https://bytenota.com/tag/sessionstorage/
36
The extension is:

https://bytenota.com/tag/html5-sessionstorage/
36
The extension is:

https://bytenota.com/tag/localstorage/
36
The extension is:

https://bytenota.com/tag/html5-localstorage/
36
The extension is:

https://bytenota.com/tag/html5/
36
The extension is:

https://bytenota.com/tag/web-storage/
36
The extension is:

https://bytenota.com/tag/write-log/
36
The extension is:

https://bytenota.com/tag/error_log/
36
The extension is:

https://bytenota.com/tag/rxjs/
36
The extension is:

https://bytenota.com/tag/featured-image-alt/
36
The extension is:

https://bytenota.com/tag/thumbnail-alt/
36
The extension is:

https://bytenota.com/tag/thumbnail-caption/
36
The extension is:

https://bytenota.com/tag/featured-image-caption/
36
The extension is:

https://bytenota.com/tag/get-selected-checkboxes/
36
The extension is:

https://bytenota.com/tag/private-ip/
36
The extension is:

https://bytenota.com/tag/inet4address/
36
The extension is:

https://bytenota.com/tag/ip-address/
36
The extension is:

https://bytenota.com/tag/inet6address/
36
The extension is:

https://bytenota.com/tag/delete-property/
36
The extension is:

https://bytenota.com/tag/redirect/
36
The extension is:

https://bytenota.com/tag/location-href/
36
The extension is:

https://bytenota.com/tag/location-replace/
36
The extension is:

https://bytenota.com/tag/jquery-redirect/
36
The extension is:

https://bytenota.com/tag/htaccess/
36
The extension is:

https://bytenota.com/tag/redirect-https/
36
The extension is:

https://bytenota.com/tag/http-to-https/
36
The extension is:

https://bytenota.com/tag/https-requests/
36
The extension is:

https://bytenota.com/tag/remove-folder/
36
The extension is:

https://bytenota.com/tag/delete-file/
36
The extension is:

https://bytenota.com/tag/delete-directory/
36
The extension is:

https://bytenota.com/tag/delete-folder/
36
The extension is:

https://bytenota.com/tag/remove-directory/
36
The extension is:

https://bytenota.com/tag/custom-ant-task/
36
The extension is:

https://bytenota.com/tag/ant-task/
36
The extension is:

https://bytenota.com/tag/array/
36
The extension is:

https://bytenota.com/tag/remove-duplicates/
36
The extension is:

https://bytenota.com/tag/hashset/
36
The extension is:

https://bytenota.com/tag/integer-array/
36
The extension is:

https://bytenota.com/tag/indexof/
36
The extension is:

https://bytenota.com/tag/aslist/
36
The extension is:

https://bytenota.com/tag/find-array-index/
36
The extension is:

https://bytenota.com/tag/dictionary/
36
The extension is:

https://bytenota.com/tag/c/
36
The extension is:

https://bytenota.com/tag/get-child-elements/
36
The extension is:

https://bytenota.com/tag/child-elements/
36
The extension is:

https://bytenota.com/tag/jetty-port/
36
The extension is:

https://bytenota.com/tag/jetty/
36
The extension is:

https://bytenota.com/tag/jetty-web-server/
36
The extension is:

https://bytenota.com/tag/jetty-server/
36
The extension is:

https://bytenota.com/tag/web-server/
36
The extension is:

https://bytenota.com/tag/java-web-server/
36
The extension is:

https://bytenota.com/tag/azure/
36
The extension is:

https://bytenota.com/tag/azure-service/
36
The extension is:

https://bytenota.com/tag/azure-cloud/
36
The extension is:

https://bytenota.com/tag/web-config/
36
The extension is:

https://bytenota.com/tag/svn-checkout/
36
The extension is:

https://bytenota.com/tag/exec/
36
The extension is:

https://bytenota.com/tag/svn-export/
36
The extension is:

https://bytenota.com/tag/cors-filter/
36
The extension is:

https://bytenota.com/tag/serlvet-cors-filter/
36
The extension is:

https://bytenota.com/tag/corsfilter/
36
The extension is:

https://bytenota.com/tag/cors/
36
The extension is:

https://bytenota.com/tag/servlet/
36
The extension is:

https://bytenota.com/tag/remote/
36
The extension is:

https://bytenota.com/tag/clone/
36
The extension is:

https://bytenota.com/tag/mkdir/
36
The extension is:

https://bytenota.com/tag/clone-branch/
36
The extension is:

https://bytenota.com/tag/checkout/
36
The extension is:

https://bytenota.com/tag/servletregistrationbean/
36
The extension is:

https://bytenota.com/tag/servlet-mapping/
36
The extension is:

https://bytenota.com/tag/context-param/
36
The extension is:

https://bytenota.com/tag/spring-boot/
36
The extension is:

https://bytenota.com/tag/springboot/
36
The extension is:

https://bytenota.com/tag/shell_execute/
36
The extension is:

https://bytenota.com/tag/scandir/
36
The extension is:

https://bytenota.com/tag/array_diff/
36
The extension is:

https://bytenota.com/tag/netstat/
36
The extension is:

https://bytenota.com/tag/tomcat-failed/
36
The extension is:

https://bytenota.com/tag/taskkill/
36
The extension is:

https://bytenota.com/tag/tomcat/
36
The extension is:

https://bytenota.com/tag/kill/
36
The extension is:

https://bytenota.com/tag/file-hosts/
36
The extension is:

https://bytenota.com/tag/unix/
36
The extension is:

https://bytenota.com/tag/hosts/
36
The extension is:

https://bytenota.com/tag/iis/
36
The extension is:

https://bytenota.com/tag/iss-manager/
36
The extension is:

https://bytenota.com/tag/appcmd/
36
The extension is:

https://bytenota.com/tag/orphan/
36
The extension is:

https://bytenota.com/tag/git-commit/
36
The extension is:

https://bytenota.com/tag/git-checkout/
36
The extension is:

https://bytenota.com/tag/orphan-branch/
36
The extension is:

https://bytenota.com/tag/git-branch/
36
The extension is:

https://bytenota.com/tag/bash-script/
36
The extension is:

https://bytenota.com/tag/php_os/
36
The extension is:

https://bytenota.com/tag/webserver/
36
The extension is:

https://bytenota.com/tag/devops/
36
The extension is:

https://bytenota.com/tag/ie8/
36
The extension is:

https://bytenota.com/tag/ie8-or-below/
36
The extension is:

https://bytenota.com/tag/getelementsbyclassname/
36
The extension is:

https://bytenota.com/tag/ie-compatibility-view/
36
The extension is:

https://bytenota.com/tag/getelementsbytagname/
36
The extension is:

https://bytenota.com/tag/html/
36
The extension is:

https://bytenota.com/tag/style-attribute/
36
The extension is:

https://bytenota.com/tag/canvas/
36
The extension is:

https://bytenota.com/tag/filereader/
36
The extension is:

https://bytenota.com/tag/url/
36
The extension is:

https://bytenota.com/tag/parsefloat/
36
The extension is:

https://bytenota.com/tag/parseint/
36
The extension is:

https://bytenota.com/tag/detect/
36
The extension is:

https://bytenota.com/tag/scroll/
36
The extension is:

https://bytenota.com/tag/random-string/
36
The extension is:

https://bytenota.com/tag/timestamp/
36
The extension is:

https://bytenota.com/tag/unix-timestamp/
36
The extension is:

https://bytenota.com/tag/typecasting/
36
The extension is:

https://bytenota.com/tag/first-key-array/
36
The extension is:

https://bytenota.com/tag/get-first-key/
36
The extension is:

https://bytenota.com/tag/key/
36
The extension is:

https://bytenota.com/tag/reset/
36
The extension is:

https://bytenota.com/tag/proguard/
36
The extension is:

https://bytenota.com/tag/sort/
36
The extension is:

https://bytenota.com/tag/cookie/
36
The extension is:

https://bytenota.com/tag/json/
36
The extension is:

https://bytenota.com/tag/closure-compiler/
36
The extension is:

https://bytenota.com/tag/http-auth/
36
The extension is:

https://bytenota.com/tag/maven/
36
The extension is:

https://bytenota.com/tag/java-compiled/
36
The extension is:

https://bytenota.com/tag/compiler/
36
The extension is:

https://bytenota.com/tag/java-version/
36
The extension is:

https://bytenota.com/tag/webapps/
36
The extension is:

https://bytenota.com/tag/http-client/
36
The extension is:

https://bytenota.com/tag/http-server/
36
The extension is:

https://bytenota.com/tag/servlet-container/
36
The extension is:

https://bytenota.com/tag/visual-studio/
36
The extension is:

https://bytenota.com/tag/es5/
36
The extension is:

https://bytenota.com/tag/multiline-strings/
36
The extension is:

https://bytenota.com/tag/expression-language/
36
The extension is:

https://bytenota.com/tag/writeline/
36
The extension is:

https://bytenota.com/tag/output-text/
36
The extension is:

https://bytenota.com/tag/debug/
36
The extension is:

https://bytenota.com/tag/php-delete-file/
36
The extension is:

https://bytenota.com/tag/unlink/
36
The extension is:

https://bytenota.com/tag/php-redirect/
36
The extension is:

https://bytenota.com/tag/duplicate-code/
36
The extension is:

https://bytenota.com/tag/duplicate-line/
36
The extension is:

https://bytenota.com/tag/svn-externals/
36
The extension is:

C:\wamp64\www\Work\buzz\Templates\crawler_Test.php:66:
array (size=0)
empty

C:\wamp64\www\Work\buzz\Templates\crawler_Test.php:67:
array (size=0)
empty

C:\wamp64\www\Work\buzz\Templates\crawler_Test.php:68:
array (size=0)
empty

C:\wamp64\www\Work\buzz\Templates\crawler_Test.php:69:
array (size=0)
empty

C:\wamp64\www\Work\buzz\Templates\crawler_Test.php:70:
array (size=0)
empty

77
84
91
98
105**
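The output above prints an empty extension for every bytenota.com/tag/... link. That is what you would expect for URLs that end in "/", because they have no file extension at all. Below is a minimal sketch of an extension extractor, assuming PHP 7.4+. It reads the extension from the URL's path only, so a query string or fragment cannot confuse it. `get_url_extension()` is a hypothetical helper name, not something from the thread's code.

```php
<?php
// Sketch: extract a URL's file extension from its path component only.
// get_url_extension() is a hypothetical helper name used for illustration.
function get_url_extension(string $url): string
{
    // parse_url() returns null when there is no path and false for a badly malformed URL.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '' || substr($path, -1) === '/') {
        return ''; // no path, or a "directory" URL such as /tag/node-js/
    }
    // pathinfo() returns "" when the last path segment has no dot.
    return strtolower(pathinfo($path, PATHINFO_EXTENSION));
}

$urls = [
    'https://bytenota.com/tag/node-js/',
    'https://bytenota.com/sitemap.xml',
    'https://example.com/page.PHP?id=7', // hypothetical URL with a query string
];
foreach ($urls as $u) {
    echo $u, ' => "', get_url_extension($u), '"', "\n";
}
```

Checking for a trailing "/" first matters for paths like /tag/v1.2/, where pathinfo() alone would report "2" as an extension.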

It seems to me, pritaeas, that it is working. But you are the one with the knowledge.
Do you reckon my code is perfect now? No flaws?

@dani

Do you reckon I should shorten my code above any further? I tried my best. What is your feedback on it? Shall I stick with it?

@dani

In this code, I have it like this:

$sitemap = 'https://bytenota.com/sitemap.xml';
$content = file_get_contents($sitemap);
$xml = simplexml_load_string($content);

Now, based on what you taught me here:
https://www.daniweb.com/programming/web-development/threads/540168/what-to-lookout-for-to-prevent-crawler-traps

Should I switch the above 3 lines to the following to shorten it?

$sitemap = 'https://bytenota.com/sitemap.xml';
$xml = simplexml_load_string(file_get_contents($sitemap));

As long as there won't be any issue with shortening it to this, I guess it's best to shorten it. What do you say?
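As a rough illustration, here is a sketch assuming PHP 7.4+. The two-line form and the one-line form above do exactly the same work; the nested call only drops the temporary `$content` variable. Neither form checks whether the fetch or the parse failed, so the hypothetical `load_sitemap()` helper below adds those checks. A local temp file stands in for the real sitemap URL so the example runs offline.

```php
<?php
// Sketch: compare the two-step and nested forms, then add failure checks.
// load_sitemap() is a hypothetical helper, not part of the thread's code.

/** @return SimpleXMLElement|false */
function load_sitemap(string $source)
{
    $content = @file_get_contents($source); // false if the fetch fails
    if ($content === false || trim($content) === '') {
        return false;
    }
    return simplexml_load_string($content); // false if not well-formed XML
}

// A temp file stands in for a URL such as https://bytenota.com/sitemap.xml.
$file = tempnam(sys_get_temp_dir(), 'sitemap');
file_put_contents($file, '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<url><loc>https://example.com/first-post/</loc></url></urlset>');

// Two-step form
$content = file_get_contents($file);
$xml1 = simplexml_load_string($content);

// Nested form: same result, one variable fewer
$xml2 = simplexml_load_string(file_get_contents($file));

echo (string) $xml1->url->loc, "\n";
echo (string) $xml2->url->loc, "\n";

var_dump(load_sitemap($file) !== false);              // bool(true)
var_dump(load_sitemap($file . '-missing') === false); // bool(true)
unlink($file);
```

The choice between the two forms is purely stylistic; the checks in `load_sitemap()` are what actually protect the crawler.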
Remember, I started this thread before the other one mentioned above, so it's best I complete this one too. It's a file extension extractor that will be turned into my 1st crawler.

The other thread is about a pure web crawler: my 2nd crawler.

So, two crawlers now, coded 2 different ways. That's how I gain work experience: code in many different ways, then test and see which code fares better under high website traffic. ;)

@dani

You know what? The shortened version gives me this error if the initial crawling URL is not an .xml file (I gave it https://www.google.com as the initial URL):

( ! ) Warning: simplexml_load_string(): Entity: line 1: parser error : StartTag: invalid element name in C:\wamp64\www\Work\buzz\Templates\crawler_Test.php on line 26

So, shortening it to the following is a big mistake!

$sitemap = 'https://google.com';
$xml = simplexml_load_string(file_get_contents($sitemap));
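One possible way to avoid that warning without giving up the short form is sketched below, assuming PHP 7.4+. libxml is told to collect parse errors instead of printing them, and a `false` return from `simplexml_load_string()` is treated as "this is not XML". The HTML string is a hypothetical stand-in for a page like google.com, and `parse_xml_quietly()` is a hypothetical helper name.

```php
<?php
// Sketch: parse a string as XML without emitting PHP warnings.
// parse_xml_quietly() is a hypothetical helper name.
function parse_xml_quietly(string $content, &$errors = null)
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of printing them
    libxml_clear_errors();                         // drop errors left over from earlier calls
    $xml = simplexml_load_string($content);

    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);         // restore the caller's setting
    return $xml;                                   // SimpleXMLElement, or false if not XML
}

// Hypothetical stand-in for an HTML page fetched with file_get_contents().
$htmlPage = '<!doctype html><html><body>Hello<br></body></html>';
$result = parse_xml_quietly($htmlPage, $errors);

var_dump($result === false); // bool(true): HTML like this is not well-formed XML
foreach ($errors as $message) {
    echo 'libxml: ', $message, "\n";
}
```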

I think the best thing to do is to use the following if the initial crawling URL is an .xml file:

$xml = simplexml_load_string($sitemap);

I think the best thing to do is to use the following if the initial crawling URL is an .html, .htm, .php, etc. (not .xml) file:

$html = file_get_contents($sitemap);

Good idea, Dani?

This doesn't work:

$sitemap = 'https://google.com';
$xml = simplexml_load_string(file_get_contents($sitemap));

because 'https://google.com' is not a sitemap file.

You can never just use simplexml_load_string() without also using file_get_contents() or cURL or some other way of fetching the contents of the sitemap file.
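Building on that, a crawler could fetch first and then decide what it got by looking at the content rather than at the URL's extension. A sitemap may be served from a URL that does not end in .xml, and an .xml URL may return an HTML error page. The sketch below assumes PHP 7.4+ and the root element names from the Sitemap protocol (`sitemapindex` and `urlset`); `classify_document()` is a hypothetical helper name.

```php
<?php
// Sketch: classify fetched content as a sitemap index, a URL set, other XML,
// or not XML at all. classify_document() is a hypothetical helper name.
function classify_document(string $content): string
{
    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $xml = simplexml_load_string($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        return 'not-xml';       // e.g. an HTML page such as google.com
    }
    $root = $xml->getName();    // name of the root element
    if ($root === 'sitemapindex' || $root === 'urlset') {
        return $root;           // list of sitemaps, or list of page URLs
    }
    return 'other-xml';         // e.g. an RSS feed
}

$ns = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"';
echo classify_document("<sitemapindex $ns><sitemap><loc>https://example.com/a.xml</loc></sitemap></sitemapindex>"), "\n";
echo classify_document("<urlset $ns><url><loc>https://example.com/</loc></url></urlset>"), "\n";
echo classify_document('<!doctype html><html><body>Hi<br></body></html>'), "\n";
```

In a crawler, the content itself would come from `file_get_contents()` or cURL, as described above.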

@dani

Mmm.

Q1. So, you are saying this is 100% wrong:

$initial_url = "https://www.rocktherankings.com/sitemap_index.xml"; //Has more xml files.
$xmls = file_get_contents($initial_url);

//Parse the sitemap content to object
$xml = simplexml_load_string($xmls);

And it has to be:

$initial_url = "https://www.rocktherankings.com/sitemap_index.xml"; //Has more xml files.
$xml = simplexml_load_string(file_get_contents($initial_url));

Are you 100% positive?
OK, sorted.

Q2. So, if the initial URL is an .xml file, then I use this:

$sitemap = "https://www.rocktherankings.com/sitemap_index.xml"; //Has more xml files.
$xml = simplexml_load_string(file_get_contents($sitemap)); 

But if it is a file other than an .xml file, then I use this:

$url = "https://www.google.com"; //Not .xml file
$html = file_get_contents($url); 

Yes? Can you confirm?
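For the original goal of this thread (a sitemap index that lists further sitemaps), one possible approach is to recurse whenever the root element is `sitemapindex` and collect `<loc>` values whenever it is `urlset`. Below is a minimal sketch assuming PHP 7.4+. The `$fetch` callable is an assumption so the example can run on in-memory data; in practice it could be `fn($u) => @file_get_contents($u)`. `collect_page_urls()`, the `$seen` set and the `$maxDepth` limit are hypothetical, added to guard against sitemaps that link to each other in a loop. All example.com URLs are made up.

```php
<?php
// Sketch: recursively walk a sitemap index and collect page URLs.
// collect_page_urls() is a hypothetical name; $fetch returns a body or false.
function collect_page_urls(string $url, callable $fetch, int $maxDepth = 3, array &$seen = []): array
{
    if ($maxDepth < 0 || isset($seen[$url])) {
        return [];                          // too deep, or already visited
    }
    $seen[$url] = true;

    $content = $fetch($url);
    if (!is_string($content) || $content === '') {
        return [];                          // fetch failed
    }
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($xml === false) {
        return [];                          // not XML, e.g. an HTML page
    }

    $pages = [];
    if ($xml->getName() === 'sitemapindex') {
        // Each <sitemap><loc> points at another sitemap file: recurse into it.
        foreach ($xml->sitemap as $child) {
            $childUrl = trim((string) $child->loc);
            $pages = array_merge($pages, collect_page_urls($childUrl, $fetch, $maxDepth - 1, $seen));
        }
    } elseif ($xml->getName() === 'urlset') {
        // Each <url><loc> is an actual page URL.
        foreach ($xml->url as $entry) {
            $pages[] = trim((string) $entry->loc);
        }
    }
    return $pages;
}

// Hypothetical in-memory "website" so the sketch runs without network access.
$ns = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"';
$fake = [
    'https://example.com/sitemap_index.xml' => "<sitemapindex $ns>"
        . '<sitemap><loc>https://example.com/post-sitemap.xml</loc></sitemap>'
        . '<sitemap><loc>https://example.com/page-sitemap.xml</loc></sitemap></sitemapindex>',
    'https://example.com/post-sitemap.xml' => "<urlset $ns><url><loc>https://example.com/hello-world/</loc></url></urlset>",
    'https://example.com/page-sitemap.xml' => "<urlset $ns><url><loc>https://example.com/about/</loc></url></urlset>",
];
$fetch = fn(string $u) => $fake[$u] ?? false;

print_r(collect_page_urls('https://example.com/sitemap_index.xml', $fetch));
```

Keeping the fetcher separate from the parsing makes it easy to swap `file_get_contents()` for cURL later without touching the recursion.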

Q3.
Now, the following are not really errors, are they? They just say the arrays are empty. Yes?

C:\wamp64\www\Work\buzz\Templates\crawler_Test.php:72:
array (size=0)
empty

C:\wamp64\www\Work\buzz\Templates\crawler_Test.php:73:
array (size=0)
empty

C:\wamp64\www\Work\buzz\Templates\crawler_Test.php:74:
array (size=0)
empty

C:\wamp64\www\Work\buzz\Templates\crawler_Test.php:75:
array (size=0)
empty

C:\wamp64\www\Work\buzz\Templates\crawler_Test.php:76:
array (size=0)
empty

83
90
97
104
111

Code used:

$initial_url = "https://www.rocktherankings.com/sitemap_index.xml"; //Has more xml files.
$xml = simplexml_load_string(file_get_contents($initial_url));

$dom = new DOMDocument();
$dom->loadXML($xml);
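One thing worth checking in the code above: `DOMDocument::loadXML()` expects an XML string, but `$xml` is the `SimpleXMLElement` returned by `simplexml_load_string()`. Converted to a string, a SimpleXMLElement yields only the root element's own text (empty or whitespace for a sitemap), so `loadXML()` has no document to parse. The sketch below, assuming PHP 7.4+ and a made-up sitemap index string standing in for `file_get_contents($initial_url)`, shows two alternatives: load the raw string, or convert the existing SimpleXML tree with `dom_import_simplexml()`.

```php
<?php
// Sketch: two ways to get a DOMDocument for a sitemap.
// $content is a made-up stand-in for file_get_contents($initial_url).
$content = '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<sitemap><loc>https://example.com/post-sitemap.xml</loc></sitemap>'
    . '<sitemap><loc>https://example.com/page-sitemap.xml</loc></sitemap>'
    . '</sitemapindex>';

// Option 1: give DOMDocument the raw XML string, not the SimpleXMLElement.
$dom = new DOMDocument();
$dom->loadXML($content);

// Option 2: reuse an existing SimpleXMLElement without parsing again.
$xml = simplexml_load_string($content);
$rootElement = dom_import_simplexml($xml); // DOMElement for the root element
$dom2 = $rootElement->ownerDocument;       // the DOMDocument that owns it

foreach ($dom->getElementsByTagName('loc') as $loc) {
    echo $loc->textContent, "\n";
}
```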

@dani

Once you have answered my post above, if the answers to my 3 questions are "YES", then do close this thread as solved.
I'm ditching my original code.

Learnt a lot from you and pritaeas.
Thanks to you both!

@pritaeas

Learnt a lot from you and dani.
Thanks to you both for your patience with my frequent tags requesting help!

Thanks for helping with this!

@dani

Can you confirm, then close?

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.