Hi,

I used to know how to do this but have forgotten.
How can I check the extension of a URL?
You see, sometimes sitemap XML files do not list the site's page URLs (.html) for my crawler to extract.
Instead, they list further XML files, so my crawler has to go one level deeper to extract all the page URLs (.html).
So I now need to teach the crawler to detect the file type of each URL it extracts from a page.

Let's say my crawler extracted this URL/link from a page:
https://www.rocktherankings.com/sitemap_index.xml

Now, how do I detect the file type or file extension of that URL?
Which PHP function should I use? I see parse_url() won't do the job on its own.
I really do not want to use explode() here and do things the long way. I remember PHP has a specific function for getting the file extension the shorter way.

Ok. Any better way than these 3?

1
https://www.php.net/manual/en/function.pathinfo.php

$path_parts = pathinfo('/www/htdocs/inc/lib.inc.php');
echo 'The extension is: ' . $path_parts['extension'];
echo '<br>';

2
https://stackoverflow.com/questions/173868/how-to-get-a-files-extension-in-php

$path = '/www/htdocs/inc/lib.inc.php';
$ext = pathinfo($path, PATHINFO_EXTENSION);
echo 'The extension is: ' .$ext;

3
https://www.delftstack.com/howto/php/how-to-get-a-file-extension-in-php/

$path = 'E:\work\CM\myppt.ppt'; // single quotes so the backslashes are never read as escape sequences
$file = new SplFileInfo($path);
$extension  = $file->getExtension();
echo("The extension is: $extension."); 

And which one of the above do you prefer, and why do you prefer it over the others?
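
One thing to watch with all three samples: they use local file paths, not URLs. If pathinfo() is given a raw URL that has a query string or fragment, the "extension" it returns can include that trailing part. A minimal sketch of one way around this, assuming it is acceptable to look only at the path component, is to run parse_url() first and then pathinfo() on the result. The function name url_extension() and the example.com URL are made up for illustration:

```php
<?php
// Hypothetical helper (not a built-in): returns the lowercase extension
// of the URL's path component, or '' when there is none.
function url_extension(string $url): string
{
    // parse_url() returns null when the URL has no path,
    // and false when the URL is seriously malformed.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }
    // With PATHINFO_EXTENSION, pathinfo() returns a string
    // (empty when the path has no extension).
    return strtolower(pathinfo($path, PATHINFO_EXTENSION));
}

echo url_extension('https://www.rocktherankings.com/sitemap_index.xml'), "\n"; // xml
echo url_extension('https://example.com/page.html?ref=a.b#top'), "\n";         // html
```

So parse_url() does not give the extension by itself, but it strips the query string and fragment so that pathinfo() only sees the path.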

commented: Thanks +0

Hiya,

The 2nd sample above seems simpler, so I'm sticking with it.
Any feedback?
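
For the crawler itself, the extension check could then decide whether a link is another sitemap to open or a page to keep. This is only a rough sketch: classify_link() and the 'sitemap' / 'page' / 'other' labels are hypothetical names, and it assumes nested sitemaps end in .xml and pages end in .html or .htm. An extension is only a hint, so links without one fall into 'other' and would need a different check.

```php
<?php
// Hypothetical helper: sorts a link into 'sitemap', 'page' or 'other'
// based only on the extension of its path component.
function classify_link(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    $ext  = is_string($path)
        ? strtolower(pathinfo($path, PATHINFO_EXTENSION))
        : '';

    if ($ext === 'xml') {
        return 'sitemap'; // assumed to be a nested sitemap: crawl one level deeper
    }
    if ($ext === 'html' || $ext === 'htm') {
        return 'page';    // an actual site URL to collect
    }
    return 'other';       // no extension or an unknown one
}

echo classify_link('https://www.rocktherankings.com/sitemap_index.xml'), "\n"; // sitemap
```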

commented: Good choice! +1