I'm writing a PHP script that takes a URL and lets the user highlight parts of the page. When I type in a URL, all of the images are broken. How can I write a script that changes relative links and assets to absolute ones?
Another thing it needs to cope with is URLs such as http://example.com/example.html, because some scripts naively append the relative path to the full page URL and produce http://example.com/example.html/img/img.png

Thanks for any help

All 2 Replies

So where are you getting these links? Is it the URL of the page the user is currently on? Or are you retrieving them from a database or something like that?

For the rest, you could use regular expressions to find the base part of the link. I think something like this should work:

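/^https?:\/\/[a-z0-9\-.]+/i

E.g.:

<?php
// Example input; in practice this is the URL the user typed in.
$full_website_url = 'http://www.daniweb.com/web-development/php/threads/451964/change-all-urls-to-absolute';
// Match the scheme and host name only (case-insensitive).
$regex = '/^https?:\/\/[a-z0-9\-.]+/i';
if (preg_match($regex, $full_website_url, $matches)) {
    $base_url = $matches[0];
}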

That should return the base part of the website URL, for example the "http://www.daniweb.com" part of "http://www.daniweb.com/web-development/php/threads/451964/change-all-urls-to-absolute".
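
The scheme-and-host base is enough for links that start with "/", but a link like img/img.png has to be joined to the page's directory, which is where the http://example.com/example.html case from the question goes wrong. A minimal sketch of one way to handle that with PHP's built-in parse_url(); the function name to_absolute() is made up for illustration, and query-only links ("?page=2") and ".." segments are deliberately not handled here:

```php
<?php
// Hypothetical helper (not from the thread): resolve $link against the
// URL of the page it was found on.
function to_absolute(string $pageUrl, string $link): string
{
    // Leave absolute URLs (http:, mailto:, data:, ...) and in-page anchors alone.
    if ($link === '' || $link[0] === '#' || preg_match('/^[a-z][a-z0-9+.\-]*:/i', $link)) {
        return $link;
    }

    $parts = parse_url($pageUrl);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $pageUrl");
    }
    $origin = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative link ("//cdn.example.com/x.js"): reuse the page's scheme.
    if (strncmp($link, '//', 2) === 0) {
        return $parts['scheme'] . ':' . $link;
    }
    // Root-relative link ("/css/x.css"): join to scheme + host only.
    if ($link[0] === '/') {
        return $origin . $link;
    }
    // Path-relative link: join to the page's directory, i.e. the path up to
    // and including its last slash, so a file name like "example.html" is dropped.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $link;
}

// prints http://example.com/img/img.png
echo to_absolute('http://example.com/example.html', 'img/img.png'), "\n";
```

Using parse_url() instead of a hand-written regex also keeps a port number (example.com:8080) in the base, which the pattern above would cut off.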

You need to do output buffering. Parse the markup for @href and @src attributes and normalize them. Normalizing URLs is quite easy: split the path on the '/' separator into an array (noting whether it ends in '/'), drop single-dot segments, and for each double-dot segment drop it together with the preceding element (if any). At the end, join everything back together, appending a '/' if the path originally ended with one.
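
The steps above could be sketched roughly as follows. This is not the poster's C++ code, just one possible reading of the description. The names normalize_path() and absolutize_html() are invented for illustration, the DOM extension is assumed to be available, and $resolve stands for whatever function turns one link into an absolute URL:

```php
<?php
// Collapse "." and ".." segments in a URL path, as described above.
// Side effect of this simple approach: empty segments ("a//b") are collapsed too.
function normalize_path(string $path): string
{
    if ($path === '') {
        return '';
    }
    $absolute = $path[0] === '/';
    $segments = explode('/', $path);
    $last     = end($segments);
    // A path ending in "/", "." or ".." names a directory: keep a trailing slash.
    $trailing = ($last === '' || $last === '.' || $last === '..');

    $out = [];
    foreach ($segments as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;                 // nothing to keep
        }
        if ($segment === '..') {
            array_pop($out);          // drop the preceding element, if any
            continue;
        }
        $out[] = $segment;
    }

    $result = ($absolute ? '/' : '') . implode('/', $out);
    if ($trailing && $out !== []) {
        $result .= '/';
    }
    return $result;
}

// Rewrite every href and src attribute in $html through $resolve.
// Assumes the DOM extension; saveHTML() returns a full document, so a
// fragment comes back wrapped in <html><body>.
function absolutize_html(string $html, callable $resolve): string
{
    if (trim($html) === '') {
        return $html;                 // loadHTML() rejects empty input
    }
    libxml_use_internal_errors(true); // real-world markup is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//@href | //@src') as $attr) {
        $attr->ownerElement->setAttribute($attr->nodeName, $resolve($attr->value));
    }
    return $doc->saveHTML();
}
```

For example, normalize_path('/a/b/../c/./d/') gives '/a/c/d/'. The buffered page output (or the fetched markup) would be passed through absolutize_html() before it is sent to the browser.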

I wrote this whole thing recently in C++ :) I'll publish it here as a PHP tutorial covering both output buffering and URL resolving. It's currently integrated in my framework, so I need to rip it out and revisit the code a bit :)
