
Hi, I was searching for a correct URL regex, but I'm unsure how to write one. I'm no regex expert, so I can't be sure that what I have will always work. I want to be able to match inner URLs too (ones with a path and query string), such as:
http://google.com
http://www.google.com
http://google.com/something?some=unsome&w=anything

I only want to allow http(s), not other protocols such as gopher or news. Can someone help, please?

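// Building blocks for the pattern; not yet assembled into one regex.
$allowed_chars = '[a-zA-Z0-9+._-]';
$protocol = '(http|https)://';
$w3 = "(www\.)?";                      // escape the dot so it matches a literal "."
$subdomain = "{$allowed_chars}*\.?";
$domain = "{$allowed_chars}*\.";
$domain_format = "{$allowed_chars}{2,3}";
$country_code = "{$allowed_chars}{2,3}";
$end_parts = "({$allowed_chars}|[/&:?])*"; // path and query characters as alternatives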
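
One way the pieces above could be tightened into a single check is sketched below. This is only an illustration under some assumptions: the function name `is_http_url` is made up for this example, host names are restricted to ASCII letters, digits and hyphens, and a dotted host name is required (so `http://localhost` would be rejected). It accepts http and https and nothing else.

```php
<?php
// Hypothetical helper (not from the original post): true only for http/https URLs.
function is_http_url(string $url): bool
{
    // One host label: starts and ends with a letter or digit, hyphens allowed inside.
    $label = '[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?';
    // One or more labels, each followed by a dot, then an alphabetic top-level domain.
    $host = "(?:{$label}\\.)+[a-zA-Z]{2,}";
    // Optional port number.
    $port = '(?::\d{1,5})?';
    // Optional path, query string or fragment: anything without whitespace.
    $rest = '(?:[/?#]\S*)?';

    // "~" is the delimiter so "/" needs no escaping; "i" ignores case,
    // "D" stops "$" from matching before a trailing newline.
    $pattern = "~^https?://{$host}{$port}{$rest}$~iD";

    return preg_match($pattern, $url) === 1;
}

var_dump(is_http_url('http://google.com/something?some=unsome&w=anything'));
var_dump(is_http_url('gopher://google.com'));
```

A pattern like this only checks the shape of the string. It says nothing about whether the host actually exists.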
Last Post by Mohammed S

I came up against this same issue myself recently, and finally settled on using cURL to determine whether a URL is valid or not.

If you open a cURL connection to the URL and check the HTTP status in the response headers, you can be sure it exists. Likewise, if you receive something like a 404, you know the URL doesn't exist.

A good example can be found here. However, I would be inclined to accept redirects (301, 302, etc.) as valid URLs too.
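
For illustration, the idea could look roughly like the sketch below. It assumes PHP's cURL extension is installed; the names `is_acceptable_status` and `url_exists` and the 5-second timeout are made up for this example. It sends a HEAD request and treats any 2xx or 3xx status as "exists".

```php
<?php
// Hypothetical helper: 2xx (success) and 3xx (redirect) count as a live URL.
function is_acceptable_status(int $status): bool
{
    return $status >= 200 && $status < 400;
}

// Hypothetical helper: sends a HEAD request and inspects the status code.
function url_exists(string $url, int $timeout = 5): bool
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }

    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,   // HEAD request: headers only, no body
        CURLOPT_RETURNTRANSFER => true,   // do not echo the response
        CURLOPT_FOLLOWLOCATION => false,  // report 301/302 instead of following them
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT        => $timeout,
    ]);

    // curl_exec() returns false on failure (DNS error, timeout, refused connection).
    if (curl_exec($ch) === false) {
        return false;
    }

    return is_acceptable_status((int) curl_getinfo($ch, CURLINFO_HTTP_CODE));
}
```

Bear in mind that some servers answer HEAD requests differently from GET requests, and that this costs a network round trip per URL, so it complements a syntax check rather than replacing it.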

Cheers,
R

Good idea.