I'm completely hopeless when it comes to creating regex patterns!

I want to check a user's input and allow certain URLs, so here I have half of a pattern I have put together. Well, it is actually the whole thing, but I'm missing bits and pieces to actually call it a regex.
So with errors:

if (!preg_match('|^http://|https://([www])+[.]+(-azAZ09)+[.]+[azAZ]{2,6}?$|i', $post_gl_url))

I want to allow:
http:// and https://
www ({3?})
. a dot
domain name
. a dot
com/us/dk/pl etcetc

And then it should be possible to add a query string after this, so this should be allowed too, but the pattern shouldn't produce an error if there's no query string appended:

?id=10&-+_ (allowed in query-string)

How in he.. do I put it together?

So I end up with these being valid URLs, e.g.:

Lost in the regex world!

All the best,
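
For reference, the requirements listed in the question can be sketched as one small pattern. This is only one reading of that list: the function name `is_allowed_url` is made up for illustration, "www." is treated as required, and the query string is limited to letters, digits and the characters `= & - + _`.

```php
<?php
// Hypothetical helper: the name and the exact character sets are assumptions
// based on the list in the question, not a definitive URL validator.
function is_allowed_url(string $url): bool
{
    $pattern = '~^https?://'          // http:// or https://
             . 'www\.'                // literal "www." (use (www\.)? to make it optional)
             . '[a-z0-9-]+'           // domain name: letters, digits, hyphens
             . '\.[a-z]{2,6}'         // a dot plus a 2-6 letter ending (com, us, dk, pl ...)
             . '(\?[a-z0-9=&+_-]*)?'  // optional query string
             . '\z~i';                // end of input, case-insensitive

    return preg_match($pattern, $url) === 1;
}

var_dump(is_allowed_url('http://www.example.com'));        // true
var_dump(is_allowed_url('https://www.example.dk?id=10'));  // true
var_dump(is_allowed_url('ftp://www.example.com'));         // false
```

Using `~` as the delimiter avoids having to escape the slashes in `http://`, and `\z` instead of `$` means a trailing newline is not silently accepted.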

Yeah this can be really horrible at times!!!!

This is a snippet. I would recommend turning it into a function of some sort that just returns true or false.

$regex  = "((https?|ftp)\:\/\/)?"; // Scheme
$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and pass
$regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP
$regex .= "(\:[0-9]{2,5})?"; // Port
$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // GET query
$regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor

$url = ''; // the URL to validate goes here
if (preg_match("/^$regex$/", $url)) {
    print 'true';
}
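
Since the reply suggests turning the snippet into a function that returns true or false, here is a minimal sketch of what that could look like. The function name `is_valid_url` is an assumption; the pattern itself is copied from the snippet unchanged.

```php
<?php
// Hypothetical wrapper around the snippet; only the function name is new.
function is_valid_url(string $url): bool
{
    $regex  = "((https?|ftp)\:\/\/)?"; // Scheme
    $regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and pass
    $regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host
    $regex .= "(\:[0-9]{2,5})?"; // Port
    $regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path
    $regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // GET query
    $regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor

    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match("/^$regex$/", $url) === 1;
}

var_dump(is_valid_url('http://www.example.com/index.php?id=10')); // true
var_dump(is_valid_url('not a url'));                              // false
```

Two things worth knowing about the pattern as written: it only lists lowercase letters, so an all-uppercase URL is rejected unless the `i` modifier is added after the closing delimiter, and the `{2,3}` on the host line rejects endings longer than three letters such as `.info`.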

Thanks for the input!

I have put the pile of weird characters into my validation scripts, and it seems to do what I am looking for!

I have included all the appended $regex lines.

Is this overkill? As my ability to break down the entire pattern is somewhat limited, I am wondering: do I need all the lines to check a URL?

These too:

$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass
$regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP
$regex .= "(\:[0-9]{2,5})?"; // Port

I would keep all of the regex in there. I lifted this from my own applications, where I keep it as a common function, and I find that good practice, as you might want to support other URLs later on.

The reason it is on separate lines is that it's much easier to read. It could all be on one line; all it is doing is adding the pieces together to make one large expression. Each line covers a separate section of the expression.
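
To make the "adding them together" point concrete, here is a small sketch that keeps the same pieces in an array and joins them. The array keys and variable names are made up for illustration. It also shows what happens if the user/pass and port pieces are left out: the rest still works, but URLs that contain those parts stop matching.

```php
<?php
// Hypothetical layout: the same pieces as the snippet, keyed by name.
$parts = [
    'scheme' => "((https?|ftp)\:\/\/)?",
    'auth'   => "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?",
    'host'   => "([a-z0-9-.]*)\.([a-z]{2,3})",
    'port'   => "(\:[0-9]{2,5})?",
    'path'   => "(\/([a-z0-9+\$_-]\.?)+)*\/?",
    'query'  => "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?",
    'anchor' => "(#[a-z_.-][a-z0-9+\$_.-]*)?",
];

// Joining the pieces gives the same single expression as the .= lines.
$regex = implode('', $parts);

// Leaving pieces out only removes support for that part of a URL.
$fewerParts = $parts;
unset($fewerParts['auth'], $fewerParts['port']);
$shortRegex = implode('', $fewerParts);

var_dump(preg_match("/^$regex$/", 'http://example.com:8080/') === 1);      // true
var_dump(preg_match("/^$shortRegex$/", 'http://example.com:8080/') === 1); // false: port no longer allowed
var_dump(preg_match("/^$shortRegex$/", 'http://example.com/') === 1);      // true
```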

Also I find you can't overkill validation. But that is my opinion.

Hope this helps :)

So it validates many more parts than my original question asked for, I see that.

I agree with you on the validation part; as long as it works I am happy! I can always find many reasons for not learning proper regex, and always end up having problems when I need one.

But again, some are straightforward, and some are as complex as the author can manage.

What do these 3 lines do when it comes to validating URLs, i.e. which part of a URL do they check, and what would count as "not good"?

$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass
$regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP
$regex .= "(\:[0-9]{2,5})?"; // Port

Just to understand better why I am using it..

Thanks a lot by the way!!

OK, the port section checks whether there is a port in the URL, and if there is one, that it is a number of 2 to 5 digits. The user and password section is for when you're using Apache auth through the URL; it just checks that the username and password are in the correct format. The host line checks that the hostname has a valid form. Note that, as written, it expects the name to end in a dot followed by 2 or 3 letters, so a bare numeric IP would not pass, despite the "Host or IP" comment.
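
One way to see what each of the three lines accepts is to test them one at a time. The sketch below does that with a made-up helper, `piece_matches`, and then uses PHP's built-in `parse_url` to show where user, password, host and port sit in a URL. The sample URL and values are invented for the example.

```php
<?php
// The three pieces from the question, each tested on its own.
$auth = "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?";
$host = "([a-z0-9-.]*)\.([a-z]{2,3})";
$port = "(\:[0-9]{2,5})?";

// Hypothetical helper: does $text match $piece from start to end?
function piece_matches(string $piece, string $text): bool
{
    return preg_match("/^$piece$/", $text) === 1;
}

var_dump(piece_matches($auth, 'user:secret@'));    // true
var_dump(piece_matches($auth, 'user name@'));      // false: space is not allowed
var_dump(piece_matches($host, 'www.example.com')); // true
var_dump(piece_matches($host, '127.0.0.1'));       // false: no 2-3 letter ending
var_dump(piece_matches($port, ':8080'));           // true
var_dump(piece_matches($port, ':80a'));            // false: not all digits
var_dump(piece_matches($port, ':8'));              // false: fewer than 2 digits

// Where those parts sit in a URL, according to PHP's own parser.
$where = parse_url('http://user:secret@www.example.com:8080/index.php?id=10');
print_r($where);
```

Keep in mind the port piece only counts digits, so something like `:99999` still passes even though it is not a usable port number.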

Thanks a lot, you saved the night!
