Hi, I'm writing a web crawling program for my personal site, and I'm looking at using regex to extract the URLs. However, I have both absolute and relative URLs, and I want to match URLs only on my site (mysite.com).
So it would match:
/index.php
image1.jpg
page1.html
Http://mysite.com/
Http://mysite.com/page1.html
Http://Wiki.mysite.com/
Wiki.mysite.com/
but it wouldn't match:
Bob
Www.google.com
Mailto:Admin@mysite.com
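
To make the two lists above concrete, here is a rough Python sketch of one pattern that separates them. It is only one possible reading of the requirement, under some assumptions: the function name `is_own_url` and the list of file extensions are made up for illustration, matching is case-insensitive, and a bare relative link only counts if it is a single file name ending in one of those extensions (that is what keeps "Bob" and "Www.google.com" out). Relative links with directories and hosts with port numbers are not handled.

```python
import re

# Hypothetical pattern: three alternatives, tried against the whole string.
OWN_URL = re.compile(
    r"""
    (?:
        # 1) optional http/https scheme, optional subdomains, then mysite.com
        (?:https?://)?(?:[a-z0-9-]+\.)*mysite\.com(?:[/?#].*)?
      |
        # 2) root-relative path such as /index.php
        /\S*
      |
        # 3) bare file name with an assumed, illustrative extension list
        [\w-]+\.(?:php|html?|jpe?g|png|gif|css|js)(?:[?#].*)?
    )
    """,
    re.IGNORECASE | re.VERBOSE,
)


def is_own_url(link: str) -> bool:
    """Return True if the link looks like it points at mysite.com."""
    return OWN_URL.fullmatch(link) is not None


if __name__ == "__main__":
    samples = [
        "/index.php", "image1.jpg", "Http://Wiki.mysite.com/",
        "Bob", "Www.google.com", "Mailto:Admin@mysite.com",
    ]
    for link in samples:
        print(link, is_own_url(link))
```

Because the whole string has to match, something like http://mysite.com.evil.com/ is rejected rather than accepted on its prefix. A URL parser such as Python's urllib.parse would probably be more robust than a single regex for the absolute URLs, but the sketch sticks to regex since that is what the question asks about.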
Can anyone give me assistance? I'd post what I have so far, but it is this: