I recently changed scripts on a price comparison site, and the structure is completely different.
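Under the old script, each product page had its own URL ending in a .html file name (examples are listed further down).
Under the new script, the same product is reached through a search URL on index.php (see step 4 below).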
A lot of pages are indexed in Google, and I set up an error page that emails me whenever someone tries to load a page from the old script. What I found is that Googlebot tries to load old pages every day, even though I submitted a new sitemap weeks ago. Visitors also try to load old pages every day.
What I want the error page to do is:
1. Get the page the visitor was looking for
2. Extract the "the-product" bit
3. Replace the hyphens with spaces
4. Create the correct URL: http://www.my-site.com/index.php?search=the product
5. Redirect the visitor to that page.
I can do everything except step 2, because the old URL varies: there can be anywhere from one to four numbered folders in the path. So the URL can be any one of the ones below.
http://www.my-site.com/product/1/the-product.html
http://www.my-site.com/product/1/22/the-product.html
http://www.my-site.com/product/1/22/333/the-product.html
http://www.my-site.com/product/1/22/333/444/the-product.html
So the question is, how do I extract everything after the last slash and before .html?
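To make the goal concrete, here is a minimal sketch of steps 2 to 4, written in Python purely for illustration (the site itself presumably runs PHP, given index.php). The function names extract_slug and new_url are hypothetical, and URL-encoding the space as "+" is an assumption that is not part of the steps above. The idea is that the number of folders does not matter if only the last path segment is examined.

```python
import posixpath
from urllib.parse import urlsplit, quote_plus

# Base URL taken from the example URLs in the question.
SITE = "http://www.my-site.com"


def extract_slug(old_url):
    """Return the text after the last slash and before '.html', or None."""
    # Keep only the path, so a query string or fragment cannot interfere.
    path = urlsplit(old_url).path
    # basename() returns whatever follows the final slash,
    # regardless of how many numbered folders come before it.
    filename = posixpath.basename(path)
    if not filename.endswith(".html"):
        return None
    slug = filename[:-len(".html")]
    return slug or None


def new_url(old_url):
    """Build the new search URL for an old product URL, or None."""
    slug = extract_slug(old_url)
    if slug is None:
        return None
    # Step 3: replace hyphens with spaces.
    search = slug.replace("-", " ")
    # Step 4: build the new URL (assumption: the space is encoded as '+').
    return SITE + "/index.php?search=" + quote_plus(search)


if __name__ == "__main__":
    print(new_url("http://www.my-site.com/product/1/22/333/the-product.html"))
```

Because only the final path segment is used, the same call handles all four URL shapes listed above. A URL that does not end in .html yields None, so the error page could fall back to its normal behaviour in that case.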