Hey, so for my first project in my class, I need to build a crawler.
This is what I need it to do:

- crawl and follow all URLs on a domain (without going out to other websites)
- get the title of the page
- get the meta tag keywords of the page
- get the meta tag description of the page
- get the url of the page
- store all of this information in a MySQL database
- then follow the URLs on the page and do the same thing on those pages
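
For reference, the steps listed above could be sketched roughly like this. It is only a minimal sketch, assuming Python and its standard library rather than any particular crawler script; the names `PageParser`, `parse_page`, `fetch` and `crawl`, and the `max_pages` limit, are made up for illustration. It stays on the start URL's host, collects the title, meta keywords, meta description and URL of each page, and follows same-domain links breadth-first.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects the title, meta keywords/description and raw hrefs of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {"keywords": "", "description": ""}
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in self.meta:
                self.meta[name] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(url, html):
    """Return the page's metadata plus its same-domain links as absolute URLs."""
    parser = PageParser()
    parser.feed(html)
    domain = urlparse(url).netloc
    links = []
    for href in parser.links:
        # Resolve relative links and drop "#fragment" parts.
        absolute = urldefrag(urljoin(url, href)).url
        parts = urlparse(absolute)
        same_site = parts.scheme in ("http", "https") and parts.netloc == domain
        if same_site and absolute not in links:
            links.append(absolute)
    return {
        "url": url,
        "title": parser.title.strip(),
        "keywords": parser.meta["keywords"],
        "description": parser.meta["description"],
        "links": links,
    }


def fetch(url):
    """Download a page; decoding as UTF-8 is an assumption of this sketch."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch, max_pages=50):
    """Breadth-first crawl that never leaves the start URL's host."""
    seen = {start_url}
    queue = deque([start_url])
    pages = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:  # urllib's network and HTTP errors derive from OSError
            continue
        page = parse_page(url, html)
        pages.append(page)
        for link in page["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing `fetch` in as a parameter makes it possible to try the crawl logic against a dictionary of canned pages before pointing it at a real site. A real crawler would also want to respect robots.txt and pause between requests, which this sketch leaves out.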

- if possible, I would also like to get the full index of the page and store it in the MySQL database as well (for our next project we are going to write a script that searches what we crawled for keywords, but I can do that myself once I have the data).
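
The storage side might look something like the sketch below. It is written under several assumptions: Python, a DB-API cursor from a MySQL driver that uses `%s` placeholders (PyMySQL and mysql-connector-python both do), and a hypothetical `pages` table whose name and column sizes are invented here. Reading "full index" as the page's visible text is also an assumption.

```python
from html.parser import HTMLParser

# Hypothetical schema: table name, column names and sizes are assumptions.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS pages (
    id INT AUTO_INCREMENT PRIMARY KEY,
    url VARCHAR(255) NOT NULL UNIQUE,
    title VARCHAR(255),
    keywords TEXT,
    description TEXT,
    body_text MEDIUMTEXT
)
"""

# Insert a page, or refresh the stored row if the URL was crawled before.
UPSERT_SQL = """
INSERT INTO pages (url, title, keywords, description, body_text)
VALUES (%s, %s, %s, %s, %s)
ON DUPLICATE KEY UPDATE
    title = VALUES(title),
    keywords = VALUES(keywords),
    description = VALUES(description),
    body_text = VALUES(body_text)
"""


class TextExtractor(HTMLParser):
    """Gathers the text of a page, ignoring script and style contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def page_text(html):
    """Return the page's text (including its title) as one string."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


def save_page(cursor, page):
    """Store one crawled page through a DB-API cursor; parameters are bound, not interpolated."""
    cursor.execute(
        UPSERT_SQL,
        (
            page["url"],
            page["title"],
            page["keywords"],
            page["description"],
            page["body_text"],
        ),
    )
```

With a real driver you would run `CREATE_TABLE_SQL` once, call `save_page` for each crawled page, and commit on the connection. Binding values with placeholders instead of building the SQL string by hand matters here, because page titles and descriptions can contain quotes. Note that URLs longer than 255 characters would not fit the column as sketched.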

I'm not planning to create a public website or anything, just something for private testing. If you cannot help with this but know of other scripts with similar features that I could use as a reference, that would be a huge help as well.


You can take a look at this thread for a script that already has some of those abilities programmed in, although a lot more programming would need to be added. Alternatively, there is a script I once used called Sphider, which does exactly what you ask but which I find hard to edit.

Great script, indeed.

I saw phpcrawler somewhere else, but SourceForge was down earlier today.

I also saw your script, cwarn, but I went to the gym and haven't looked into it yet.

Thanks, guys. Big help.

And if anyone else can provide further resources, I would appreciate that as well.

