Hey, so for the first project in my class, I need to build a crawler.
This is what I need it to do:

- crawl and follow all URLs on a domain (without going out to other websites)
- get the title of the page
- get the meta tag keywords of the page
- get the meta tag description of the page
- get the url of the page
- store all of this information in a MySQL database
- then follow the URLs on the page and do the same thing on those pages
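
The extraction steps in the list above can be sketched roughly like this. It is only a minimal illustration in Python using the standard library, not PHP and not taken from any of the scripts mentioned in this thread. The names `PageParser` and `parse_page` are made up for the example, and "same domain" is assumed to mean "exactly the same host as the page being parsed".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, urldefrag


class PageParser(HTMLParser):
    """Collects the title, meta keywords/description, and raw hrefs of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {"keywords": "", "description": ""}
        self.hrefs = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Meta names are matched case-insensitively ("Keywords", "keywords").
            name = (attrs.get("name") or "").lower()
            if name in self.meta:
                self.meta[name] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(url, html):
    """Return the page fields plus the same-host links found in the HTML."""
    parser = PageParser()
    parser.feed(html)
    host = urlparse(url).netloc
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(url, href))
        parts = urlparse(absolute)
        # Assumption: only http(s) links on exactly the same host are followed.
        if parts.scheme in ("http", "https") and parts.netloc == host:
            if absolute not in links:
                links.append(absolute)
    return {
        "url": url,
        "title": parser.title.strip(),
        "keywords": parser.meta["keywords"],
        "description": parser.meta["description"],
        "links": links,
    }
```

Calling `parse_page(url, html)` on a downloaded page gives back a dict with the URL, title, keywords, description, and the list of links that stay on the same host. Links to other sites and non-HTTP links such as `mailto:` are dropped.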

- if possible, I would also like to get the full index of the page and store it in the MySQL database as well (our next project is a script that searches what we crawled for keywords, but I can do that myself once I have the data).
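
For the storage and "keep following links" parts, here is a rough sketch of one way it could look. To keep it runnable without a database server, it uses Python's built-in `sqlite3` as a stand-in for MySQL, so that part is an assumption and not what was asked for. The table name `pages`, its columns (including `body` for the full page text), and the functions `open_store`, `save_page`, and `crawl` are all made up for the example. `fetch_page` is a hypothetical callable you would supply yourself: it takes a URL and returns a dict with the page fields and its same-domain links, or `None` if the page could not be fetched.

```python
import sqlite3
from collections import deque

# Assumed schema: one row per crawled URL, with the full page text in "body".
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    url         TEXT PRIMARY KEY,
    title       TEXT,
    keywords    TEXT,
    description TEXT,
    body        TEXT
)
"""


def open_store(path=":memory:"):
    """Open the stand-in SQLite database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_page(conn, page):
    """Insert the page, or overwrite the row if the URL was stored before."""
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, title, keywords, description, body) "
        "VALUES (?, ?, ?, ?, ?)",
        (page["url"], page["title"], page["keywords"],
         page["description"], page["body"]),
    )
    conn.commit()


def crawl(start_url, fetch_page, conn, max_pages=100):
    """Breadth-first crawl from start_url; returns the number of pages saved."""
    queue = deque([start_url])
    seen = {start_url}  # every URL ever queued, so nothing is visited twice
    saved = 0
    while queue and saved < max_pages:
        url = queue.popleft()
        page = fetch_page(url)
        if page is None:  # fetch failed or page was not usable; skip it
            continue
        save_page(conn, page)
        saved += 1
        for link in page["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return saved
```

The `seen` set is what stops the crawler from looping forever when pages link back to each other, and `max_pages` is a safety cap for private testing. If you move this to MySQL, note that `INSERT OR REPLACE` is SQLite syntax (MySQL has `REPLACE INTO` and `INSERT ... ON DUPLICATE KEY UPDATE`), and the common Python MySQL drivers use `%s` placeholders instead of `?`.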

I'm not planning to create a public website or anything, just something for private testing. If you can't help with this but know of other scripts with similar features that I could use as a reference, that would be a huge help as well.

All 3 Replies

Take a look at PHP Crawler, an open source project. I think it will help you understand the basic features of a crawler.

You can take a look at this thread for a script that already has some of those abilities programmed in, although you would still need to add a lot more programming. Alternatively, there is a script I once used called Sphider, which does exactly what you ask, but I find it hard to edit.

I saw phpcrawler somewhere else, but SourceForge was down earlier today.

I also saw your script, cwarn, but I went to the gym and haven't looked into it yet.

Thanks, guys. That's a big help.

And if anyone else can provide further resources, I would appreciate that as well.