I'm looking for an open source web crawler that can be easily modified. Is anyone aware of a good one? I've just enrolled in a data mining course and would like to develop an algorithm that categorizes web pages. I need to be able to control which pages are indexed and how those pages are categorized, both based on the page content. Thanks.
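For reference, here is a minimal sketch of the kind of hooks I mean. It is not any existing crawler's API. `crawl`, `should_index`, `categorize` and `fetch` are hypothetical names I made up, and only the Python standard library is used. It assumes a simple breadth-first crawl where both decisions are made from the page's extracted text. It ignores robots.txt, politeness delays and non-UTF-8 pages, which a real crawler would have to handle.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkAndTextParser(HTMLParser):
    """Collects href targets of <a> tags and the text content of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def fetch_url(url):
    """Default fetcher (assumption): download a page and decode it as UTF-8."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seeds, should_index, categorize, fetch=fetch_url, max_pages=100):
    """Breadth-first crawl; returns {url: category} for pages passing should_index."""
    seen = set()
    queue = deque(seeds)
    results = {}
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # skip pages that cannot be downloaded
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()  # flush any text still buffered by the parser
        text = " ".join(parser.text_parts)
        # Hook 1: the page content decides whether the page is indexed.
        if should_index(url, text):
            # Hook 2: the page content decides the page's category.
            results[url] = categorize(url, text)
        # Links are followed even from non-indexed pages (a design choice).
        for href in parser.links:
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                queue.append(absolute)
    return results
```

Passing `fetch` in as a parameter makes it possible to test the indexing and categorization logic against an in-memory set of pages before pointing it at the live web. Swapping `categorize` for a trained classifier would be where the data mining part comes in.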