I would like to create a crawler in Java with MySQL as the backend. It should crawl a site, grab the data from that site, and then store that data in the database.
Can anybody tell me how to start this project and how to move forward with it?


All 3 Replies

Well, the first place to start would be here at the tutorial on networking: http://java.sun.com/docs/books/tutorial/networking/index.html

As a basic outline, you would need to maintain a collection of URLs to visit, connect to each URL and retrieve the content, parse the content and store what you want in the database, then move along to the next URL.
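The outline above can be sketched roughly as follows. This is only a minimal illustration, not a production crawler: the class name `MiniCrawler`, the `fetcher` parameter, and the regex-based link extraction are all assumptions made for the example. A real crawler would want an HTML parser, politeness delays, and robots.txt handling. The fetcher is passed in as a function so the loop can be tried without touching the network.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class MiniCrawler {
    // Naive pattern for absolute links in double-quoted href attributes.
    // An assumption for brevity; a proper HTML parser is more robust.
    private static final Pattern LINK =
            Pattern.compile("href=\"(https?://[^\"#]+)\"", Pattern.CASE_INSENSITIVE);

    // Pull every absolute http/https link out of a page.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Breadth-first crawl: keep a queue of URLs to visit, fetch each one,
    // keep its content, and queue any links not seen before.
    // The fetcher returns the page content, or null if the page is unavailable.
    static Map<String, String> crawl(String seed, int maxPages,
                                     Function<String, String> fetcher) {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        Map<String, String> pages = new LinkedHashMap<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && pages.size() < maxPages) {
            String url = frontier.poll();
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // skip pages that could not be retrieved
            }
            pages.put(url, html);
            for (String link : extractLinks(html)) {
                if (seen.add(link)) { // add() returns false for URLs already queued
                    frontier.add(link);
                }
            }
        }
        return pages;
    }
}
```

For real use, the fetcher could read the page from `new java.net.URL(url).openStream()` as covered in the networking tutorial, and the returned map is what you would then write to the database.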

The Art of Java has an example of a web crawler, but it doesn't go into MySQL.
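For the MySQL side, plain JDBC is probably enough to get started. The sketch below assumes a hypothetical table `pages` with a unique `url` column and a `content` column. The class name `PageStore` and the table layout are made up for illustration, so adjust them to whatever data you actually extract.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

// Hypothetical helper class for illustration only.
class PageStore {
    // Assumes a table like: pages(url VARCHAR(255) PRIMARY KEY, content MEDIUMTEXT).
    // ON DUPLICATE KEY UPDATE is MySQL-specific and overwrites a page crawled before.
    static final String UPSERT_SQL =
            "INSERT INTO pages (url, content) VALUES (?, ?) "
          + "ON DUPLICATE KEY UPDATE content = VALUES(content)";

    // Writes every crawled page in one batch; returns the number of statements run.
    static int save(Connection conn, Map<String, String> pages) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPSERT_SQL)) {
            for (Map.Entry<String, String> e : pages.entrySet()) {
                ps.setString(1, e.getKey());   // url
                ps.setString(2, e.getValue()); // page content
                ps.addBatch();
            }
            return ps.executeBatch().length;
        }
    }
}
```

With the MySQL Connector/J driver on the classpath, a connection can be obtained with `DriverManager.getConnection("jdbc:mysql://localhost:3306/crawler", user, password)`, where the database name `crawler` and the credentials are placeholders. Using a `PreparedStatement` rather than string concatenation matters here, since crawled content is untrusted input.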
