May 3rd, 2005

Building a simple Java web crawler

Hi

Over the next few months I intend to learn Java with the purpose of building my own simple web crawler/spider. I have seen a few open source spiders but would like to build my own if possible.

What I would like to ask is: how should I go about learning Java, and would building a simple spider be very hard?

My requirements of the spider are as follows:

Go to the entered URL and gather all content from the site
Collect link structure

The app I am developing will need to be able to build a structured sitemap of the specified URL.
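
As a rough illustration of the requirements listed above (visit a URL, collect its links, build a sitemap), here is a minimal sketch rather than a definitive implementation. The class name SitemapSketch, the methods extractLinks and buildSitemap, the maxPages limit, and the regex-based link matching are all assumptions made for illustration; a real crawler would more likely use a proper HTML parser and respect robots.txt.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SitemapSketch {

    // Naive match for href="..." or href='...'; an assumption, not a full HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns absolute URLs for the links found in the HTML of the page at baseUrl. */
    static List<String> extractLinks(String baseUrl, String html) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(baseUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            int hash = href.indexOf('#');
            if (hash >= 0) {
                href = href.substring(0, hash); // drop the fragment part
            }
            if (href.isEmpty()) {
                continue;
            }
            try {
                links.add(base.resolve(href).toString());
            } catch (IllegalArgumentException e) {
                // Skip hrefs that are not valid URIs.
            }
        }
        return links;
    }

    /**
     * Breadth-first crawl from startUrl that only follows links on the same host.
     * fetcher maps a URL to its HTML, or returns null if the page is unavailable.
     * The result maps each fetched page to the list of links found on it.
     */
    static Map<String, List<String>> buildSitemap(String startUrl,
                                                  Function<String, String> fetcher,
                                                  int maxPages) {
        Map<String, List<String>> sitemap = new LinkedHashMap<>();
        String host = URI.create(startUrl).getHost();
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(startUrl);
        seen.add(startUrl);
        while (!queue.isEmpty() && sitemap.size() < maxPages) {
            String url = queue.poll();
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // page could not be fetched
            }
            List<String> links = extractLinks(url, html);
            sitemap.put(url, links);
            for (String link : links) {
                String linkHost = URI.create(link).getHost();
                // seen.add returns false for URLs already queued, which avoids loops.
                if (host != null && host.equals(linkHost) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return sitemap;
    }
}
```

Passing the page source in through a Function keeps the crawl logic separate from the network code, so it can be tried against a small in-memory map of pages before it is pointed at a real site. The start URL should include a path (for example a trailing slash) so that relative links resolve cleanly.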

One final question: how would I go about building a browser add-on? Which languages can they be built in, and which browser is best/easiest to develop for?

Thanks
Posted by kooben

May 3rd, 2005

Re: Building a simple Java web crawler

I have built a Java web crawler/spider before, with a front end resembling Google, for a previous uni project. I would say it is a moderately difficult program to attempt: not overly difficult, but a definite challenge for a new Java coder.

Some of the main things you will need to learn for this are I/O streams, to read the URLs in, and JDBC, so that you can store the data (you could do it by reading into an array/vector, but I wouldn't recommend it as it would eat memory).
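
To make those two pieces a little more concrete, here is a small sketch, not the code from the project described above. It assumes a table named pages with url and content columns already exists and that a JDBC driver for your database is on the classpath; PageStoreSketch, readAll, fetch and savePage are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class PageStoreSketch {

    /** Reads an entire stream as UTF-8 text, one line at a time. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    /** Downloads the page at the given absolute URL using the JDK's URL streams. */
    static String fetch(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return readAll(in);
        }
    }

    /** Stores one page in the assumed table pages(url, content). */
    static void savePage(Connection conn, String url, String content) throws SQLException {
        String sql = "INSERT INTO pages (url, content) VALUES (?, ?)";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, url);
            stmt.setString(2, content);
            stmt.executeUpdate();
        }
    }
}
```

Writing each page to the database as soon as it is fetched means only one page has to be held in memory at a time, which is the reason for preferring JDBC over an ever-growing array or Vector.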

There is loads on the web about spider methods and algorithms, such as word ranking, but I am sure you have already read up on how they work.

It is probably quite a good project, as you could make it on the command line and then redo it with a GUI later if you wanted to.

As for browser plugins, I would probably go for a Firefox plugin, but then again, why stop at a search engine? Why not build your own browser too? :mrgreen:
Posted by Black Knight

May 3rd, 2005

Re: Building a simple Java web crawler

I think the java.sun site had a tutorial on creating one of these. This is actually the next project I want to take up!
Posted by server_crash

Sep 20th, 2005

Re: Building a simple Java web crawler

Hi Black Knight, I am also building a web crawler in Java as project work. Can you guide me? I am new to Java.
Posted by Dark Master

Sep 20th, 2005

Re: Building a simple Java web crawler

Hi everyone,

This is a topic I created at Wizard Solutions that has the entire source code and extensive explanations on creating your own web crawlers using Java.

Click on the links on that post

Here is the link

http://www.wizardsolutionsusa.com/fo...hread.php?t=29

Richard West
Posted by freesoft_2000

Feb 18th, 2009

Re: Building a simple Java web crawler

Can anyone help me out? I want to build a web crawler.
Posted by shubh_9797

Feb 18th, 2009

Re: Building a simple Java web crawler

Only if you start a new thread for your request and demonstrate that you have made some effort on your own.
Posted by Ezzaral (Moderator)

Sep 26th, 2009

Re: Building a simple Java web crawler

I am having a problem accessing the link:
http://www.wizardsolutionsusa.com/fo...hread.php?t=29
I also need the information on that page.
Posted by giaBaloch

Nov 7th, 2009

I also cannot get the link to work

That link doesn't work, and I can't find the Wizard Solutions website to try to join or whatever. How do you get to Wizard Solutions?
Posted by shelley7753

Nov 7th, 2009

Re: Building a simple Java web crawler
The post is over four years old. Not everything on the web persists forever.
Posted by Ezzaral (Moderator)

© 2011 DaniWeb® LLC