Building a simple Java web crawler

Closed Thread

kooben

Building a simple Java web crawler

#1
May 3rd, 2005
Hi

Over the next few months I intend to learn Java, with the aim of building my own simple web crawler/spider. I have seen a few open source spiders but would like to build my own if possible.

What I would like to ask is: how should I go about learning Java, and would building a simple spider be very hard?

My requirements of the spider are as follows:

Go to the entered URL and gather all content from the site
Collect link structure

The app I am developing will need to be able to build a structured sitemap of the specified URL.
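
The requirements above (visit every page reachable from a start URL and record which page links to which) amount to a breadth-first traversal of the site's links. Below is a minimal sketch of that idea, not a production crawler. The class name `SiteMapSketch`, the methods `extractLinks` and `crawl`, the `maxPages` limit, and the regular expression are all illustrative assumptions. Pages are supplied by a caller-provided fetch function so the traversal can be tried offline, and the start URL is assumed to include a path (for example a trailing slash). A real spider would likely use a proper HTML parser and respect robots.txt.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SiteMapSketch {
    // Naive matcher for href="..." attributes; an assumption for brevity, not a full HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the links found in html as absolute URLs, resolved against pageUrl. */
    static List<String> extractLinks(String pageUrl, String html) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(pageUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            int hash = href.indexOf('#');
            if (hash >= 0) href = href.substring(0, hash); // drop in-page fragments
            if (href.isEmpty()) continue;
            try {
                links.add(base.resolve(href).toString());
            } catch (IllegalArgumentException e) {
                // Skip hrefs that are not valid URIs.
            }
        }
        return links;
    }

    /**
     * Breadth-first crawl from startUrl that only follows links on the same host.
     * fetch maps a URL to its HTML, or null if the page is unavailable.
     * The result maps each visited page to the links found on it (the link structure).
     */
    static Map<String, List<String>> crawl(String startUrl, Function<String, String> fetch, int maxPages) {
        String host = URI.create(startUrl).getHost();
        Map<String, List<String>> siteMap = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        while (!queue.isEmpty() && siteMap.size() < maxPages) {
            String url = queue.remove();
            if (siteMap.containsKey(url)) continue; // already visited
            String html = fetch.apply(url);
            List<String> links = (html == null) ? new ArrayList<>() : extractLinks(url, html);
            siteMap.put(url, links);
            for (String link : links) {
                String linkHost = URI.create(link).getHost();
                if (host != null && host.equals(linkHost) && !siteMap.containsKey(link)) {
                    queue.add(link);
                }
            }
        }
        return siteMap;
    }

    public static void main(String[] args) {
        // Offline demo: a two-page "site" held in memory instead of fetched over HTTP.
        Map<String, String> pages = Map.of(
                "http://example.com/", "<a href=\"/about.html\">About</a>",
                "http://example.com/about.html", "<a href=\"/\">Home</a>");
        crawl("http://example.com/", pages::get, 100)
                .forEach((page, links) -> System.out.println(page + " -> " + links));
    }
}
```

To crawl a live site, the in-memory map would be replaced by a fetch function that downloads each page. The returned map of page to outgoing links is the raw material for a structured sitemap.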

One final question: how would I go about building a browser add-on? Which languages can add-ons be written in, and which browser is best or easiest to develop for?

Thanks

Black Knight

Re: Building a simple Java web crawler

#2
May 3rd, 2005
I have built a Java web crawler/spider before, with a front end resembling Google, for a previous uni project. I would say it is a moderately hard program to attempt: not overly difficult, but a definite challenge for a new Java coder.

The main things you will need to learn are I/O streams, to read the URLs in, and JDBC, so that you can store the data. (You could read everything into an array or Vector instead, but I wouldn't recommend it, as it would eat memory.)
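
As a rough illustration of those two pieces, here is a sketch that reads a page through I/O streams and stores it through JDBC. Everything named here is an assumption for the example: the class `PageStoreSketch`, its methods, and the table `pages(url, content)` are hypothetical, and the thread does not say which database or JDBC driver was used, so the caller is expected to supply an open `Connection`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class PageStoreSketch {
    /** Reads an entire stream as UTF-8 text, line by line. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    /** Downloads a page; URL.openStream() gives an InputStream over the response body. */
    static String fetch(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return readAll(in);
        }
    }

    /** Stores one page in a hypothetical table pages(url, content). */
    static void save(Connection db, String url, String content) throws SQLException {
        try (PreparedStatement insert =
                     db.prepareStatement("INSERT INTO pages (url, content) VALUES (?, ?)")) {
            insert.setString(1, url);
            insert.setString(2, content);
            insert.executeUpdate();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass a URL on the command line to print that page's text.
        if (args.length == 1) {
            System.out.println(fetch(args[0]));
        }
    }
}
```

Writing each page to the database as soon as it is fetched keeps only one page in memory at a time, which is the point of preferring JDBC over an in-memory array.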

There is loads on the web about spider methods and algorithms such as word ranking, but I am sure you have already read up on how they work.

It is probably quite a good project, as you could build it on the command line first and then redo it with a GUI later if you wanted to.

As for browser plugins, I would probably go for a Firefox plugin. But then again, why stop at a search engine? Why not build your own browser too? :mrgreen:

server_crash

Re: Building a simple Java web crawler

#3
May 3rd, 2005
I think the java.sun site had a tutorial on creating one of these. This is actually the next project I want to take up!

Dark Master

Re: Building a simple Java web crawler

#4
Sep 20th, 2005
Hi Black Knight, I am also building a web crawler in Java as project work. Can you guide me? I am new to Java.

freesoft_2000

Re: Building a simple Java web crawler

#5
Sep 20th, 2005
Hi everyone,

This is a topic I created at Wizard Solutions that has the entire source code and extensive explanations on creating your own web crawlers using Java.

Click on the links in that post.

Here is the link:

http://www.wizardsolutionsusa.com/fo...hread.php?t=29

Richard West

shubh_9797

Re: Building a simple Java web crawler

#6
Feb 18th, 2009
Can anyone help me out? I want to build a web crawler.

Ezzaral (Moderator)

Re: Building a simple Java web crawler

#7
Feb 18th, 2009
Only if you start a new thread for your request and demonstrate that you have made some effort on your own.

giaBaloch

Re: Building a simple Java web crawler

#8
Sep 26th, 2009
Originally Posted by freesoft_2000:
Hi everyone,

This is a topic I created at Wizard Solutions that has the entire source code and extensive explanations on creating your own web crawlers using Java.

Click on the links in that post.

Here is the link:

http://www.wizardsolutionsusa.com/fo...hread.php?t=29

Richard West

I am having trouble accessing the link:
http://www.wizardsolutionsusa.com/fo...hread.php?t=29
I also need the information on that page.

shelley7753

I also cannot get the link to work

#9
23 Hours Ago
That link doesn't work, and I can't find the Wizard Solutions website to try to join or whatever. How do you get to Wizard Solutions?

Ezzaral (Moderator)

#10
19 Hours Ago
The post is over four years old. Not everything on the web persists forever.
©2003 - 2009 DaniWeb® LLC