Building a simple Java web crawler
Join Date: Apr 2005
Posts: 14
Reputation:
Solved Threads: 0
Hi
Over the next few months I intend to learn Java with the purpose of building my own simple web crawler/spider. I have seen a few open source spiders but would like to build my own if possible.
What I would like to ask is: how should I go about learning Java, and would building a simple spider be very hard?
My requirements for the spider are as follows:
Go to the entered URL and gather all content from the site
Collect link structure
The app I am developing will need to be able to build a structured sitemap of the specified URL.
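The requirements above (fetch pages, collect the link structure, build a sitemap) could be sketched roughly as follows. This is only a minimal illustration, not a finished crawler: the class name `SimpleSpider`, the `fetcher` parameter and the `maxPages` limit are all made up for this example, and the regular expression is a naive stand-in for a proper HTML parser.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class SimpleSpider {
    // Naive match for href="..." attributes; a real HTML parser is more robust.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the absolute http/https links found in html, resolved against pageUrl. */
    static List<String> extractLinks(String pageUrl, String html) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(pageUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            int hash = href.indexOf('#');
            if (hash >= 0) href = href.substring(0, hash); // drop the fragment
            if (href.isEmpty()) continue;
            try {
                URI resolved = base.resolve(href);
                String scheme = resolved.getScheme();
                boolean web = "http".equals(scheme) || "https".equals(scheme);
                if (web && !links.contains(resolved.toString())) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Skip hrefs that are not valid URIs.
            }
        }
        return links;
    }

    /**
     * Breadth-first crawl that stays on the start URL's host.
     * Returns a map of page URL -> links found on that page (the "sitemap").
     * The fetcher turns a URL into page text; it is passed in so the crawl
     * logic can be tried out without touching the network.
     */
    static Map<String, List<String>> crawl(String startUrl,
                                           Function<String, String> fetcher,
                                           int maxPages) {
        String host = URI.create(startUrl).getHost();
        Map<String, List<String>> siteMap = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        while (!queue.isEmpty() && siteMap.size() < maxPages) {
            String url = queue.poll();
            if (siteMap.containsKey(url)) continue; // already visited
            String html = fetcher.apply(url);
            List<String> links = extractLinks(url, html == null ? "" : html);
            siteMap.put(url, links);
            for (String link : links) {
                String linkHost = URI.create(link).getHost();
                if (host != null && host.equals(linkHost) && !siteMap.containsKey(link)) {
                    queue.add(link);
                }
            }
        }
        return siteMap;
    }
}
```

Passing the fetcher in as a function is just one design choice: it keeps the link-following logic separate from the downloading, so you can test with a handful of in-memory pages first and plug in real HTTP access later. A real spider would also want to respect robots.txt and pause between requests.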
One final question: how would I go about building a browser add-on? What languages can they be built in, and which browser is best/easiest to develop for?
Thanks
I have built a Java web crawler/spider before, with a front end resembling Google, for a previous university project, and I would say it is a moderately difficult program to attempt: not overly hard, but a definite challenge for a new Java coder.
Some of the main things you will need to learn for this are I/O streams, to read the pages in from the URLs, and JDBC, so that you can store the data (you could do it by reading into an array/Vector, but I wouldn't recommend it as it would eat memory).
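To give a rough idea of those two pieces, here is a small sketch of reading a page through a stream and storing a link through JDBC. The class name `PageStore` and the table `links(from_url, to_url)` are invented for the example, the text is assumed to be UTF-8, and `saveLink` expects a `Connection` you have already opened with whatever database and JDBC driver you choose.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper class for illustration only.
class PageStore {
    /** Reads an entire stream into a String, assuming UTF-8 text. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    /** Downloads the page at the given address as text. */
    static String fetch(String address) throws IOException {
        try (InputStream in = URI.create(address).toURL().openStream()) {
            return readAll(in);
        }
    }

    /** Stores one link; assumes a hypothetical table links(from_url, to_url). */
    static void saveLink(Connection conn, String fromUrl, String toUrl) throws SQLException {
        String sql = "INSERT INTO links (from_url, to_url) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, fromUrl);
            ps.setString(2, toUrl);
            ps.executeUpdate();
        }
    }
}
```

Using a `PreparedStatement` with `?` placeholders, rather than building the SQL by string concatenation, keeps odd characters in crawled URLs from breaking the statement.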
There is plenty on the web about spider methods and algorithms such as word ranking, but I am sure you have already read up on how they work.
It is probably quite a good project, as you could build it on the command line and then redo it with a GUI later if you wanted to.
As for browser plugins, I would probably go for a Firefox plugin, but then again why stop at a search engine? Why not build your own browser too? :mrgreen:
01001001011001100010000001111001011011110111010100100000011000110110000101101110
00100000011100100110010101100001011001000010000001110100011010000110100101110011
00100000011110010110111101110101001000000110111001100101011001010110010000100000
0110100001100101011011000111000000101110
Join Date: Jun 2004
Posts: 609
Reputation:
Solved Threads: 7
Hi everyone,
This is a topic I created at Wizard Solutions that has the complete source code and extensive explanations on creating your own web crawlers using Java.
Click on the links in that post.
Here is the link:
http://www.wizardsolutionsusa.com/fo...hread.php?t=29
Richard West
Microsoft uses "One World, One Web, One Program" as a slogan.
Doesn’t that sound like "Ein Volk, Ein Reich, Ein Führer" to you, too?
— Eric S. Raymond
Tell me what type of software do you like and what would you pay for it
http://www.daniweb.com/techtalkforums/thread19660.html
Join Date: Sep 2009
Posts: 1
Reputation:
Solved Threads: 0
http://www.wizardsolutionsusa.com/fo...hread.php?t=29
I also need the information placed on this page.





