Google Programming Searchengine
Join Date: Nov 2004
Posts: 36
Reputation:
Solved Threads: 0
Howdy,
I want to know how search engines such as Google, WebCrawler, etc. really work.
I have read around 10 websites, found on Google, about "how search engines work", and not a single one of them makes it clear whether it is the spider, the index or the search software that does the ranking according to the ranking algorithm.
All they ever say is that a search engine has 3 pieces of software:
a) the spider
b) the index
c) the search system (search-box, template, etc.)
The spiders crawl the web collecting webpages and forward them to the index, and then the search software searches the index for the sought keywords/phrases.
Also, some say that the spiders copy the whole website into the index. So, in other words, there are 2 copies of a website: one residing on the website owner's webserver and the other residing in the search engine's index.
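To make the three parts concrete, here is a minimal Python sketch of the pipeline as those websites describe it. Everything in it is an assumption for illustration: the `WEB` dictionary stands in for real pages (a real spider would fetch them over HTTP), and the function names `crawl`, `build_index` and `search` are made up. It does no ranking at all; it only shows the spider making a copy, the index mapping words to pages, and the search system looking words up.

```python
from collections import defaultdict

# Hypothetical "web": URL -> page text. A real spider would fetch these over HTTP.
WEB = {
    "a.example/home": "cheap flights and cheap hotels",
    "b.example/blog": "hotels near the beach",
}

def crawl(web):
    """Spider: copy every page into a local store (the 'second copy')."""
    return dict(web)

def build_index(store):
    """Index: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in store.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Search system: return the URLs containing every query word, unranked."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

store = crawl(WEB)
index = build_index(store)
print(sorted(search(index, "hotels")))
print(sorted(search(index, "cheap hotels")))
```

Note that the index here is a word-to-pages lookup table built from the copied pages, not just the raw copies themselves; that is one common way to make keyword search fast, and not a claim about any particular search engine.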
So now, from all this, I can only assume 3 possibilities for how a search engine works:
1.
The spider does not do any ranking.
All it does is visit a website, grab all its HTML code (copy the website) and then dump the HTML code into the index.
The index is nothing but a big text file (.txt, .html) on the search engine's webserver that keeps a full copy (HTML code) of each website.
The search system, when searching and finding links in the index, does the ranking according to the search engine's ranking algorithm.
This means neither the spider nor the index is responsible for the ranking, because these 2 parts of the search engine are not taught the ranking algorithm.
OR
2.
The spider does the ranking according to the search engine's ranking algorithm.
It visits a website, grabs all its HTML code (copies the website) and then dumps the HTML code into the index. When it dumps the copies of websites, it ranks them according to the search engine's algorithm.
The index is nothing but a big text file (.txt, .html) on the search engine's webserver that keeps a full copy (HTML code) of each website.
The search system, when searching and finding links in the index, does not do any ranking, because that has already been done by the spider when dumping the data into the index.
This means the spider is responsible for the ranking, and neither the index nor the search system is, because these 2 parts of the search engine are not taught the ranking algorithm.
OR
3.
The spider does not do any ranking.
All it does is visit a website, grab all its HTML code (copy the website) and then dump the HTML code into the index.
The index is not only a big text file (.txt, .html) on the search engine's webserver that keeps a full copy (HTML code) of each website, but also the system that does the ranking.
When it receives data from the spider, it ranks the links in its database according to the search engine's ranking algorithm.
The search system, when searching and finding links in the index, does not do any ranking.
Frankly, all it does is output a copy of certain parts of the index onto the searcher's screen.
This means neither the spider nor the search system is responsible for the ranking, because these 2 parts of the search engine are not taught the ranking algorithm.
So, which of the 3 assumptions above is correct?
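To show what the difference between possibility 1 and possibility 3 would look like in practice, here is a toy Python sketch. It is only an illustration under loud assumptions: the "ranking algorithm" is simply how often the word appears on a page, which is a stand-in and not any real search engine's rule, and `PAGES`, `score`, `rank_at_query_time` and `build_ranked_index` are hypothetical names. In the first function the search system does the scoring when a query arrives; in the second the index stores each word's pages already sorted, so the search system only reads the list.

```python
from collections import Counter, defaultdict

# Hypothetical crawled pages: URL -> page text.
PAGES = {
    "a.example": "hotels hotels hotels flights",
    "b.example": "hotels beach",
    "c.example": "flights flights",
}

def score(text, word):
    """Toy ranking rule (an assumption): how often the word occurs in the text."""
    return Counter(text.lower().split())[word]

def rank_at_query_time(pages, word):
    """Possibility 1: the search system scores the pages when it is asked."""
    hits = [(url, score(text, word)) for url, text in pages.items()]
    hits = [(url, s) for url, s in hits if s > 0]
    # Highest score first; ties broken by URL so the order is deterministic.
    return [url for url, s in sorted(hits, key=lambda h: (-h[1], h[0]))]

def build_ranked_index(pages):
    """Possibility 3: the index stores each word's pages already sorted."""
    postings = defaultdict(list)
    for url, text in pages.items():
        for word, count in Counter(text.lower().split()).items():
            postings[word].append((url, count))
    return {
        word: [url for url, c in sorted(entries, key=lambda h: (-h[1], h[0]))]
        for word, entries in postings.items()
    }

ranked_index = build_ranked_index(PAGES)
print(rank_at_query_time(PAGES, "hotels"))
print(ranked_index["hotels"])
```

With a rule this simple, both approaches give the same order for a one-word query, so from the outside a searcher could not tell which part did the ranking. The sketch does not settle which possibility real search engines use.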
I would like to come up with my own "Compression Algorithm" and teach it to the browsers, so you can show streaming videos and lengthy animations from your website without losing an arm and a leg on your bandwidth.
Well, a Google bomb is when you go around putting posts, blogs, etc. everywhere with a link that says something like "stupid", as in my thread. It isn't very effective. But when you submit your site to Google, the way to get to the top is to get as many links to it as possible. You have to submit your site there first, though, and I think it picks up all the little branches of the website.
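A rough Python sketch of the two ideas in that post, counting links to a site and counting links that use a particular link text. The `LINKS` data and both function names are made up for illustration, and plain counting is an assumption; it is not how Google actually weighs links.

```python
from collections import Counter

# Hypothetical link data: (page holding the link, target site, link text).
LINKS = [
    ("blog1.example", "target.example", "stupid"),
    ("blog2.example", "target.example", "stupid"),
    ("forum.example", "target.example", "my site"),
    ("blog1.example", "other.example", "stupid"),
]

def inbound_counts(links):
    """Count how many links point at each target site."""
    return Counter(target for _, target, _ in links)

def anchor_counts(links, anchor):
    """Count, per target, the links whose text equals the given anchor."""
    return Counter(target for _, target, text in links if text == anchor)

print(inbound_counts(LINKS).most_common())
print(anchor_counts(LINKS, "stupid").most_common(1))
```

Here `target.example` comes out on top both for total links and for links that say "stupid", which is the effect a Google bomb is trying to produce.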
PETA People for the Eating of Tasty Animals.
FireFox
Hijack This
Ad-Aware
Hijack this tutorial
Microsoft AntiSpyware
CompUchat
Originally Posted by OurNation
"Well, a Google bomb is when you go around putting posts, blogs, etc. everywhere with a link that says something like 'stupid', as in my thread. It isn't very effective."
So you had better explain it properly. :rolleyes: