
search engine crawling frequency

We want to design a search engine for news websites such as timesofindia.com and indianexpress.com, i.e., download article pages from these websites, index the pages, and answer search queries on the index.

Give short pseudocode to find an appropriate crawling frequency. You do not want to crawl too often, because the website may not have changed, and you do not want to crawl too infrequently, because your index would then be out of date. Assume that your crawling code looks as follows:

while(1) {
     sleep(sleep_interval); // sleep for sleep_interval
     crawl(website);        // crawls the entire website

     // returns the % difference between the latest and previous crawls of the website
     diff_percentage = diff(currently_crawled_website, previously_crawled_website);

     sleep_interval = infer_sleep_interval(diff_percentage, sleep_interval); // this is the method you have to write!
}

Give pseudocode for the infer_sleep_interval method:

long infer_sleep_interval(int diff_percentage, long previous_sleep_interval) {
    ...
    ...
    ...
}

Design a method which adaptively alters the sleeping interval based on the update frequency of the website.
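
One possible shape for such a method, as a rough sketch only: assume changes accumulate roughly linearly over time, pick a target amount of change per crawl, and scale the interval toward that target while limiting each step to a factor of two. The constants kMinIntervalSec, kMaxIntervalSec and kTargetDiffPct and the helper clamp_long are hypothetical names and values chosen for illustration; they are not part of the original question.

```cpp
#include <cassert>

// Hypothetical tuning constants (assumptions, not from the question).
const long kMinIntervalSec = 60;     // never crawl more often than once a minute
const long kMaxIntervalSec = 86400;  // never wait longer than one day
const int  kTargetDiffPct  = 10;     // amount of change we would like to see per crawl

// Hypothetical helper: restrict value to the range [lo, hi].
static long clamp_long(long value, long lo, long hi) {
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

long infer_sleep_interval(int diff_percentage, long previous_sleep_interval) {
    // Sanitize inputs so the arithmetic below stays well defined.
    long diff = clamp_long(diff_percentage, 0, 100);
    long prev = clamp_long(previous_sleep_interval, kMinIntervalSec, kMaxIntervalSec);

    long next;
    if (diff == 0) {
        // Nothing changed since the last crawl: back off.
        next = prev * 2;
    } else {
        // Assuming roughly linear change over time, this interval would
        // have produced about kTargetDiffPct percent of change.
        next = prev * kTargetDiffPct / diff;
        // Damp the adjustment: at most halve or double per step, so one
        // unusual crawl does not swing the schedule too far.
        next = clamp_long(next, prev / 2, prev * 2);
    }

    next = clamp_long(next, kMinIntervalSec, kMaxIntervalSec);
    assert(next >= kMinIntervalSec && next <= kMaxIntervalSec);
    return next;
}
```

With these assumed constants, a crawl that finds no change after a 600-second sleep doubles the interval to 1200 seconds, while a crawl that finds 50% change halves it to 300 seconds. The factor-of-two limit is a design choice that trades reaction speed for stability; a news site with bursty updates might want a different damping rule.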

2 contributors, 1 reply, 3-day discussion span, last updated 11 months ago, 2 views
aditya2313

Hi,

I'm confused about what your question is. Is this a homework assignment? Please show some effort and post your question, along with what you've tried so far, in the C++ forum.

Dani

© 2013 DaniWeb® LLC