I want to rank the sentences in a text by weight. For example, I want to extract the important sentences from a text based on their weights (importance). Please help, if possible with an executable algorithm.

read sentences
add weights

? ;) Post a bit more information if you don't know how to do one of the two steps above and maybe even a bit of code?

I want to develop a summarization engine in which I need to extract only the important sentences. Each word is given a weight based on its frequency of occurrence, and the sum of the weights of a sentence's words gives the weight of that sentence. The sentences are stored in a linked list. I'm having trouble developing this scoring algorithm.

So..

Read the sentences
For each sentence
  for each word
    add wordscore to sentencescore

For each sentence
   if sentencescore > level, output sentence

Something like that perhaps? Go ahead, code something. It's okay to make mistakes.
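
For what it's worth, the pseudocode above could be sketched in C++ roughly like this. It assumes the word scores already exist in a std::map and that each sentence is a plain string of space-separated, already normalized words. The names sentenceScore and sentencesAbove are made up for illustration.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Sum the scores of all words in one sentence.
// Words that are missing from the score table contribute 0.
int sentenceScore(const std::string& sentence,
                  const std::map<std::string, int>& wordScore) {
    int score = 0;
    std::istringstream in(sentence);
    std::string word;
    while (in >> word) {
        auto it = wordScore.find(word);
        if (it != wordScore.end()) score += it->second;
    }
    return score;
}

// Keep only the sentences whose score is strictly greater than `level`.
std::vector<std::string> sentencesAbove(const std::vector<std::string>& sentences,
                                        const std::map<std::string, int>& wordScore,
                                        int level) {
    std::vector<std::string> kept;
    for (const auto& s : sentences) {
        if (sentenceScore(s, wordScore) > level) kept.push_back(s);
    }
    return kept;
}
```

A std::vector is used here instead of a linked list only to keep the sketch short; the same loops work over a std::list.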

Yet another tip: use std::map<std::string,int> to build a word dictionary and maintain the word counters.
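
A minimal sketch of that tip, assuming words are separated by whitespace and that case and punctuation should be ignored. The helper names normalize and countWords are invented for this example.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Lowercase a token and drop non-alphanumeric characters,
// so that "Cat," and "cat" are counted as the same word.
std::string normalize(const std::string& token) {
    std::string out;
    for (unsigned char c : token) {
        if (std::isalnum(c)) out += static_cast<char>(std::tolower(c));
    }
    return out;
}

// Count how often each normalized word occurs in the text.
std::map<std::string, int> countWords(const std::string& text) {
    std::map<std::string, int> counts;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        std::string word = normalize(token);
        // operator[] inserts a counter initialized to 0 for a new word.
        if (!word.empty()) ++counts[word];
    }
    return counts;
}
```

Calling countWords on the whole text gives you the dictionary and the counters in one pass.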

What are you doing with words like a, the, on, and, it, etc.? They would seem to contribute an inordinate amount to the score and yet nothing to a sentence's "importance". It's possible they just average out, I guess. Or maybe you could have an ignore list of the so-called function words of English (articles, conjunctions, prepositions, pronouns, etc.).
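
One possible way to do the ignore list is a std::set that is checked before a word is counted. This is only a sketch: the list below is a tiny illustrative sample, not a complete set of function words, and the input is assumed to be lowercase already with punctuation removed. The names kIgnore, isIgnored and countContentWords are hypothetical.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Illustrative and deliberately incomplete ignore list.
const std::set<std::string> kIgnore = {
    "a", "an", "the", "on", "and", "it", "of", "in", "to"
};

// True if the word is on the ignore list.
bool isIgnored(const std::string& word) {
    return kIgnore.count(word) > 0;
}

// Count word frequencies, skipping every word on the ignore list.
std::map<std::string, int> countContentWords(const std::string& text) {
    std::map<std::string, int> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        if (!isIgnored(word)) ++counts[word];
    }
    return counts;
}
```

Since ignored words never enter the map, they add nothing to any sentence's weight later on.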

Keeping in mind ArkM's suggestion to use a map<string,int> to accumulate the word frequencies, here's an algorithm:

Read in sentence file

For each sentence
  For each word
    If not on ignore list  //if you need this
      Increment frequency of that word

For each sentence
  Set sentence weight to the sum of its word weights

Sort sentences by sentence weights.

Store/Display top N sentences.
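
Putting the steps above together, a rough C++ sketch might look like the following. Several things here are assumptions: sentences are split naively at '.', '!' and '?' (so abbreviations like "e.g." will be split wrongly), words are lowercased and stripped of punctuation, and the function names splitSentences, wordsOf and topSentences are made up for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Naive sentence splitter: a sentence ends at '.', '!' or '?'.
std::vector<std::string> splitSentences(const std::string& text) {
    std::vector<std::string> sentences;
    std::string current;
    auto flush = [&]() {
        // Drop leading whitespace; skip fragments that are only whitespace.
        std::size_t start = current.find_first_not_of(" \t\r\n");
        if (start != std::string::npos) sentences.push_back(current.substr(start));
        current.clear();
    };
    for (char c : text) {
        current += c;
        if (c == '.' || c == '!' || c == '?') flush();
    }
    flush();  // trailing text without a terminator
    return sentences;
}

// Normalized (lowercase, alphanumeric-only) words of a sentence,
// excluding anything on the ignore list.
std::vector<std::string> wordsOf(const std::string& sentence,
                                 const std::set<std::string>& ignore) {
    std::vector<std::string> words;
    std::istringstream in(sentence);
    std::string token;
    while (in >> token) {
        std::string word;
        for (unsigned char c : token) {
            if (std::isalnum(c)) word += static_cast<char>(std::tolower(c));
        }
        if (!word.empty() && ignore.count(word) == 0) words.push_back(word);
    }
    return words;
}

// Return up to n (weight, sentence) pairs, heaviest first.
std::vector<std::pair<int, std::string>> topSentences(
        const std::string& text, const std::set<std::string>& ignore, std::size_t n) {
    std::vector<std::string> sentences = splitSentences(text);

    // Pass 1: word frequencies over the whole text.
    std::map<std::string, int> freq;
    for (const auto& s : sentences) {
        for (const auto& w : wordsOf(s, ignore)) ++freq[w];
    }

    // Pass 2: sentence weight = sum of the frequencies of its words.
    std::vector<std::pair<int, std::string>> scored;
    for (const auto& s : sentences) {
        int weight = 0;
        for (const auto& w : wordsOf(s, ignore)) weight += freq[w];
        scored.emplace_back(weight, s);
    }

    // Sort by weight, descending; stable_sort keeps text order among ties.
    std::stable_sort(scored.begin(), scored.end(),
        [](const std::pair<int, std::string>& a, const std::pair<int, std::string>& b) {
            return a.first > b.first;
        });
    if (scored.size() > n) scored.resize(n);
    return scored;
}
```

Keep in mind that a plain sum tends to favor long sentences, so you may want to experiment with the weighting once the basic version works.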