extracting a sentence

serhannn
  #1  
Jan 11th, 2009
Hello everybody,

I have a paragraph containing many sentences, and I need to extract the one sentence that contains a certain word. I have no problem finding the word with my code, but how can I extract the whole sentence? The code should work regardless of the position of the word in the sentence. Here is an example:

A new crisis is emerging, a global food catastrophe that will reach further and be more crippling than anything the world has ever seen. The credit crunch and the reverberations of soaring oil prices around the world will pale in comparison to what is about to transpire, Donald Coxe, global portfolio strategist at BMO Financial Group said at the Empire Club's 14th annual investment outlook in Toronto on Thursday.

For instance, I search for the word "catastrophe" in this text and find it. Now I need to extract the sentence: "A new crisis is emerging, a global food catastrophe that will reach further and be more crippling than anything the world has ever seen."
I thought I could use string.find to search forward for the characters ". " to reach the end of the sentence, but I also need to retrieve the part that comes before the word "catastrophe". I would appreciate your ideas and help on this topic.

Thank you very much.
ddanbe
  #2  
Jan 11th, 2009
Proceed in the same way. Find the previous '.', then move forward until the first capital.
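
A minimal sketch of that suggestion, assuming every sentence ends with a plain '.' (which is wrong for abbreviations). The helper name extract_sentence is hypothetical and does not come from the thread:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns the sentence of `text` that contains `word`,
// or an empty string if `word` does not occur.
// Assumption: a sentence ends at a '.', which misfires on abbreviations.
std::string extract_sentence(const std::string& text, const std::string& word) {
    const std::size_t hit = text.find(word);
    if (hit == std::string::npos) return "";

    // Find the previous '.'; if there is none, the sentence starts the text.
    const std::size_t prev =
        (hit == 0) ? std::string::npos : text.rfind('.', hit - 1);
    std::size_t begin = (prev == std::string::npos) ? 0 : prev + 1;

    // Move forward until the first capital letter (or the word itself).
    while (begin < hit &&
           !std::isupper(static_cast<unsigned char>(text[begin]))) {
        ++begin;
    }

    // The sentence ends at the next '.' after the word, or at the end of text.
    const std::size_t end = text.find('.', hit);
    if (end == std::string::npos) return text.substr(begin);
    return text.substr(begin, end - begin + 1);
}
```

Calling extract_sentence(paragraph, "catastrophe") on the example paragraph from the first post should give the sentence that starts with "A new crisis is emerging".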
"If you judge people, you have no time to love them." Mother Teresa
Make love, no war. Cave ab homine unius libri.
First rule of debugging: "If you get a different error message, you're making progress."
Danny
serhannn
  #3  
Jan 11th, 2009
Originally posted by ddanbe:
Proceed in the same way. Find the previous '.', then move forward until the first capital.

But I can't tell from a capital letter alone whether it is the beginning of a sentence. It could also be a name or something else.
ddanbe
  #4  
Jan 11th, 2009
Come to think of it, it is not obvious.
What would you do with a sentence like "This person holds a Ph.D."? But this is not really a C++ problem any more.
serhannn
  #5  
Jan 11th, 2009
In a loop, I used this code to start from the found word and take everything up to the first dot. But I think there is something wrong with it, because the output contains some errors. I search in text files that are actually the source code of some web pages, so they contain many HTML tags, which also interrupt the sentences. Is there a better algorithm to avoid that?

size_t pos1, pos2;
pos1 = line.find(keyword_vector[i]);
pos2 = line.find(".", pos1);
string sentence = line.substr(pos1, pos2);

Thanks for help.
Last edited by serhannn : Jan 11th, 2009 at 10:56 am.
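
One likely cause of the errors in the snippet above: std::string::substr takes a start position and a length, not a start and an end position, so line.substr(pos1,pos2) copies up to pos2 characters starting at pos1. The snippet also does not check for string::npos when the word or the dot is missing. A minimal sketch of a corrected version, assuming a hypothetical helper name from_word_to_dot:

```cpp
#include <string>

// Hypothetical helper: the text from `word` up to and including the next '.'.
std::string from_word_to_dot(const std::string& line, const std::string& word) {
    const std::size_t pos1 = line.find(word);
    if (pos1 == std::string::npos) return "";  // word is not on this line

    const std::size_t pos2 = line.find('.', pos1);
    if (pos2 == std::string::npos) return line.substr(pos1);  // no dot: take the rest

    // substr(start, length): the length is pos2 - pos1 + 1, not pos2.
    return line.substr(pos1, pos2 - pos1 + 1);
}
```

This still starts at the word rather than at the beginning of the sentence, which is the part the original question asks about.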
Freaky_Chris
  #6  
Jan 11th, 2009
Perhaps you should read the entire file in and strip out all HTML tags as you do so; that way you are left with just text. Then go over it and process what you have into sentences. But as ddanbe said, this isn't really a job for C++.
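
A rough sketch of the tag-stripping step, assuming well-formed tags and no literal '<' in the visible text. The helper name strip_tags is hypothetical; entities such as &amp; and the contents of script blocks are not handled:

```cpp
#include <string>

// Hypothetical helper: drops everything between '<' and '>' (inclusive).
// Each tag is replaced by one space so that words separated only by a tag
// do not run together.
std::string strip_tags(const std::string& html) {
    std::string text;
    bool in_tag = false;
    for (char c : html) {
        if (c == '<') {
            in_tag = true;
            text += ' ';      // stand-in for the removed tag
        } else if (c == '>') {
            in_tag = false;
        } else if (!in_tag) {
            text += c;        // ordinary character outside any tag
        }
    }
    return text;
}
```

The result can contain runs of spaces where tags used to be, so a real page would probably need its whitespace collapsed afterwards.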

The closest you will get to checking for the start and end of a sentence is [dot][space][capital]; you will need to use that to check for both the beginning and the end.
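
The [dot][space][capital] check could be sketched as follows. The names is_boundary and sentence_around are hypothetical, and the test is only a heuristic: it still misfires on text such as "Dr. Smith".

```cpp
#include <cctype>
#include <string>

// True if a sentence boundary sits at text[i]: '.', then ' ', then a capital.
bool is_boundary(const std::string& text, std::size_t i) {
    return i + 2 < text.size() && text[i] == '.' && text[i + 1] == ' ' &&
           std::isupper(static_cast<unsigned char>(text[i + 2]));
}

// Hypothetical helper: the sentence around position `hit`, using the same
// boundary test for both the beginning and the end.
std::string sentence_around(const std::string& text, std::size_t hit) {
    if (hit >= text.size()) return "";

    std::size_t begin = 0;                      // default: start of the text
    for (std::size_t i = hit; i-- > 0;) {       // scan backwards from the hit
        if (is_boundary(text, i)) { begin = i + 2; break; }
    }

    std::size_t end = text.size();              // default: end of the text
    for (std::size_t i = hit; i < text.size(); ++i) {  // scan forwards
        if (is_boundary(text, i)) { end = i + 1; break; }
    }
    return text.substr(begin, end - begin);
}
```

Here hit would be the position returned by text.find(keyword). Because the dots in "Ph.D." are not followed by a space and a capital, that case from earlier in the thread is not split in the middle of a sentence.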

Chris
Knowledge is power -- But experience is everything
Comatose
  #7  
Jan 11th, 2009
You still have to take into account that a file read from disk contains newlines. So what if a sentence is nearing the end of a line and continues on the next one? If manual line breaks are used instead of word wrap, sentences could span multiple lines :/