Hello,
I am designing a statistical summarizer in C.
This software will create a summary of a text file submitted by
the user.
The first step is to separate the sentences in the text file.
I have almost completed this step,
but there are some limitations.
Suppose the text file has this input, e.g.:

Hello! This is MAYANK. I am a Student of "nitk."
Working on a project.

output:
Hello! This is MAYANK.
I am a Student of "nitk."
Working on a project.


How would I separate these lines into sentences?
Can you provide me with code for sentence separation?

Hey, it's easy, try it.

Read characters from the file into a buffer until you encounter a dot (.), then save that buffer, which is your sentence.

What do you say?

But what about other sentence separators such as ? and ! And how about a sentence that spans lines?

In the case of the second sentence in the expected output, the last character is not . but a closing quote "
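
A minimal sketch of one way to cover the points raised here: ? and ! as terminators, a closing quote right after the terminator, and a sentence that spans lines. The name next_sentence and the fixed-size output buffer are assumptions made for illustration only; abbreviations such as Mr. are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the next sentence from `text` into `out`
 * and returns a pointer to where the following sentence starts.
 * Returns NULL when no text is left. Over-long sentences are truncated. */
const char *next_sentence(const char *text, char *out, size_t out_size)
{
    assert(text != NULL && out != NULL && out_size > 0);
    size_t n = 0;

    /* Skip whitespace (including newlines) between sentences. */
    while (*text != '\0' && isspace((unsigned char)*text))
        text++;
    if (*text == '\0')
        return NULL;

    while (*text != '\0') {
        char c = *text++;
        /* A line break inside a sentence is treated as a plain space. */
        if (c == '\n' || c == '\r')
            c = ' ';
        if (n + 1 < out_size)
            out[n++] = c;
        if (c == '.' || c == '?' || c == '!') {
            /* Absorb repeated terminators and a closing double quote. */
            while (*text != '\0' && strchr(".?!\"", *text) != NULL) {
                if (n + 1 < out_size)
                    out[n++] = *text;
                text++;
            }
            break;
        }
    }
    out[n] = '\0';
    return text;
}
```

Calling it in a loop (p = next_sentence(p, buf, sizeof buf) until it returns NULL) on the sample input from the first post yields one sentence per call. Text with no closing punctuation at the end is returned as a final sentence.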

Hey, that was just a hint; mayank can move forward from there.

Your example is wrong according to your own definition. If you want to separate sentences, then you also need to break on the exclamation point:

[b][u]output:[/u][/b]
        Hello! 
        This is MAYANK.
        I am a Student of "nitk."
        Working on a project.

Anyhow, time to repeat my earlier advice:

The function that would be helpful for you here, I think, is strtok

in the <string.h> header.

It will separate any text into "tokens" based on a set of "delimiters".

Your "tokens" will be the individual sentences, and your set of "delimiters" will be the set of all sentence-ending punctuation.

Once you have your individual sentences you can do what you want with them, such as inserting newlines between them and writing them to another text file.

http://www.cplusplus.com/reference/c...ng/strtok.html

CAVEAT: you will need extra logic, in addition to strtok, to handle end-of-sentence punctuation found inside quotation marks.
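
Here is a rough sketch of what the strtok suggestion looks like in practice. The wrapper name split_with_strtok and the delimiter set ".!?" are assumptions for illustration. Note that strtok writes '\0' over each delimiter, so the punctuation itself is not part of the returned tokens, and the input must be a modifiable buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `text` in place on sentence-ending
 * punctuation and stores a pointer to each token in `tokens`.
 * Returns the number of tokens found. */
size_t split_with_strtok(char *text, char *tokens[], size_t max_tokens)
{
    assert(text != NULL && tokens != NULL);
    size_t count = 0;
    /* strtok overwrites each delimiter it finds with '\0'. */
    for (char *tok = strtok(text, ".!?");
         tok != NULL && count < max_tokens;
         tok = strtok(NULL, ".!?")) {
        tokens[count++] = tok;
    }
    return count;
}
```

For a buffer holding "Hello! This is MAYANK. Working on a project." this gives three tokens: "Hello", " This is MAYANK" and " Working on a project". The leading spaces and the missing punctuation are part of the extra logic you would still have to write, and a period inside an abbreviation or a quotation splits the text just like any other.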



Hey,
thanks for the reply.
I have created a sentence separator,
but it fails in many cases, like:

Hey! How are u. Wats going on
these days. Where is Mr. Kapil's brother
these days.

How will you separate a sentence if there is
an exclamation mark in it?
A sentence may or may not end with an exclamation mark.
And further, what if a sentence ends with double quotes and also has quotes within quotes,
or a sentence contains an abbreviation like Mr.?

There are no general rules; you have to take care of these exceptions.

You have to develop some sort of algorithm to handle these exceptions.

At first sight it seems easy, but things get complex here.

The difficulty keeps increasing as you go deeper into it.
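
As one example of such an exception rule, here is a sketch that refuses to end a sentence on a period that follows a known abbreviation. The function names and the tiny abbreviation list are assumptions for illustration; a real summarizer would need a far longer list and would still get some cases wrong.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative list only, not exhaustive. */
static const char *const ABBREVIATIONS[] = { "Mr", "Mrs", "Ms", "Dr", "Prof" };

/* Returns 1 if the period at text[dot] follows a known abbreviation. */
int is_abbreviation(const char *text, size_t dot)
{
    assert(text != NULL && text[dot] == '.');
    size_t start = dot;
    /* Walk back to the start of the word that precedes the period. */
    while (start > 0 && isalpha((unsigned char)text[start - 1]))
        start--;
    size_t len = dot - start;
    for (size_t i = 0; i < sizeof ABBREVIATIONS / sizeof ABBREVIATIONS[0]; i++) {
        if (strlen(ABBREVIATIONS[i]) == len &&
            strncmp(text + start, ABBREVIATIONS[i], len) == 0)
            return 1;
    }
    return 0;
}

/* Returns 1 if the character at text[i] should end a sentence. */
int ends_sentence(const char *text, size_t i)
{
    assert(text != NULL);
    char c = text[i];
    if (c == '!' || c == '?')
        return 1;
    if (c == '.')
        return !is_abbreviation(text, i);
    return 0;
}
```

With "Where is Mr. Kapil's brother these days." the period after Mr is skipped and only the final period ends the sentence. The match is case-sensitive, and a sentence that genuinely ends with an abbreviation would be missed, which is the kind of trade-off each extra rule brings.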

And run-on sentences that have no ending punctuation marks.

I gave you the answer.

The answer is strtok.

If you don't want that answer, then you're more than welcome to go roll your own version of a string tokenizer.

Hey singal.mayank, from what I can see, you know what you want, so why don't you try to implement it? You know that a sentence can be delimited by multiple characters, such as . and !, and there are many others which need to be considered as well.

Jeph has already given a function which can tokenise the string. Maybe you should see if the following helps you out.

1. Make a list of characters which can delimit a sentence
2. Use a string tokeniser function to find whether any of the delimiter characters in that list occur
3. If so, tokenise the string and store the tokens in a buffer, or print them as you were doing
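
The three steps above could be sketched roughly as follows. This version uses strcspn from <string.h> instead of strtok, so that the delimiter stays attached to its sentence and the input is not modified. The names DELIMITERS, SENTENCE_MAX and split_sentences are assumptions for illustration, and quotes and abbreviations are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SENTENCE_MAX 128

/* Step 1: the characters assumed to end a sentence. */
static const char DELIMITERS[] = ".!?";

/* Steps 2 and 3: finds each delimiter with strcspn and copies the
 * sentence, delimiter included, into the next row of `out`.
 * Returns the number of sentences stored. */
size_t split_sentences(const char *text, char out[][SENTENCE_MAX], size_t max_rows)
{
    assert(text != NULL && out != NULL);
    size_t rows = 0;
    while (rows < max_rows) {
        text += strspn(text, " \t\r\n");        /* skip whitespace between sentences */
        if (*text == '\0')
            break;
        size_t len = strcspn(text, DELIMITERS); /* length up to the next delimiter */
        if (text[len] != '\0')
            len++;                              /* keep the delimiter itself */
        /* Truncate sentences that do not fit in one row. */
        size_t copy = len < SENTENCE_MAX - 1 ? len : SENTENCE_MAX - 1;
        memcpy(out[rows], text, copy);
        out[rows][copy] = '\0';
        rows++;
        text += len;
    }
    return rows;
}
```

Given "Hey! How are u. Wats going on these days." and room for four rows, this stores "Hey!", "How are u." and "Wats going on these days." and returns 3.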

An alternative to strtok is to use the fgets and sscanf functions, which I personally prefer.

ssharish

I don't think you're going to be able to use sscanf very easily on strings that can be wildly divergent from any imaginable standard format.

