Perhaps you should read the entire file in and strip out all HTML tags as you do so; that way you are left with just the text. Then go over what you have and process it into sentences. But as ddanbe said, this isn't really a job for C++.
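Here is a minimal C++ sketch of the read-and-strip step. It assumes the file is small enough to hold in memory. The helper names `read_file` and `strip_html_tags` are hypothetical, and the stripper is deliberately naive: it drops everything between `<` and `>`, so it does not handle comments, `<script>` contents, a `>` inside an attribute value, or entities like `&amp;`.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the whole file into one string.
// Returns an empty string if the file cannot be opened.
std::string read_file(const std::string& path) {
    std::ifstream in(path);
    if (!in) {
        return {};
    }
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

// Hypothetical helper: naive tag stripper. Everything from '<' up to the
// next '>' is dropped, and each tag is replaced by a single space so that
// text from adjacent elements (e.g. "</p><p>") does not run together.
std::string strip_html_tags(const std::string& html) {
    std::string text;
    text.reserve(html.size());
    bool in_tag = false;
    for (char c : html) {
        if (c == '<') {
            in_tag = true;
        } else if (c == '>' && in_tag) {
            in_tag = false;
            text += ' ';
        } else if (!in_tag) {
            text += c;
        }
    }
    return text;
}
```

Usage would be something like `std::string text = strip_html_tags(read_file("page.html"));`, where `page.html` is just a placeholder path. Replacing each tag with a space can leave doubled spaces or a stray space before punctuation, so you may want to collapse whitespace before looking for sentences.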
The closest you will get to detecting the start and end of a sentence is the pattern [dot][space][capital]. You will need to use that to check for both the beginning and the end: the dot ends one sentence and the capital starts the next.
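A rough C++ sketch of that [dot][space][capital] rule follows. `split_sentences` is a hypothetical helper: it ends a sentence at a `.` followed by exactly one space and an uppercase ASCII letter, and starts the next sentence at that capital. Being a heuristic, it misses `?` and `!`, wrongly splits after abbreviations such as "Mr. Smith", and won't split when the gap is a newline or two spaces.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits text on the pattern [dot][space][capital].
// The dot is kept with the sentence it ends; the capital begins the next one.
std::vector<std::string> split_sentences(const std::string& text) {
    std::vector<std::string> sentences;
    std::size_t start = 0;
    for (std::size_t i = 0; i + 2 < text.size(); ++i) {
        // std::isupper needs an unsigned char value to avoid undefined behaviour
        if (text[i] == '.' && text[i + 1] == ' ' &&
            std::isupper(static_cast<unsigned char>(text[i + 2]))) {
            sentences.push_back(text.substr(start, i + 1 - start));
            start = i + 2;  // next sentence starts at the capital letter
        }
    }
    // Whatever is left after the last boundary is the final sentence.
    if (start < text.size()) {
        sentences.push_back(text.substr(start));
    }
    return sentences;
}
```

For example, `split_sentences("One. Two. Three.")` yields the three pieces `"One."`, `"Two."` and `"Three."`, while `"Hello there. This is C++. ok then"` yields only two, because the lowercase `o` after the second dot does not match the pattern.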
Chris