U+0020 is the code point for the space character, so the sentences in your file end with a trailing space.
Run through the file and remove any whitespace at the end of each sentence. Alternatively, you can write a trim function that removes whitespace from the front and end of a string.
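A minimal sketch of such a trim function, assuming the text is held in a std::wstring and that "whitespace" means whatever std::iswspace reports for the current C locale. The name trim is hypothetical and not from the original post.

```cpp
#include <cwctype>
#include <string>

// Hypothetical helper (not from the thread): strips leading and trailing
// whitespace, as classified by std::iswspace, from a wide string in place.
void trim( std::wstring& wstr )
{
    // index of the first non-whitespace character (or size() if there is none)
    std::wstring::size_type first = 0 ;
    while( first < wstr.size() && std::iswspace( wstr[first] ) ) ++first ;

    // one past the index of the last non-whitespace character
    std::wstring::size_type last = wstr.size() ;
    while( last > first && std::iswspace( wstr[last-1] ) ) --last ;

    wstr = wstr.substr( first, last - first ) ;
}
```

Calling trim on each line after reading it from the file should drop a trailing U+0020 without touching spaces in the middle of the line.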
firstPerson
Though the standard does not say anything specific about the representation of a wchar_t, on mainstream implementations it is either a 16-bit type holding UCS-2/UTF-16 code units (as on Windows) or a 32-bit type holding ISO 10646 UCS-4 code points (as on most Unix-like systems). These two have an identical character repertoire and the same code points for the Basic Multilingual Plane. In practice, the following function will remove non-printable characters from a string.
#include <algorithm>
#include <locale>
#include <string>

void remove_non_printable_chars( std::wstring& wstr )
{
    // get the ctype facet for wchar_t (Unicode code points in practice)
    // from the current global locale (the "C" locale unless it has been changed)
    typedef std::ctype< wchar_t > ctype ;
    const ctype& ct = std::use_facet<ctype>( std::locale() ) ;

    // remove non-printable Unicode characters (erase-remove idiom)
    wstr.erase( std::remove_if( wstr.begin(), wstr.end(),
                    [&ct]( wchar_t ch ) { return !ct.is( ctype::print, ch ) ; } ),
                wstr.end() ) ;
}
Incidentally, U+0020 is a printable Unicode character.
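As a quick check of that last point, here is a small sketch that asks the same ctype facet how it classifies a character. The helper name is_printable is hypothetical, and it deliberately uses the classic "C" locale rather than the global one so that the result does not depend on program setup.

```cpp
#include <locale>

// Hypothetical helper (not from the thread): reports whether ch is classified
// as printable by the ctype<wchar_t> facet of the classic "C" locale.
bool is_printable( wchar_t ch )
{
    const std::ctype<wchar_t>& ct =
        std::use_facet< std::ctype<wchar_t> >( std::locale::classic() ) ;
    return ct.is( std::ctype_base::print, ch ) ;
}
```

With this, is_printable( L' ' ) should be true while is_printable( L'\t' ) should be false, so a function that strips non-printable characters leaves trailing spaces in place.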
vijayan121