Hi all,

Recently I've run into a problem where a string I am reading from a file is being read in with a nonprinting character appended to the end. The character being appended is U+0020.

I'm just unsure how to get rid of this. I know that I could just chop off the last character of the string, but this wouldn't solve my problem because some strings have this Unicode character on the end and others don't.

Any help would be greatly appreciated.



U+0020 is the code for whitespace. So you have a whitespace at the end of your sentences in your file.
Run through the file and remove any whitespace at the end of the sentence. Alternatively, you can create a trim function that removes whites spaces from the front/end.

Though the standard does not say anything specific about the representation of a wchar_t, it is either strictly Unicode (UCS-2) or ISO 10646 (UCS-4) on every implementation. These two have an identical character repertoire and code points for the Basic Multilingual Plane. In practice, this will remove non-printable characters from a string.

void remove_non_printable_chars( std::wstring& wstr )
    // get the ctype facet for wchar_t (Unicode code points in pactice)
    typedef std::ctype< wchar_t > ctype ;
    const ctype& ct = std::use_facet<ctype>( std::locale() ) ;

    // remove non printable Unicode characters
    wstr.erase( std::remove_if( wstr.begin(), wstr.end(),
                    [&ct]( wchar_t ch ) { return !ct.is( ctype::print, ch ) ; } ),
                wstr.end() ) ;

Incidentally, U+0020 is a printable Unicode character.

man you know too much
This article has been dead for over six months. Start a new discussion instead.