How to get rid of Unicode nonprinting characters

Hi all,

Recently I've run into a problem where a string I am reading from a file is being read in with a nonprinting character appended to the end. The character being appended is U+0020.

I'm just unsure how to get rid of this. I know that I could just chop off the last character of the string, but this wouldn't solve my problem because some strings have this Unicode character on the end and others don't.

Any help would be greatly appreciated.

Thanks,

Dylan

dboltz03
Newbie Poster
2 posts since Oct 2010
Reputation Points: 10
Solved Threads: 0
 

U+0020 is the code point for the ordinary space character (SPACE), one of the whitespace characters. So some of the strings in your file end with a trailing space.
Run through the file and remove any whitespace at the end of each string. Alternatively, you can write a trim function that removes whitespace from the front and end of a string.
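A minimal sketch of such a trim function, assuming the strings are narrow std::string values and that "whitespace" means the six ASCII whitespace characters. The names rtrim, ltrim, trim and kWhitespace are illustrative, not from any library:

```cpp
#include <string>

// Characters treated as whitespace here (an assumption): space (U+0020),
// tab, newline, carriage return, vertical tab and form feed.
const char* const kWhitespace = " \t\n\r\v\f";

// Remove trailing whitespace in place.
void rtrim(std::string& s)
{
    // find_last_not_of returns npos if s is empty or all whitespace;
    // npos + 1 wraps around to 0, so erase(0) clears the string then.
    s.erase(s.find_last_not_of(kWhitespace) + 1);
}

// Remove leading whitespace in place.
void ltrim(std::string& s)
{
    // If s is all whitespace, find_first_not_of returns npos and
    // erase(0, npos) removes everything.
    s.erase(0, s.find_first_not_of(kWhitespace));
}

// Remove whitespace from both ends.
void trim(std::string& s)
{
    rtrim(s);
    ltrim(s);
}
```

Calling rtrim on each line after std::getline strips the trailing space only where one is present, so it handles strings that have it and strings that don't.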

firstPerson
Senior Poster
3,923 posts since Dec 2008
Reputation Points: 841
Solved Threads: 608
 

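Though the standard does not say anything specific about the representation of a wchar_t, on the common implementations it holds Unicode: either UTF-16/UCS-2 code units (a 16-bit wchar_t, as on Windows) or ISO 10646 UCS-4 code points (a 32-bit wchar_t, as with GCC on Linux). Unicode and ISO 10646 have an identical character repertoire and identical code points, at least for the Basic Multilingual Plane. In practice, this function removes the characters that the global locale classifies as non-printable from a wide string: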

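#include <algorithm>
#include <locale>
#include <string>

void remove_non_printable_chars( std::wstring& wstr )
{
    // get the ctype facet for wchar_t (Unicode code points in practice)
    // from the global locale; unless std::locale::global() was called this is
    // the classic "C" locale, so how non-ASCII characters are classified
    // depends on the implementation
    typedef std::ctype< wchar_t > ctype ;
    const ctype& ct = std::use_facet<ctype>( std::locale() ) ;

    // remove non printable Unicode characters (the lambda requires C++11)
    wstr.erase( std::remove_if( wstr.begin(), wstr.end(),
                    [&ct]( wchar_t ch ) { return !ct.is( ctype::print, ch ) ; } ),
                wstr.end() ) ;
}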
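If the file is read into a plain std::string rather than a std::wstring, a similar sketch can use std::isprint from <cctype>. This assumes a single-byte encoding: in the default "C" locale on ASCII-based systems, std::isprint accepts only 0x20 to 0x7E, so the bytes of any UTF-8 multibyte sequence would be stripped as well. The name remove_non_printable_bytes is hypothetical:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Narrow-string counterpart of remove_non_printable_chars (hypothetical name).
// std::isprint consults the current C locale ("C" unless setlocale was called).
void remove_non_printable_bytes(std::string& str)
{
    // The lambda takes unsigned char because passing a negative char value
    // to std::isprint is undefined behaviour.
    str.erase(std::remove_if(str.begin(), str.end(),
                             [](unsigned char ch) { return !std::isprint(ch); }),
              str.end());
}
```

Like the wide version, this removes tabs, carriage returns and other control characters but leaves U+0020 in place.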


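Incidentally, U+0020 is a printable Unicode character, so remove_non_printable_chars will not remove it; to get rid of a trailing space you need to trim whitespace, as suggested above.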
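Because U+0020 is printable, the fix here is trimming rather than filtering. A minimal sketch for wide strings that reuses the same std::ctype<wchar_t> facet but tests ctype::space instead of ctype::print; the name rtrim_wide is hypothetical, and classification follows the global locale:

```cpp
#include <locale>
#include <string>

// Remove trailing whitespace (including U+0020 SPACE) from a wide string.
void rtrim_wide(std::wstring& wstr)
{
    typedef std::ctype<wchar_t> ctype;
    const ctype& ct = std::use_facet<ctype>(std::locale());

    // Walk back from the end while the last character is classified as space.
    std::wstring::size_type end = wstr.size();
    while (end > 0 && ct.is(ctype::space, wstr[end - 1]))
        --end;

    // Drop everything from the first trailing space onwards.
    wstr.erase(end);
}
```

Strings without trailing whitespace are left unchanged, so it is safe to call on every line.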

vijayan121
Posting Virtuoso
1,606 posts since Dec 2006
Reputation Points: 1,159
Solved Threads: 287
 
