
Removing white space characters from strings

By ravenous on May 24th, 2011 12:59 pm

There are occasionally posts asking how to remove white space from a string of characters (either a C-style char array or a C++ std::string). Using functions in the standard C++ libraries, this is quite an easy thing to do:

#include <string>
#include <algorithm>
#include <iostream>
#include <cctype>

int main()
{
    /* Make a string to test things out on */
    std::string myString("here Is    A   String With White  \n  Space");
    std::cout << "Before:\t" << myString << std::endl;

    /* Use the stl algorithm remove_if to remove the desired characters */
    std::string::iterator newEnd = std::remove_if( myString.begin(), myString.end(), isspace );
    std::cout << "After:\t" << std::string(myString.begin(), newEnd) << std::endl;

    return 0;
}

The output should look something like this:

Before: here Is    A   String With White
  Space
After:  hereIsAStringWithWhiteSpace

The keys to this snippet are the STL algorithm std::remove_if, which is declared in the <algorithm> header, and the function isspace(), which is declared in the <cctype> header. std::remove_if moves along the string and removes the elements that return true when passed to the function that you provide (isspace(), in this case). Here the range is a string, but like all the STL algorithms it works with any container that supports the correct iterator type (a forward iterator, in this case). The other feature of std::remove_if is that it doesn't actually remove the elements: it just overwrites the existing elements, skipping the ones for which the function returns true. So, if you just did something like this:

std::string myString("here Is    A   String With White  \n  Space");
std::cout << "Before:\t" << myString << std::endl;
std::remove_if( myString.begin(), myString.end(), isspace );
std::cout << "After:\t" << myString << std::endl;

You'd end up with output that looks like:

Before: here Is    A   String With White
  Space
After:  hereIsAStringWithWhiteSpaceWhite
  Space

At first it looks like it hasn't done anything at all! However, the start of the string now contains the new string with the white space removed. std::remove_if simply stopped overwriting elements once it had processed the last element of the original string (the 'e' of 'Space', in this case), so the leftover tail still holds the old characters. This has speed advantages for the algorithm, but isn't great for our desired functionality. The good news is that std::remove_if returns an iterator to the new end of the string, which the first listing uses to construct a new string on the line that prints the "After:" result.
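To make the "doesn't actually remove" point concrete, here is a minimal sketch. The helper names isWhiteSpace and compactInPlace are hypothetical and exist only for this illustration; the predicate also converts to unsigned char before calling isspace, which is an extra precaution not present in the listing above.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical predicate for this sketch: converts the char to
// unsigned char before handing it to isspace.
bool isWhiteSpace(char ch)
{
    return std::isspace(static_cast<unsigned char>(ch)) != 0;
}

// Runs std::remove_if on the string in place and returns how many
// characters survive. The string's size() is NOT changed by the call;
// only the first "kept" characters are meaningful afterwards.
std::size_t compactInPlace(std::string& text)
{
    std::string::iterator newEnd =
        std::remove_if(text.begin(), text.end(), isWhiteSpace);
    return static_cast<std::size_t>(newEnd - text.begin());
}
```

For example, calling compactInPlace on "a b  c" should return 3 while the string still reports a size of 6; the caller decides what to do with the stale tail.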

Since the STL algorithms are generic, you can use C-style pointers in them as well as C++ iterators. This is great for us, since this approach works just as well for a char array as for a std::string:

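char s[] = "here Is    A   String With White  \n  Space";
std::size_t size = std::strlen( s );   /* needs <cstring>; length without the terminating '\0' */

/* The range [s, s + size) excludes the terminator, so newEnd is the true end of the result */
char* newEnd = std::remove_if( s, s + size, isspace );
std::cout << "After:\t" << std::string(s, newEnd) << std::endl;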

In this case, I'm using a std::string to output the result, but the algorithm is working on the char array.
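If you would rather keep the result in the char array itself, one option is to write a new terminator at the position std::remove_if returns. This is only a sketch: stripWhiteSpace and isWhiteSpace are hypothetical names, and it assumes the array holds a null-terminated string.

```cpp
#include <algorithm>
#include <cctype>
#include <cstring>

// Hypothetical predicate for this sketch: converts the char to
// unsigned char before handing it to isspace.
bool isWhiteSpace(char ch)
{
    return std::isspace(static_cast<unsigned char>(ch)) != 0;
}

// Removes white space from a null-terminated char array in place.
// newEnd never lies beyond the original terminator, so writing '\0'
// there stays inside the array.
void stripWhiteSpace(char* s)
{
    char* newEnd = std::remove_if(s, s + std::strlen(s), isWhiteSpace);
    *newEnd = '\0';
}
```

After the call, the array can be printed or passed to C functions directly, with no temporary std::string.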

Finally, what if you just want to get rid of all non-alphanumeric characters? You can define a new function that calls isalnum :

int isUnacceptable( char ch ){   return !isalnum( ch );   }
/* ... */
 std::string::iterator newEnd = std::remove_if( myString.begin(), myString.end(), isUnacceptable );

This removes everything that isn't a number or letter :o)
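Put together as a small reusable function, the same idea might look like the sketch below. The names isNotAlnum and keepAlnumOnly are hypothetical, and the cast to unsigned char is an added precaution rather than part of the snippet above.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical predicate: true for anything that is not a letter or digit.
bool isNotAlnum(char ch)
{
    return !std::isalnum(static_cast<unsigned char>(ch));
}

// Returns a copy of the input containing only its alphanumeric characters.
std::string keepAlnumOnly(std::string text)
{
    std::string::iterator newEnd =
        std::remove_if(text.begin(), text.end(), isNotAlnum);
    // Build the result from the compacted front part of the string.
    return std::string(text.begin(), newEnd);
}
```

Taking the string by value means the caller's original is left untouched.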

Nice work. But generally people want to KEEP single spaces and remove more than double spaces.

http://www.daniweb.com/software-development/cpp/threads/106452/518972#post518972

Also look into writing a trim function.

http://www.daniweb.com/software-development/cpp/threads/155472/729101#post729101

iamthwee
Posting Expert
5,950 posts since Aug 2005
Reputation Points: 1,543
Solved Threads: 439
 

I prefer something along the lines of

char s[] = "here Is    A   String With White  \n  Space";
int i1, i2;
...
i2=0;
for (i1=0; i1 < strlen(s); i1++)
{
    if (!isspace(s[i1]))
        s[i2++] = s[i1];
}
s[i2] = '\0';   /* terminate the shortened string */


Another version:

string s = "here Is    A   String With White  \n  Space";
    string t = "";
    unsigned int i1;

    cout << s;

    for (i1=0; i1 < s.length(); i1++)
    {
        if (!isspace(s[i1])) 
            t += s[i1];
    }
    s = t;
    cout << "\n\n" << s;


To me it's more straightforward and easier to read.

WaltP
Posting Sage w/ dash of thyme
Moderator
10,505 posts since May 2006
Reputation Points: 3,348
Solved Threads: 944
 

similarly for me, written by my own fair hand..

string removewhitefrom(string full)
{
     char x;
     int counter = 0;
     string processed;
     for (int t = full.size();t > 0; t--)
     {
     x = full.at(counter);
     if (x != ' ')
         {
         processed.push_back(full.at(counter));
         }
     counter++;
     }
   return processed;
}
lexusdominus
Junior Poster in Training
84 posts since Jun 2009
Reputation Points: 12
Solved Threads: 5
 

Of the solutions presented, IMO the OP's solution is better. Reasons being (in no particular order):

1) Less to Type
2) Less chance to get wrong
3) Follows Idiom better( reuse )
4) Possibly faster
5) Cleaner
6) Safer
7) Is more C++ than C

firstPerson
Senior Poster
3,923 posts since Dec 2008
Reputation Points: 841
Solved Threads: 608
 

I agree, the OP's solution is better than the others posted. remove_if is meant for exactly this purpose and its use is clear, efficient and idiomatic.

But the string copying is useless btw, the idiomatic use of remove_if is as follows:

myString.erase( std::remove_if( myString.begin(), 
                                myString.end(),
                                isspace ),
                myString.end());
mike_2000_17
Posting Virtuoso
Moderator
2,134 posts since Jul 2010
Reputation Points: 1,634
Solved Threads: 457
 

Nice work. But generally people want to KEEP single spaces and remove more than double spaces.

http://www.daniweb.com/software-development/cpp/threads/106452/518972#post518972

Also look into writing a trim function.

http://www.daniweb.com/software-development/cpp/threads/155472/729101#post729101


Maybe we can steal Apple's catch phrase and modify it for C++: "There's an algorithm for that".

#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

using namespace std;

struct whitespace {
    bool operator()(char a, char b)
    {
        return isspace(a) && isspace(b);
    }
};

int main()
{
    string s("here Is    A   String With White  \n  Space");

    cout << "Before: '" << s << "'\n";
    s.erase(unique(s.begin(), s.end(), whitespace()), s.end());
    cout << "After:  '" << s << "'\n";
}
Narue
Bad Cop
Administrator
15,460 posts since Sep 2004
Reputation Points: 6,464
Solved Threads: 1,401
 

Nice posts everyone. OK, to sum up all your white space manipulating needs (using the test string std::string s(" \t\nhere\n\n\n is a string \t\t with random white space\t"); ):

Remove all white space:

s.erase( std::remove_if( s.begin(), s.end(), isspace ), s.end() );

Remove all multiple spaces:

bool consecutiveWhiteSpace( char a, char b ){ return isspace(a) && isspace(b); }
/* ... */
s.erase( std::unique(s.begin(), s.end(), consecutiveWhiteSpace), s.end() );
std::replace_if( s.begin(), s.end(), isspace, ' ' );

In this case, I have extended Narue's method with a second step that replaces the remaining single white space characters with spaces.

Trim spaces from either end of a string:

int notWhiteSpace( char ch ){ return !isspace( ch ); }
/* ... */
std::string::iterator start = std::find_if( s.begin(), s.end(), notWhiteSpace);
std::string::iterator end = std::find_if( s.rbegin(), s.rend(), notWhiteSpace ).base();
std::string s2( start, end );

There might be a better way of doing this last one though :o)
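One possible refinement of the collapse and trim steps, wrapped as functions, is sketched below. The function names are hypothetical. The trim checks for a string with no non-white-space characters first, because in that case the forward search returns end() while the reverse search's base() returns begin(), and constructing a string from that pair would be an invalid range.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helpers; each converts to unsigned char before isspace.
bool isWhiteSpace(char ch)
{
    return std::isspace(static_cast<unsigned char>(ch)) != 0;
}
bool isNotWhiteSpace(char ch) { return !isWhiteSpace(ch); }
bool bothWhiteSpace(char a, char b) { return isWhiteSpace(a) && isWhiteSpace(b); }

// Collapses every run of white space into a single ' '.
std::string collapseWhiteSpace(std::string text)
{
    // Keep only the first character of each white space run...
    text.erase(std::unique(text.begin(), text.end(), bothWhiteSpace), text.end());
    // ...then turn any surviving '\t', '\n', etc. into a plain space.
    std::replace_if(text.begin(), text.end(), isWhiteSpace, ' ');
    return text;
}

// Trims white space from both ends; returns an empty string when the
// input contains nothing but white space.
std::string trimWhiteSpace(const std::string& text)
{
    std::string::const_iterator first =
        std::find_if(text.begin(), text.end(), isNotWhiteSpace);
    if (first == text.end())
        return std::string();
    std::string::const_iterator last =
        std::find_if(text.rbegin(), text.rend(), isNotWhiteSpace).base();
    return std::string(first, last);
}
```

Chaining the two, as in trimWhiteSpace(collapseWhiteSpace(s)), should give single spaces between words and none at either end.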
ravenous
Posting Pro
516 posts since Jul 2005
Reputation Points: 269
Solved Threads: 92
 

I only have one nitpick:

s.erase( std::remove_if( s.begin(), s.end(), isspace ), s.end() );

These calls using isspace, or anything in <cctype>, aren't portable for various reasons such as being overloaded. The strictly correct version uses a helper function or function object to differentiate between the overloads:

struct whitespace {
    bool operator()(char ch)
    {
        return isspace((unsigned char)ch);
    }
};

s.erase(std::remove_if(s.begin(), s.end(), whitespace()), s.end());

Also note that isspace() is being called safely as well (with a cast to unsigned char). I neglected to do that in my previous code. ;)
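If a C++11 compiler is available (an assumption; the code in this thread is written in C++03 style), the same helper can be written inline as a lambda. This is just a sketch, and removeWhiteSpace is a hypothetical name.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns a copy of the input with all white space removed.
std::string removeWhiteSpace(std::string text)
{
    text.erase(std::remove_if(text.begin(), text.end(),
                              // Taking the parameter as unsigned char performs
                              // the conversion before isspace is called.
                              [](unsigned char ch) { return std::isspace(ch) != 0; }),
               text.end());
    return text;
}
```

Because the lambda calls isspace with an explicit argument, the compiler has no overload to choose between when deducing the predicate type.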

Narue
Bad Cop
Administrator
15,460 posts since Sep 2004
Reputation Points: 6,464
Solved Threads: 1,401
 

>>aren't portable for various reasons such as being overloaded

what are the other reasons?

firstPerson
Senior Poster
3,923 posts since Dec 2008
Reputation Points: 841
Solved Threads: 608
 
what are the other reasons?


Passing a pointer to a function with C linkage, templates being involved, and char potentially being signed. The last wouldn't cause compilation errors, while the first two could.
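For reference, the other isspace that a bare name can collide with is presumably the two-argument template declared in <locale>. A function object can use that overload deliberately, as in the sketch below; IsSpaceInLocale and removeWhiteSpace are hypothetical names. This version takes a char directly, so no cast is needed at the call site.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical function object: classifies characters with the
// <locale> overload std::isspace(ch, loc) instead of the <cctype> one.
class IsSpaceInLocale
{
public:
    explicit IsSpaceInLocale(const std::locale& loc) : loc_(loc) {}
    bool operator()(char ch) const { return std::isspace(ch, loc_); }
private:
    std::locale loc_;
};

// Returns a copy of the input with everything the given locale
// considers white space removed.
std::string removeWhiteSpace(std::string text, const std::locale& loc)
{
    text.erase(std::remove_if(text.begin(), text.end(), IsSpaceInLocale(loc)),
               text.end());
    return text;
}
```

Passing std::locale::classic() gives the usual "C" locale behaviour; a different locale may classify additional characters as white space.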

Narue
Bad Cop
Administrator
15,460 posts since Sep 2004
Reputation Points: 6,464
Solved Threads: 1,401
 
