Removing white space characters from strings

ravenous

There are occasionally posts asking how to remove white space from a string of characters (either a C-style char array or a C++ std::string). Using functions from the standard C++ libraries, this is quite easy to do:

#include <string>
#include <algorithm>
#include <iostream>
#include <cctype>

int main()
{
    /* Make a string to test things out on */
    std::string myString("here Is    A   String With White  \n  Space");
    std::cout << "Before:\t" << myString << std::endl;

    /* Use the stl algorithm remove_if to remove the desired characters */
    std::string::iterator newEnd = std::remove_if( myString.begin(), myString.end(), isspace );
    std::cout << "After:\t" << std::string(myString.begin(), newEnd) << std::endl;

    return 0;
}

The output should look something like this:

Before: here Is    A   String With White  
  Space
After:  hereIsAStringWithWhiteSpace

The keys to this snippet are the STL algorithm std::remove_if, which is declared in the <algorithm> header, and the function isspace(), which is declared in the <cctype> header.

std::remove_if walks along the range (here a string, but like all the STL algorithms it works on any sequence that provides the right kind of iterator; remove_if needs forward iterators) and "removes" every element for which the function you provide (isspace() in this case) returns true. The catch is that std::remove_if doesn't actually remove anything: it only sees iterators, so it can't change the size of the container. Instead, it copies each element it keeps forward over the gaps left by the ones it skips. So, if you just did something like this:

std::string myString("here Is    A   String With White  \n  Space");
std::cout << "Before:\t" << myString << std::endl;
std::remove_if( myString.begin(), myString.end(), isspace );
std::cout << "After:\t" << myString << std::endl;

You'd end up with output that looks like:

Before: here Is    A   String With White  
  Space
After:  hereIsAStringWithWhiteSpaceWhite  
  Space

At first glance it looks as if nothing useful has happened. In fact, the start of the string now holds the new string with the spaces removed, but everything after that point has not been overwritten, so the tail of the original string ("White  \n  Space" here) is still there. (The standard only says those leftover elements have unspecified values; for char they are in practice just left as they were.) Not having to resize the container keeps the algorithm fast, but it isn't quite what we want here. The good news is that std::remove_if returns an iterator to the new end of the string, which is what we use to construct the result string in the "After" output line of the first example.
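To make the "overwrite, don't shrink" behaviour concrete, here is a simplified sketch of how a remove_if-style algorithm can be written. The names simple_remove_if and is_blank are hypothetical and chosen for illustration; real library implementations differ in detail (for example, they move rather than copy), so treat this as a model of the idea rather than the standard library's actual code.

```cpp
#include <cctype>
#include <string>

// Hypothetical, simplified remove_if for illustration only.
// Needs forward iterators: one iterator reads, the other writes behind it.
template <typename ForwardIt, typename Pred>
ForwardIt simple_remove_if(ForwardIt first, ForwardIt last, Pred pred)
{
    ForwardIt out = first;          // where the next kept element will be written
    for (; first != last; ++first)
    {
        if (!pred(*first))          // pred returned false: keep this element
        {
            *out = *first;          // copy it forward over any skipped elements
            ++out;
        }
    }
    return out;                     // new logical end; [out, last) is never written
}

// Hypothetical predicate, using the unsigned char cast discussed later in the thread.
bool is_blank(char ch)
{
    return std::isspace(static_cast<unsigned char>(ch)) != 0;
}
```

Called on the test string, the range from begin() to the returned iterator holds "hereIsAStringWithWhiteSpace", while the whole string reads "hereIsAStringWithWhiteSpaceWhite  \n  Space", which is the same picture as the output shown above.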

Since the STL algorithms are generic, you can use C-style pointers in them as well as C++ iterators. This is great for us, since this approach works just as well for a char array as for a std::string:

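char s[] = "here Is    A   String With White  \n  Space";
std::size_t size = std::strlen( s );   // characters before the '\0' (needs <cstring>)

char* newEnd = std::remove_if( s, s + size, isspace );
*newEnd = '\0';                        // re-terminate so s is a valid C string again
std::cout << "After:\t" << std::string(s, newEnd) << std::endl;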

In this case, I'm using a std::string to output the result, but the algorithm is working on the char array.
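If you do this to C strings often, it may be worth wrapping it up. The sketch below assumes a NUL-terminated buffer that you are allowed to modify; strip_whitespace_cstr is a hypothetical name, and the lambda adds the unsigned char cast that plain isspace needs for negative char values.

```cpp
#include <algorithm>
#include <cctype>
#include <cstring>

// Hypothetical helper: removes all white space from a NUL-terminated,
// writable C string in place and returns the new length.
std::size_t strip_whitespace_cstr(char* s)
{
    char* end = s + std::strlen(s);          // stop before the terminating '\0'
    char* newEnd = std::remove_if(s, end, [](char ch) {
        // cast avoids undefined behaviour for negative char values
        return std::isspace(static_cast<unsigned char>(ch)) != 0;
    });
    *newEnd = '\0';                          // re-terminate the shortened string
    return static_cast<std::size_t>(newEnd - s);
}
```

After calling it on the test array, s holds "hereIsAStringWithWhiteSpace" and the function returns 27.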

Finally, what if you just want to get rid of all non-alphanumeric characters? You can define a new function that calls isalnum:

int isUnacceptable( char ch ){   return !isalnum( (unsigned char)ch );   }
/* ... */
std::string::iterator newEnd = std::remove_if( myString.begin(), myString.end(), isUnacceptable );

This removes everything that isn't a letter or a digit.
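Putting the pieces together, here is a minimal self-contained sketch of the non-alphanumeric filter. It swaps the named isUnacceptable function for a lambda (C++11 or later) and erases the leftover tail instead of building a second string; keep_alnum is a hypothetical name.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of s containing only letters and digits.
std::string keep_alnum(std::string s)
{
    s.erase(std::remove_if(s.begin(), s.end(), [](char ch) {
                // true means "remove": anything that is not a letter or digit
                return !std::isalnum(static_cast<unsigned char>(ch));
            }),
            s.end());
    return s;
}
```

For example, keep_alnum("C++ is #1, right?") returns "Cis1right".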

iamthwee

Nice work. But generally people want to KEEP single spaces and remove more than double spaces.

http://www.daniweb.com/software-development/cpp/threads/106452/518972#post518972

Also look into writing a trim function.


http://www.daniweb.com/software-development/cpp/threads/155472/729101#post729101

WaltP

I prefer something along the lines of

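char s[] = "here Is    A   String With White  \n  Space";
size_t i1, i2, len;
/* ... */
len = strlen(s);        /* compute the length once, not on every pass */
i2 = 0;
for (i1 = 0; i1 < len; i1++)
{
    if (!isspace((unsigned char)s[i1]))
        s[i2++] = s[i1];
}
s[i2] = '\0';           /* terminate the compacted string */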

Another version:

    string s = "here Is    A   String With White  \n  Space";
    string t = "";
    unsigned int i1;

    cout << s;

    for (i1=0; i1 < s.length(); i1++)
    {
        if (!isspace(s[i1])) 
            t += s[i1];
    }
    s = t;
    cout << "\n\n" << s;

To me it's more straightforward and easier to read.
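For comparison, the second loop (copy the characters you want into a new string t) also has a ready-made algorithm: std::copy_if with std::back_inserter, available since C++11. The sketch below is one way to express it; without_whitespace is a hypothetical name, and the original string is left unchanged.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper: builds a new string from the non-white-space
// characters of s, leaving s itself untouched.
std::string without_whitespace(const std::string& s)
{
    std::string t;
    t.reserve(s.size());                       // at most s.size() characters are copied
    std::copy_if(s.begin(), s.end(), std::back_inserter(t), [](char ch) {
        return !std::isspace(static_cast<unsigned char>(ch));
    });
    return t;
}
```

This trades the in-place rewrite for an extra allocation, which can be handy when the original text still needs to be kept.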

lexusdominus

similarly for me, written by my own fair hand..

string removewhitefrom(string full)
{
    char x;
    int counter = 0;
    string processed;
    for (int t = full.size(); t > 0; t--)
    {
        x = full.at(counter);
        if (x != ' ')   // note: only ' ' is removed; tabs and newlines are kept
        {
            processed.push_back(full.at(counter));
        }
        counter++;
    }
    return processed;
}
mrnutty

Of the presented solutions, IMO the OP's solution is better. Reasons (in no particular order):

1) Less to type
2) Less chance to get wrong
3) Follows the idiom better (reuse)
4) Possibly faster
5) Cleaner
6) Safer
7) Is more C++ than C

mike_2000_17

I agree, the OP's solution is better than the others posted. remove_if is meant for exactly this purpose, and its use is clear, efficient and idiomatic.

But the string copying is unnecessary, btw; the idiomatic use of remove_if (the erase-remove idiom) erases the leftover tail directly:

myString.erase( std::remove_if( myString.begin(), 
                                myString.end(),
                                isspace ),
                myString.end());
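As a side note, C++20 added std::erase_if overloads for containers, including std::basic_string, which wrap the erase-remove idiom above into one call. The sketch below assumes nothing about which standard you compile with: it checks the __cpp_lib_erase_if feature-test macro and falls back to the explicit idiom otherwise. erase_whitespace is a hypothetical name.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical wrapper around the erase-remove idiom.
// With a C++20 library, std::erase_if(std::string&, pred) does the same job;
// the feature-test macro selects whichever form is available.
void erase_whitespace(std::string& s)
{
    auto isWhite = [](char ch) {
        return std::isspace(static_cast<unsigned char>(ch)) != 0;
    };
#if defined(__cpp_lib_erase_if)
    std::erase_if(s, isWhite);
#else
    s.erase(std::remove_if(s.begin(), s.end(), isWhite), s.end());
#endif
}
```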
Narue

Maybe we can steal Apple's catch phrase and modify it for C++: "There's an algorithm for that".

#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

using namespace std;

struct whitespace {
    bool operator()(char a, char b)
    {
        return isspace(a) && isspace(b);
    }
};

int main()
{
    string s("here Is    A   String With White  \n  Space");

    cout << "Before: '" << s << "'\n";
    s.erase(unique(s.begin(), s.end(), whitespace()), s.end());
    cout << "After:  '" << s << "'\n";
}
ravenous

Nice posts, everyone. OK, to sum up all your white space manipulation needs (using the test string std::string s(" \t\nhere\n\n\n is a string \t\t with random white space\t"); ):

  1. Remove all white space:
    s.erase( std::remove_if( s.begin(), s.end(), isspace ), s.end() );
  2. Collapse each run of white space into a single space:
    bool consecutiveWhiteSpace( char a, char b ){   return isspace(a) && isspace(b);    }
    /* ... */
    s.erase( std::unique(s.begin(), s.end(), consecutiveWhiteSpace), s.end() );
    std::replace_if( s.begin(), s.end(), isspace, ' ' );

    In this case, I have extended Narue's method with a second step that replaces the remaining single white space characters with spaces.

  3. Trim white space from both ends of a string:
    int notWhiteSpace( char ch ){   return !isspace( ch );  }
    /* ... */
    std::string::iterator start = std::find_if( s.begin(), s.end(), notWhiteSpace);
    std::string::iterator end = std::find_if( s.rbegin(), s.rend(), notWhiteSpace ).base();
    std::string s2 = ( start < end ) ? std::string( start, end ) : std::string();  // all white space gives ""

    There might be a better way of doing this last one though :o)
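Items 2 and 3 can be packaged as small functions. The sketch below is one possible arrangement, not the thread's definitive version: the names is_ws, collapse_whitespace and trim_whitespace are hypothetical, the predicates use the unsigned char cast, and the trim returns an empty string early when there is nothing but white space, so the two iterators never cross.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical shared predicate with the unsigned char cast.
inline bool is_ws(char ch)
{
    return std::isspace(static_cast<unsigned char>(ch)) != 0;
}

// Item 2: turn every run of white space into a single ' '.
std::string collapse_whitespace(std::string s)
{
    // unique keeps the first character of each white-space run...
    s.erase(std::unique(s.begin(), s.end(),
                        [](char a, char b) { return is_ws(a) && is_ws(b); }),
            s.end());
    // ...which may be '\t' or '\n', so normalise it to a plain space.
    std::replace_if(s.begin(), s.end(), is_ws, ' ');
    return s;
}

// Item 3: strip leading and trailing white space.
std::string trim_whitespace(const std::string& s)
{
    auto notWs = [](char ch) { return !is_ws(ch); };
    std::string::const_iterator start = std::find_if(s.begin(), s.end(), notWs);
    if (start == s.end())                  // empty or all white space
        return std::string();
    std::string::const_iterator end = std::find_if(s.rbegin(), s.rend(), notWs).base();
    return std::string(start, end);
}
```

Applied to the test string above, collapse_whitespace gives " here is a string with random white space ", and trimming that gives "here is a string with random white space".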

Narue

I only have one nitpick:

s.erase( std::remove_if( s.begin(), s.end(), isspace ), s.end() );

These calls using isspace, or anything in <cctype>, aren't portable for various reasons such as being overloaded. The strictly correct version uses a helper function or function object to differentiate between the overloads:

struct whitespace {
    bool operator()(char ch)
    {
        return isspace((unsigned char)ch);
    }
};

s.erase(std::remove_if(s.begin(), s.end(), whitespace()), s.end());

Also note that isspace() is being called safely as well (with a cast to unsigned char). I neglected to do that in my previous code. ;)

mrnutty

>>aren't portable for various reasons such as being overloaded

what are the other reasons?

Narue

>>what are the other reasons?

Passing a pointer to a function with C linkage, templates being involved, and char potentially being signed. The latter wouldn't cause compilation errors while the former two could.
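To illustrate the overloading point: <locale> declares a second, two-argument function template std::isspace(charT, const std::locale&), so a bare std::isspace names an overload set and cannot be deduced as a predicate argument. A lambda that calls the overload you want avoids the problem. The sketch below uses the locale version; remove_whitespace is a hypothetical name. Because that overload takes the character type directly (for char, the ctype<char> facet indexes its table by the unsigned char value), there is no separate cast to remember.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper: removes white space as classified by loc
// (by default a copy of the current global locale).
void remove_whitespace(std::string& s, const std::locale& loc = std::locale())
{
    s.erase(std::remove_if(s.begin(), s.end(),
                           [&loc](char ch) { return std::isspace(ch, loc); }),
            s.end());
}
```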
