C++ String Split

Tom Gunn

The forum is buzzing with questions about tokenizing a C++ string. This short function does it and returns a vector of the tokens. Optionally, the delimiter string can be treated either as one multi-character delimiter (fullMatch == true) or as a set of single-character delimiters (fullMatch == false, the default):

// multichar delimiter == "^^"

"aa^^bb^c^^d"

becomes

"aa"
"bb^c"
"d"

// single char delimiter list == "^,"

"aa^^b,b^c^^d"

becomes

"aa"
""
"b"
"b"
"c"
""
"d"

A test driver is not included because that confused people with my last snippet.

#include <string>
#include <vector>

namespace Daniweb
{
    using namespace std;

    typedef string::size_type (string::*find_t)(const string& delim, 
                                                string::size_type offset) const;

    /// <summary>
    /// Splits the string s on the given delimiter(s) and
    /// returns a list of tokens without the delimiter(s)
    /// </summary>
    /// <param name="s">The string being split</param>
    /// <param name="match">The delimiter(s) for splitting</param>
    /// <param name="removeEmpty">Removes empty tokens from the list</param>
    /// <param name=fullMatch>
    /// True if the whole match string is a match, false
    /// if any character in the match string is a match
    /// </param>
    /// <returns>A list of tokens</returns>
    vector<string> Split(const string& s,
                         const string& match,
                         bool removeEmpty=false,
                         bool fullMatch=false)
    {
        vector<string> result;                 // return container for tokens
        string::size_type start = 0,           // starting position for searches
                          skip = 1;            // positions to skip after a match
        find_t pfind = &string::find_first_of; // search algorithm for matches

        if (fullMatch)
        {
            // use the whole match string as a key
            // instead of individual characters
            // skip might be 0. see search loop comments
            skip = match.length();
            pfind = &string::find;
        }

        while (start != string::npos)
        {
            // get a complete range [start..end)
            string::size_type end = (s.*pfind)(match, start);

            // an empty match string always matches in string::find,
            // but a skip of 0 causes an infinite loop. pretend that
            // no delimiter was found and extract the rest of the string
            if (skip == 0) end = string::npos;

            string token = s.substr(start, end - start);

            if (!(removeEmpty && token.empty()))
            {
                // extract the token and add it to the result list
                result.push_back(token);
            }

            // start the next range
            if ((start = end) != string::npos) start += skip;
        }

        return result;
    }
}
William Hemsworth 1,339 Posting Virtuoso

Good Snippet.

Nick Evan 4,005 Industrious Poster Team Colleague Featured Poster

I agree with William, very nice indeed.

MichoRizo 0 Newbie Poster

Nice. How can this be changed to ignore delimiters that are wrapped in double quotes?

For example, if whitespace is the delimiter:

Item1 Item2 "Item 3" Item4

should be
Item1
Item2
Item 3
Item4
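
One way to approach the quoted case is a single pass over the string that tracks whether it is currently inside double quotes. The sketch below is an assumption about what is wanted, not part of the original snippet: SplitQuoted is a hypothetical name, it only supports single-character delimiters (the fullMatch == false behavior), it keeps empty tokens, it drops the quote characters, and it does not handle escaped quotes.

```cpp
#include <string>
#include <vector>

// Hypothetical variant, not part of the original snippet: splits s on any
// single character in delims, except where that character sits between a
// pair of double quotes. The quote characters themselves are dropped.
std::vector<std::string> SplitQuoted(const std::string& s,
                                     const std::string& delims)
{
    std::vector<std::string> result;
    std::string token;
    bool inQuotes = false;

    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        const char c = s[i];

        if (c == '"')
        {
            // toggle the quoted state and drop the quote itself
            inQuotes = !inQuotes;
        }
        else if (!inQuotes && delims.find(c) != std::string::npos)
        {
            // a delimiter outside quotes ends the current token
            result.push_back(token);
            token.clear();
        }
        else
        {
            // ordinary character, or a delimiter protected by quotes
            token += c;
        }
    }

    // last token; an unterminated quote simply runs to the end of s
    result.push_back(token);
    return result;
}
```

With the input above and " " as the delimiter list, this yields the four tokens Item1, Item2, Item 3 and Item4.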

daviddoria 334 Posting Virtuoso Featured Poster

Why would a test driver confuse people?? That seems to be a very critical part of an example like this!

MichoRizo 0 Newbie Poster

You're kidding, right? Creating a main function and calling the code above is trivial.

daviddoria 334 Posting Virtuoso Featured Poster

It may be trivial here, but why not provide it? Someone should be able to mindlessly say "OK, let's see what this does -> copy+paste -> run it".
