String Tokenizer

Alex Edwards

A simple program that splits a string on a given token (the delimiter, which may be longer than one character) and returns the pieces as a vector of strings.

#include <iostream>
#include <vector>
#include <string>

std::vector<std::string> split(const std::string&, const std::string&);
std::vector<std::string> split(const char*, const char*);
std::vector<std::string> split(const std::string&, const char*);
std::vector<std::string> split(const char*, const std::string&);

std::vector<std::string> split(const std::string& myStr, const std::string& token){
    std::vector<std::string> temp (0);
    std::string s;

    for(std::size_t i = 0; i < myStr.size(); i++){
        if( (myStr.substr(i, token.size()).compare(token) == 0)){
            temp.push_back(s);
            s.clear();
            i += token.size() - 1;
        }else{
            s.append(1, myStr[i]);
            if(i == (myStr.size() - 1))
                temp.push_back(s);
        }
    }
    return temp;
}

std::vector<std::string> split (const char* lhs, const char* rhs){
    const std::string m1 (lhs), m2 (rhs);
    return split(m1, m2);
}

std::vector<std::string> split (const char* lhs, const std::string& rhs){
    return split(lhs, rhs.c_str());
}

std::vector<std::string> split (const std::string& lhs, const char* rhs){
    return split(lhs.c_str(), rhs);
}

template<class Element>
std::ostream& displayElements(const std::vector<Element>& arg, std::ostream& output = std::cout){
    for(std::size_t i = 0; i < arg.size(); i++)
        output << arg[i] << "\n";

    return output;
}

int main(){

    std::string first ("Oh I see how it is! >_<"), token = " ";
    std::vector<std::string> myVec = split(first, token);

    displayElements<std::string>(myVec);

    return 0;
}

Alex Edwards

Hmm just noticed a few potential problems...

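First, the string s in the first split function could be initialized to an empty string explicitly, although a default-constructed std::string is already empty, so this is only a matter of style.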

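Secondly, the split function should throw an exception if the user passes a zero-length token such as "". As written, an empty token matches at every position, and because i += token.size() - 1 wraps around on an unsigned type and cancels the loop's i++, the loop never advances.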
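
One way the empty-token check could look is sketched below. The name splitChecked and the choice of std::invalid_argument are assumptions for illustration, not part of the original snippet, and the loop uses std::string::find instead of the character-by-character scan above. It is meant to produce the same pieces as the original for non-empty tokens.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical variant of split() that rejects an empty token up front.
std::vector<std::string> splitChecked(const std::string& text, const std::string& token) {
    if (token.empty())
        throw std::invalid_argument("split: token must not be empty");

    std::vector<std::string> parts;
    std::size_t start = 0;
    std::size_t pos;
    // find() locates the next token occurrence; the text before it is one piece.
    while ((pos = text.find(token, start)) != std::string::npos) {
        parts.push_back(text.substr(start, pos - start));
        start = pos + token.size();
    }
    // Keep whatever follows the last token, unless nothing is left.
    if (start < text.size())
        parts.push_back(text.substr(start));
    return parts;
}
```

With this check in place, a call such as splitChecked("abc", "") throws immediately instead of looping forever.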

Alex Edwards

I was being overly cautious and explicit with the overloads for the possible parameter types. Since the std::string constructor that takes a const char* isn't marked explicit, a temporary std::string is constructed implicitly when a const char* is passed, so the overload taking two const std::string& parameters would be enough on its own.
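
The implicit conversion can be seen in a small sketch. joinArgs is a hypothetical stand-in for split (it only glues its arguments together), used here so the example stays short; the point is that one overload accepts every mix of std::string and const char*.

```cpp
#include <string>

// Hypothetical stand-in for split(): a single overload taking const std::string&.
std::string joinArgs(const std::string& text, const std::string& token) {
    return text + "|" + token;
}

// Every call below resolves to the single overload above. Each const char*
// argument is converted to a temporary std::string that lives for the call.
bool allCallsAgree() {
    const std::string text("a b");
    const std::string token(" ");
    const char* rawText = "a b";
    const char* rawToken = " ";

    const std::string expected = joinArgs(text, token);
    return joinArgs(rawText, rawToken) == expected
        && joinArgs(text, rawToken) == expected
        && joinArgs(rawText, token) == expected;
}
```

One caveat applies to both designs: a null const char* must not be passed, since constructing a std::string from a null pointer is not valid.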

In addition, the major flaw of this substring tokenizer is that it emits zero-length tokens when a delimiter is found before any 'word', for example at the very start of the input or directly after another delimiter.

I think I will make a strictTokenize function, or possibly a templated tokenizer that is strict when passed true and relaxed when passed false (much like this version @_@ )
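
A rough sketch of how such a templated tokenizer might look follows. The name tokenize, the bool template parameter, and the exception on an empty delimiter are all assumptions made for illustration; this is not the author's eventual implementation. Strict mode drops zero-length pieces, and relaxed mode keeps them in the same way the version above does.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: Strict == true skips zero-length pieces,
// Strict == false keeps them (relaxed, like the original split).
template <bool Strict>
std::vector<std::string> tokenize(const std::string& text, const std::string& delimiter) {
    if (delimiter.empty())
        throw std::invalid_argument("tokenize: delimiter must not be empty");

    std::vector<std::string> parts;
    std::size_t start = 0;
    while (start < text.size()) {
        std::size_t pos = text.find(delimiter, start);
        if (pos == std::string::npos)
            pos = text.size();  // no more delimiters: take the rest of the text
        const std::string piece = text.substr(start, pos - start);
        if (!Strict || !piece.empty())
            parts.push_back(piece);
        start = pos + delimiter.size();
    }
    return parts;
}
```

For the input " a  b " with a single space as the delimiter, tokenize<true> yields "a" and "b", while tokenize<false> also keeps the empty pieces produced by the leading space and the doubled space.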
