Hey guys, I have been trying to tokenize a string using the Boost library.

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

int main(){
   using namespace std;
   using namespace boost;
   string s = "This is,  a test";
   tokenizer<> tok(s);
   for(tokenizer<>::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";
   }
}

The problem I am facing is that I cannot tokenize the string while also keeping the whitespace as tokens. I tried replacing the whitespace with another character, but then I don't know how to tokenize the result. For example, I substituted every whitespace character with "|", so the string above becomes "This|is,||a|test".

From here, how can I make sure the string is still tokenized into words, while "|" is kept as a token instead of being removed? Any help would be appreciated.

Last Post by vijayan121

I'm not familiar with Boost; I don't use it. Based on what I'm seeing, though, you may want to search the documentation for the "boost::tokenizer" type. You probably need to pass an argument somewhere that tells it to use whitespace characters as the delimiter(s).

It could also be a problem with the way you are resolving the namespace(s). It's unlikely, since you can compile, but there may be some sort of collision because of how you've resolved them. Try specific using declarations (such as "using std::cout") instead of the broad-spectrum using directives (i.e. "using namespace std"). Even if it's not part of the problem, it is a better way to do it, as it helps prevent future problems.

Edited by Fbody: n/a


> tokenize a string, yet allow it to leave the whitespaces as token also.

Construct a boost::char_separator<char> with the whitespace characters as the kept delimiters (and an empty string for the dropped delimiters). Tokenize with a tokenizer that uses this as its tokenizer function. For example:

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

int main()
{
   std::string str = "This is,  a test";

   typedef boost::tokenizer< boost::char_separator<char> > tokenizer_type ;

   // dropped delimiters are discarded; kept delimiters are returned as tokens
   const char* const dropped_delimiters = "" ; // nothing
   const char* const kept_delimiters = " \t" ; // space, tab
   boost::char_separator<char> separator( dropped_delimiters, kept_delimiters ) ;

   tokenizer_type toker( str, separator ) ;

   int cnt = 0 ;
   for( tokenizer_type::iterator beg = toker.begin() ; beg != toker.end(); ++beg )
       std::cout << "token " << ++cnt << ": '" << *beg << "'\n";
}