I am reading a line of text from a file and need to split it into tokens.

This is a test and test number is: test(001)

I need the tokens to be:

This
is
a
test
and
test
number
is
:
test
(
001
)

How do I split a string into tokens like this?

Here is my code so far:

#include <fstream>
#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <stdio.h>

using namespace std;

vector <string> SplitString (string line)
{
      //this is where the split needs to occur
      vector <string> tokens;
      return tokens;   //placeholder so the function always returns a value
}

bool ispunct (char aCharacter, string delimiters)
{
	int numDelimiters = delimiters.length ();
	for (int i = 0; i < numDelimiters; i++)
	{
		if (aCharacter == delimiters[i])
			return true;
	}

	return false;
}

int main()
{

	int a = 0;   //line counter (the unused i, c and ch were removed)
	string line;
	vector <string> tokens;
	
	ifstream myFile("scan.cm"); 
	
	if (! myFile)
	{
		cout << "Error opening input file" << endl;
		return -1;
	}
	
	while( getline( myFile, line ) )
	{
		a++;
			
		vector <string> newTokens = SplitString (line);
		
		int numNewTokens = newTokens.size();
		
		for (int i = 0; i < numNewTokens; i++)
		{
			tokens.push_back (newTokens[i]);
		}
		
		cout << "line " << a << ": " << endl;
		
	}
		
	myFile.close();

	return 0;
}


ispunct() may not be exactly what you need for splitting the string into tokens, but I would suggest strtok():

char * strtok ( char * str, const char * delimiters );

As you can see, strtok() takes two arguments. The first is the c-string (char array) you want tokenized; the second is another null-terminated c-string holding your delimiters (any character that marks the end of a token, such as a ' ' space or a '.' period; it can be anything you want). Each call returns a pointer to the first character of the next token, and that token ends where strtok() hits one of the delimiters (we will save these pointers into an array of char* pointers in the example below). When there are no tokens left, it returns NULL.

So put strtok() in a loop and let it fly: pass your string on the first call and NULL on every call after that, and it will hand back a char* to each token in turn until it returns NULL.

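I see that in your code you are using std::string variables, which is fine, but remember that strtok() is looking for a c-string char array, not a std::string object. A string object does have a member function, c_str(), that returns a c-string pointer, but that pointer is a const char*, and strtok() writes into the buffer it is given. So copy the string into a writable char array first and tokenize the copy:

#include <cstring>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main()
{
     string input = "This is a sample string.";

     //The delimiter set must itself be a null-terminated c-string
     const char delimeters[] = "/\n ";

     //strtok() overwrites each delimiter it finds with '\0', so it needs a
     //writable buffer; input.c_str() is a const char* and will not compile here
     vector<char> buffer(input.begin(), input.end());
     buffer.push_back('\0');

     //Array of 'char' pointers that will hold the address of each token
     char *tokens[80];

     //First call passes the buffer; later calls pass NULL so strtok()
     //continues where it stopped instead of starting over
     int i = 0;
     tokens[i] = strtok(&buffer[0], delimeters);
     while (tokens[i] != NULL && i < 79)
     {
          i++;
          tokens[i] = strtok(NULL, delimeters);
     }

     //tokens[i] is already a char*, so print it directly as a c-string
     for (i = 0; i < 80 && tokens[i] != NULL; i++)
     {
          cout << "\nWord number " << i << " is " << tokens[i];
     }
     cout << endl;

     return 0;
}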

And there ye' be: using strtok() to split up std::string objects. That said, strtok() is really meant for c-strings, because string objects already have member functions that make parsing easy (find(), find_first_of(), and substr(), for example). strtok() is declared in the <cstring> header for a reason.

Enjoy hours of fun strtok()'ing.

Minor error; here is the updated code:

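//strtok() writes into its input, so copy the string into a writable
//buffer instead of passing input.c_str(), which is a const char*
vector<char> buffer(input.begin(), input.end());
buffer.push_back('\0');

i = 0;
tokens[i] = strtok(&buffer[0], delimeters);   //first call: pass the buffer
while(tokens[i] != NULL && i < 79)
{
     i++;
     tokens[i] = strtok(NULL, delimeters);     //later calls: pass NULL to continue
}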

An excerpt about strtok(): "This end of the token is automatically replaced by a null-character by the function, and the beginning of the token is returned by the function."

If I had driven the loop with .size(), it would have called strtok() once per character instead of once per token, far more times than needed, and for a string longer than 80 characters it would have written past the end of the tokens array.
