I've been given a task where I read in a line of text from a file and then need to separate that line into separate strings. I've done this, but I also need to keep some of them attached...

The file contains "abc def ghi jkl mno". This is read in using getline() and then split up using strtok, which gives me the following (each line is a different string):

abc
def
ghi
jkl
mno

How would I get the 3rd token to stay attached to the one after it? Like:

abc
def
ghi jkl
mno

strtok isn't that smart. It takes the delimiters you give and breaks up a string based on those delimiters, nothing more. If you want more advanced parsing, you're on your own. If you'd like advice, I'll be happy to help, but you need to be more specific about your needs. It's trivial to handle the exact case you've requested, but if you want the program to recognize which tokens to keep attached, the logic might get tricky.

I've tried putting something together that doesn't use strtok. It looks a bit messy, but for the most part it works in separating the words wherever a space is found. I'm having some trouble with the logic of keeping a section attached, though.

Here is my code:

ifstream ins;
	ins.open("test.dat", ios::in);

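	char info[500];
	// one full-size, zero-filled buffer per field ("new char" allocates
	// only a single character); zero-filling keeps each result null-terminated
	char first[500] = {0};
	char second[500] = {0};
	char third[500] = {0};
	char fourth[500] = {0};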

	int i = 0;

	if(ins.good())
	{
		int counter = 0;
		// reads line from .dat file to extract info from
		ins.getline(info, 500);
		// copies all the characters to *first till space is found
		for(int j = 0; j < strlen(info)+1; j++)
		{
			if(info[j] != ' ')
			{
				first[j] = info[j];
			}else{
				counter = j+1;
				break;
			}
		}
		// reading the second word after the space
		int x = 0;
		for(int j = counter; j < strlen(info)+1; j++)
		{
			if(info[j] != ' ')
			{
				second[x] = info[j];

			}else{
				counter = j+1;
				break;
			}
			x++;
		}
		// reading third word and attempting to keep fourth word also attached
		int y = 0;
		for(int j = counter; j < strlen(info)+1; j++)
		{
			
			if(info[j] != ' ')// if not equal to a space
			{
				third[y] = info[j];

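			}else{
				third[y] = info[j]; // include the space and keep reading
				// this repeats the test above, so it is always false here
				// and the loop stops at the first space
				if(info[j] != ' ')
				{
					third[y] = info[j];
				}else{
					counter = j+1;
					break;
				}
			}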

			y++;
		}

		int g = 0;
		for(int j = counter; j < strlen(info)+1; j++)
		{
			
			if(info[j] != ' ')
			{
				fourth[g] = info[j];

			}else{
				counter = j+1;
				break;
			}
			g++;
		}
	}

	cout << "First " << first << endl;
	cout << "Second " << second << endl;
	cout << "Third " << third << endl;
	cout << "Fourth " << fourth << endl;

test.dat contains lines like:

comedy movie anger management 02:14:11

The idea of my code is to put each of these into its own char string, even the time: first would be "comedy", second would be "movie", third is "anger management" and fourth is the time. The third will always be a two-word title, never one word and never more than two.

>I've tried putting something together that doesn't use strtok
You don't have to stop using it altogether, just understand the weaknesses of strtok. For example, the first two tokens can be easily acquired using strtok:

char *genre = std::strtok ( info, " " );
char *category = std::strtok ( 0, " " );

It's the third token that creates a problem because your string format doesn't have any special rules for embedded delimiters. That's where strtok breaks down and you need to use something else. But it's possible to start where strtok stopped and do the split manually:

char *name = category + std::strlen ( category ) + 1;
char *length = std::strrchr ( name, ' ' ) + 1;

// Split name and length
length[-1] = '\0';

Note that this doesn't do any error checking. Keep that in mind and add it if you use a similar strategy in your program.
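As a rough idea of what that error checking could look like, here is a minimal sketch. The `Record` struct and the `split_record` function are names made up for this example, and it assumes fields are separated by exactly one space, as in the sample line.

```cpp
#include <cstring>

// Hypothetical record type for this sketch; the field names are assumptions.
struct Record {
    char *genre;
    char *category;
    char *name;
    char *length;
};

// Splits "genre category name... length" in place.
// Returns false if a field is missing. `line` is modified either way.
bool split_record(char *line, Record &out)
{
    // Remember the real end of the line before strtok writes any '\0' into it
    char *const line_end = line + std::strlen(line);

    out.genre = std::strtok(line, " ");
    if (out.genre == nullptr)
        return false;

    out.category = std::strtok(nullptr, " ");
    if (out.category == nullptr)
        return false;

    // If category was the last token, there is nothing after it to look at
    char *rest = out.category + std::strlen(out.category);
    if (rest == line_end)
        return false;
    ++rest; // step over the '\0' that strtok wrote

    // The last space separates the name from the length
    char *last_space = std::strrchr(rest, ' ');
    if (last_space == nullptr || last_space == rest || last_space[1] == '\0')
        return false;

    *last_space = '\0';
    out.name = rest;
    out.length = last_space + 1;
    return true;
}
```

Calling `split_record` on a writable copy of "comedy movie anger management 02:14:11" should leave `name` pointing at "anger management" and `length` at "02:14:11", while a line that stops after the category is rejected instead of reading past the end of the buffer.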

The better solution is to change your string format so that you're not using spaces for both field delimiters and field values. I like pipes because they're not a common value character:

comedy|movie|anger management|02:14:11

Now strtok will work for all of the tokens.
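For illustration, a plain strtok loop over the pipe-delimited format might look like the sketch below. The helper name `split_pipes` is an assumption for this example, not something from the standard library.

```cpp
#include <cstddef>
#include <cstring>

// Splits `line` in place on '|' and stores up to `max_fields` token
// pointers in `fields`. Returns the number of tokens stored.
std::size_t split_pipes(char *line, char *fields[], std::size_t max_fields)
{
    std::size_t count = 0;

    for (char *tok = std::strtok(line, "|");
         tok != nullptr && count < max_fields;
         tok = std::strtok(nullptr, "|"))
    {
        fields[count++] = tok;
    }

    return count;
}
```

One thing to keep in mind is that strtok treats a run of delimiters as a single delimiter, so an empty field such as "comedy||title" is silently skipped rather than returned as an empty string.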

I can't see any reason why you don't just split all the fields and assign them:

field[] = split(line, " ")
genre = field[0]
category = field[1]

title = field[2]
n = field.count()-1
for i = 3 to n-1
   title = title + " " + field[i]
next i

time = field[n]

This allows titles having different numbers of words. You can of course use strtok for the above.
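The pseudocode above could be written in C++ roughly as follows. This sketch swaps strtok for std::istringstream and std::string, which is a different technique from the one used elsewhere in this thread; `Entry` and `parse_entry` are hypothetical names.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for this sketch
struct Entry {
    std::string genre, category, title, time;
};

// Splits the line on whitespace, then rejoins everything between the
// second field and the last field into the title.
bool parse_entry(const std::string &line, Entry &out)
{
    std::istringstream in(line);
    std::vector<std::string> field;
    std::string word;

    // operator>> skips any run of whitespace between words
    while (in >> word)
        field.push_back(word);

    // Need at least genre, category, one title word and the time
    if (field.size() < 4)
        return false;

    const std::size_t last = field.size() - 1;

    out.genre = field[0];
    out.category = field[1];
    out.title = field[2];
    for (std::size_t i = 3; i < last; ++i)
        out.title += " " + field[i];
    out.time = field[last];

    return true;
}
```

Because the stream extraction skips runs of whitespace, the same function should also cope with several spaces between fields, at the cost of collapsing multiple spaces inside a title into one.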

Cool, thanks, that helped a lot :). I was wondering if it would be possible to find the location where strtok left off, like an array index?

Because the second snippet of code you gave me

char *name = category + std::strlen ( category ) + 1;
char *length = std::strrchr ( name, ' ' ) + 1;

// Split name and length
length[-1] = '\0';

works in finding the next space, but then continues to the end of my line. In my text file there are multiple spaces (not tabs) between each field, like:

"        comdey           movie            anger management           12:12:12"

If I could get the index of where strtok would have stopped, I could just use a while(not == ' ') loop and increment the index until it's false. Also, would I be able to use strtok again after I've done this loop, to pick up the remaining info in the line and ignore ":"?

I'm new to C++, so sorry for the noob questions.

>I was wondering if it would be possible to find the
>location where strtok left off, like an array index?
Possible, yes. Recommended, hell no. The closest you can get is adding the length of the most recent token (plus one for the terminator) to the pointer to the most recent token, and then subtracting a pointer to the beginning of the string from the result. For example:

size_t end = ( category + std::strlen ( category ) + 1 ) - info;
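To make the arithmetic concrete, here is the same calculation wrapped in a small helper. The name `index_after` is made up for this sketch, and the result is only meaningful when `token` points into the buffer that starts at `start` and is not the final token of the line.

```cpp
#include <cstddef>
#include <cstring>

// Returns the index, relative to `start`, of the first character after
// the '\0' that terminates `token`.
// Assumes `token` points into the buffer beginning at `start` and that
// more of the original line follows the token.
std::size_t index_after(const char *start, const char *token)
{
    return static_cast<std::size_t>((token + std::strlen(token) + 1) - start);
}
```

With the single-spaced line "comedy movie anger management 02:14:11", "movie" starts at index 7 and is 5 characters long, so after the second strtok call the helper gives 13, which is the index of the 'a' in "anger".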

>in my text file there are multiple spaces between each field
Right, you're getting well past the point where strtok is a suitable solution, but you can fix that problem using only the pointers:

char info[] = "        comdey           movie            anger management           12:12:12";

char *genre = std::strtok ( info, " " );
char *category = std::strtok ( 0, " " );
char *name = category + std::strlen ( category ) + 1;

// Skip leading whitespace
while ( std::isspace ( *name ) )
  ++name;

char *length = std::strrchr ( name, ' ' ) + 1;
char *end = length - 1;

// Find the end of the name
while ( std::isspace ( *end ) )
  --end;

// Remove trailing whitespace
end[1] = '\0';
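Pulled together with some basic checks, the pointer approach above might look like this sketch. `Fields` and `split_padded` are hypothetical names, and the casts to unsigned char are there because passing a negative char value to std::isspace is undefined.

```cpp
#include <cctype>
#include <cstring>

// Hypothetical record type for this sketch
struct Fields {
    char *genre;
    char *category;
    char *name;
    char *length;
};

// Splits a space-padded "genre category name... length" line in place.
// Returns false if a field is missing or the line ends in a space.
bool split_padded(char *line, Fields &out)
{
    char *const line_end = line + std::strlen(line);

    // strtok skips leading and repeated spaces for the first two fields
    out.genre = std::strtok(line, " ");
    out.category = out.genre ? std::strtok(nullptr, " ") : nullptr;
    if (out.category == nullptr)
        return false;

    char *name = out.category + std::strlen(out.category);
    if (name == line_end)
        return false; // nothing follows the category
    ++name;

    // Skip leading whitespace
    while (std::isspace(static_cast<unsigned char>(*name)))
        ++name;

    // The last space separates the name from the length
    char *last_space = std::strrchr(name, ' ');
    if (last_space == nullptr || last_space[1] == '\0')
        return false;
    out.length = last_space + 1;

    // Walk back over the padding to find the end of the name
    char *end = last_space;
    while (end > name && std::isspace(static_cast<unsigned char>(*end)))
        --end;

    // Remove trailing whitespace
    end[1] = '\0';
    out.name = name;
    return true;
}
```

A line with trailing spaces after the time is rejected by this version rather than trimmed, so strip the end of the line first if the file might contain them.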