Greetings,

String parsing isn't always an easy task, especially when you need to split a single string into many pieces while still keeping performance in mind.

The following code does this task simply. It counts the lines first and then allocates exactly the memory each line needs, which keeps the algorithm efficient.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int getLineCount(char *buffer);
int parseStringbyLines(char *buffer, char ***string);

int main() {
	int i, j, lines;
	char string[64];	// large enough for the text below plus its terminating '\0'
	char **parsed = NULL;

	strcpy(string, "Hello\nMy friends\nLets parse this string!");

	// Parse string
	lines = parseStringbyLines(string, &parsed);
	if (!lines) {
		printf("Parsing failed.\n");
		return 0;
	}

	// Print parsed string
	printf("%d lines parsed.\n\n", lines);
	for (i = 0; i < lines; i++)
		printf("%s\n", parsed[i]);

	// Free memory
	for (j = 0; j < lines; j++)
		free(parsed[j]);
	free(parsed);

	return 0;
}

int getLineCount(char *buffer) {
	int z = 1;
	char *pch;

	// Find first match
	pch = strchr(buffer, '\n');

	// Increment line count
	while (pch) {
		pch = strchr(pch+1, '\n');
		z++;
	}

	return z;
}

int parseStringbyLines(char *buffer, char ***string) {
	int		*newLine;
	int		b, j, l, z = 1;
	int		lineCount, len;
	char 		*pch, **temp = NULL;

	/*
	** Get line count
	** Allocate memory for new line handling
	** Check if memory allocating failed
	*/
	lineCount = getLineCount(buffer);
	newLine = (int *)malloc(lineCount + sizeof(int) * sizeof(*newLine));
	if (!newLine)
		return 0;
	newLine[0] = 0;

	// Find first occurrence of a new line
	pch = strchr(buffer, '\n');
	if (!pch) {
		free(newLine);	// don't leak the position table on early return
		return 0;
	}

	// If found, find all positions
	while (pch) {
		newLine[z] = pch-buffer+1;
		pch = strchr(pch+1, '\n');
		z++;
	}
	newLine[z] = (int)strlen(buffer) + 1;

	// Allocate memory to our temporary pointer
	temp = (char **)malloc(lineCount * (sizeof *temp));
	if (!temp) {
		free(newLine);	// don't leak the position table
		return 0;
	}

	// Go through all lines found
	for (l = 0; l < z; l++) {
		b = 0;
		len = newLine[l+1] - newLine[l];	// characters in this line plus the terminating '\0'

		// Allocate memory per index
		temp[l] = (char *)malloc(len * sizeof(**temp));
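		if (!temp[l]) {
			// Release everything allocated so far before bailing out
			while (l > 0)
				free(temp[--l]);
			free(temp);
			free(newLine);
			return 0;
		}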

		// Put our data in
		for (j = newLine[l]; j < newLine[l+1]-1; j++) {
			temp[l][b] = buffer[j];
			b++;
		}
		temp[l][b] = '\0';
	}

	// Free memory for line position
	free(newLine);

	// Set our pointer to point to char **temp
	*string = temp;

	// Return lines found
	return z;
}

Hello all,

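The size calculation in the memory allocation at line 63 adds lineCount bytes to a fixed sizeof(int) * sizeof(*newLine), so it doesn't grow by one int per line. It would make sense if it were: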

newLine = (int *)malloc(lineCount * sizeof(int) + sizeof(*newLine));

Agree?
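
For what it's worth, here is a minimal sketch of that sizing arithmetic, assuming the table needs one start offset per line plus one extra slot for the end-of-buffer position (which is how the copy loop in the snippet uses it). The helper name allocLineOffsets is made up for illustration and is not part of the posted code.

```c
#include <stdlib.h>

/*
** Hypothetical helper (not part of the original snippet): allocate the
** table of line offsets. It needs one start offset per line plus one
** extra slot for the end-of-buffer sentinel, i.e. lineCount + 1 ints.
*/
int *allocLineOffsets(int lineCount) {
	int *table;

	if (lineCount < 1)
		return NULL;

	/* (lineCount + 1) elements, each sizeof(*table) bytes */
	table = malloc(((size_t)lineCount + 1) * sizeof(*table));
	if (table)
		table[0] = 0;	/* the first line always starts at offset 0 */
	return table;
}
```

The byte count here is the same as lineCount * sizeof(int) + sizeof(*newLine); writing it as (count + 1) * sizeof(element) just makes the "one extra slot" intent easier to see.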

This is way too complicated and has unnecessary code. Here is a greatly simplified way to do it. Using strtok() you could also parse a string that has more than one kind of delimiter, such as '\n', '\t', ',' and the space, which are typical of CSV files (exported XLS files).

int parseStringbyLines(char *buffer, char ***string) 
{
    char** lines = NULL;
    char* ptr;
    char *temp;
    int size = 0;
    if( buffer == NULL || *buffer == '\0' || string == NULL || *string != NULL)
        return 0;
    temp = strdup(buffer); // duplicate the string
    ptr = strtok(temp, "\n");
    while(ptr)
    {
        lines = realloc(lines, (size += 1) * sizeof(char*));
        lines[size-1] = strdup(ptr);
        ptr = strtok(NULL, "\n");
    }
    free(temp);
    *string = lines;
    return size;
}
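
To illustrate the multi-delimiter point, here is a rough sketch that splits a buffer on newline, tab, comma and space in one pass. The name splitFields and the fixed-size output array are assumptions made for this example only.

```c
#include <string.h>

/*
** Hypothetical helper: split `buffer` in place on newline, tab, comma
** and space. Pointers to the tokens are stored in `fields` (at most
** `maxFields` of them); the number stored is returned.
** Note: strtok() overwrites delimiters in `buffer` with '\0' and treats
** a run of delimiters as one, so empty fields are silently skipped.
*/
int splitFields(char *buffer, char **fields, int maxFields) {
	int count = 0;
	char *tok;

	if (!buffer || !fields)
		return 0;

	for (tok = strtok(buffer, "\n\t, ");
	     tok && count < maxFields;
	     tok = strtok(NULL, "\n\t, "))
		fields[count++] = tok;

	return count;
}
```

One thing to keep in mind: because a run of delimiters counts as a single separator, "foo,,bar" gives two tokens, not three. If empty cells matter in your data, strtok() alone is probably not enough.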

Edited 6 Years Ago by Ancient Dragon: n/a

>>Your realloc idiom is one to avoid.
Just standard C. When the first parameter is NULL realloc() acts like malloc().

>>strdup is nonstandard
You might be right. That function can easily be re-implemented for any compiler.
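
For example, a minimal stand-in could look like the sketch below. The name dupString is made up here (chosen so it doesn't clash with a library strdup where one exists); it only uses malloc(), strlen() and memcpy() from standard C.

```c
#include <stdlib.h>
#include <string.h>

/*
** Hypothetical stand-in for strdup() on compilers that lack it.
** Returns a malloc'd copy of `s` that the caller must free(),
** or NULL if `s` is NULL or the allocation fails.
*/
char *dupString(const char *s) {
	size_t len;
	char *copy;

	if (!s)
		return NULL;

	len = strlen(s) + 1;		/* include the terminating '\0' */
	copy = malloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}
```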

lines = realloc(lines, (size += 1) * sizeof(char*));

On failure realloc() returns NULL, so the assignment overwrites lines with NULL. The original block is still allocated but no longer reachable, so it is lost and never freed.
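
A common way around this is to keep realloc()'s result in a temporary and only overwrite the real pointer on success. A minimal sketch, with appendLine as a made-up helper name and an int element count as in the code above:

```c
#include <stdlib.h>

/*
** Hypothetical helper: append `line` to the array *lines, which holds
** *size entries. Returns 1 on success. On failure it returns 0 and
** leaves *lines and *size untouched, so the caller can still free
** everything it already owns.
*/
int appendLine(char ***lines, int *size, char *line) {
	/* Keep the result in a temporary so the old pointer is not lost */
	char **grown = realloc(*lines, ((size_t)*size + 1) * sizeof(*grown));
	if (!grown)
		return 0;

	grown[*size] = line;
	*lines = grown;
	*size += 1;
	return 1;
}
```

Starting with lines == NULL and size == 0 still works, since realloc() with a NULL first argument behaves like malloc().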

Comments
good to know

>>On realloc failure a new pointer is made pointing to NULL and the original block is lost and not freed.

If realloc() fails, then the program has much larger problems than a little memory leak, such as the entire program will, or is about, to crash due to lack of memory. And that could even extend to the entire operating system. Consequently, on modern computers with several gig ram I don't even worry about realloc() failing. If you are working with embedded systems that have very limited ram then that would be different. But not very many people are doing that.

>>If realloc() fails, then the program has much larger problems than a little memory leak, such as the entire program will, or is about, to crash due to lack of memory. And that could even extend to the entire operating system. Consequently, on modern computers with several gig ram I don't even worry about realloc() failing. If you are working with embedded systems that have very limited ram then that would be different. But not very many people are doing that.

Regardless of your excuses; it is bad programming.

Edited 6 Years Ago by Aia: n/a

>>Regardless of your excuses; it is bad programming.
It's not an excuse -- it's a fact of reality. You can pretty-up the code all you want but when malloc() and realloc() fail the program's gonna crash, and good programming ain't going to help that one bit.

I'd hate to be maintaining code you've written.

You're like the Herb Schildt of Daniweb. :(

Edited 6 Years Ago by Dave Sinkula: n/a

>>It's not an excuse -- it's a fact of reality. You can pretty-up the code all you want but when malloc() and realloc() fail the program's gonna crash, and good programming ain't going to help that one bit.

Memory management in C is of the utmost importance. Not checking the return values of malloc() and realloc() is bad programming. Ignoring possible memory leaks is negligence. These standard functions do not fail only when the system is out of memory; they fail whenever they cannot give you the memory you asked for. If the program crashes, it is because you have created a bad program.

Edited 6 Years Ago by Aia: n/a