Parsing a string: By lines.

By Stack Overflow on Sep 20th, 2004 6:07 pm

Greetings,

String parsing isn't always an easy task, especially when you need to split a single string into many pieces while keeping performance in mind.

The following code does this simply. It counts the lines first and records where each one starts, so that each line gets its own precisely sized allocation.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int getLineCount(char *buffer);
int parseStringbyLines(char *buffer, char ***string);

int main() {
	int i, j, lines;
	char string[64];	// the text below is 40 characters plus '\0'
	char **parsed = NULL;

	strcpy(string, "Hello\nMy friends\nLets parse this string!");

	// Parse string
	lines = parseStringbyLines(string, &parsed);
	if (!lines) {
		printf("Parsing failed.\n");
		return 0;
	}

	// Print parsed string
	printf("%d lines parsed.\n\n", lines);
	for (i = 0; i < lines; i++)
		printf("%s\n", parsed[i]);

	// Free memory
	for (j = 0; j < lines; j++)
		free(parsed[j]);
	free(parsed);

	return 0;
}

int getLineCount(char *buffer) {
	int z = 1;
	char *pch;

	// Find first match
	pch = strchr(buffer, '\n');

	// Increment line count
	while (pch) {
		pch = strchr(pch+1, '\n');
		z++;
	}

	return z;
}

int parseStringbyLines(char *buffer, char ***string) {
	int		*newLine;
	int		b, j, l, z = 1;
	int		lineCount, len;
	char 		*pch, **temp = NULL;

	/*
	** Get line count
	** Allocate memory for new line handling
	** Check if memory allocating failed
	*/
	lineCount = getLineCount(buffer);
	newLine = (int *)malloc(lineCount + sizeof(int) * sizeof(*newLine));
	if (!newLine)
		return 0;
	newLine[0] = 0;

	// Find first occurrence of a new line
	pch = strchr(buffer, '\n');
	if (!pch) {
		free(newLine);	// don't leak the position table on early exit
		return 0;
	}

	// If found, find all positions
	while (pch) {
		newLine[z] = pch-buffer+1;
		pch = strchr(pch+1, '\n');
		z++;
	}
	newLine[z] = (int)strlen(buffer) + 1;

	// Allocate memory to our temporary pointer
	temp = (char **)malloc(lineCount * (sizeof *temp));
	if (!temp) {
		free(newLine);
		return 0;
	}

	// Go through all lines found
	for (l = 0; l < z; l++) {
		b = 0;
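		len = newLine[l+1] - newLine[l];	// characters in the line plus '\0'

		// Allocate memory per index
		temp[l] = (char *)malloc(len * sizeof(**temp));
		if (!temp[l]) {
			// Release everything allocated so far
			while (l > 0)
				free(temp[--l]);
			free(temp);
			free(newLine);
			return 0;
		}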

		// Put our data in
		for (j = newLine[l]; j < newLine[l+1]-1; j++) {
			temp[l][b] = buffer[j];
			b++;
		}
		temp[l][b] = '\0';
	}

	// Free memory for line position
	free(newLine);

	// Set our pointer to point to char **temp
	*string = temp;

	// Return lines found
	return z;
}

Hello all,

The size calculation in the memory allocation at line 63 would make sense if it were:

newLine = (int *)malloc(lineCount * sizeof(int) + sizeof(*newLine));

Agree?

artun
Newbie Poster
1 post since Nov 2007
Reputation Points: 10
Solved Threads: 0
 

newLine = (int *)malloc(lineCount * sizeof(int) * sizeof(*newLine));

??

babyshambles
Newbie Poster
1 post since Feb 2010
Reputation Points: 10
Solved Threads: 0
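
For reference, the loop in parseStringbyLines() stores one start offset per line plus a final end marker at newLine[z], so the table needs lineCount + 1 entries. A minimal sketch of how the three expressions in this thread compare; the helper names below are made up for the comparison and are not part of the posted code:

```c
#include <stddef.h>

/* Bytes needed for the offset table: one entry per line plus the end marker. */
size_t offsets_needed(int lineCount) {
	return ((size_t)lineCount + 1) * sizeof(int);
}

/* Expression from the original post: lineCount + sizeof(int) * sizeof(*newLine). */
size_t size_original(int lineCount) {
	return (size_t)lineCount + sizeof(int) * sizeof(int);
}

/* artun's suggestion: lineCount * sizeof(int) + sizeof(*newLine). */
size_t size_artun(int lineCount) {
	return (size_t)lineCount * sizeof(int) + sizeof(int);
}

/* babyshambles's suggestion: lineCount * sizeof(int) * sizeof(*newLine). */
size_t size_babyshambles(int lineCount) {
	return (size_t)lineCount * sizeof(int) * sizeof(int);
}
```

Assuming a 4-byte int, three lines need 16 bytes: the original expression yields 19 (enough only by luck), artun's yields exactly 16, and babyshambles's yields 48. At 100 lines the original yields 116 bytes where 404 are needed, so the writes to newLine run past the end of the block.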
 

Way too complicated and with unnecessary code. Here is a greatly simplified way to do it. Using strtok() you could parse a string that has more than one kind of delimiter, such as '\n', '\t', ',' and the space; these are typical of CSV files (exported XLS files).

int parseStringbyLines(char *buffer, char ***string) 
{
    char** lines = NULL;
    char* ptr;
    char *temp;
    int size = 0;
    if( buffer == NULL || *buffer == '\0' || string == NULL || *string != NULL)
        return 0;
    temp = strdup(buffer); // duplicate the string
    ptr = strtok(temp, "\n");
    while(ptr)
    {
        lines = realloc(lines, (size += 1) * sizeof(char*));
        lines[size-1] = strdup(ptr);
        ptr = strtok(NULL, "\n");
    }
    free(temp);
    *string = lines;
    return size;
}
Ancient Dragon
Retired & Loving It
Team Colleague
30,049 posts since Aug 2005
Reputation Points: 5,662
Solved Threads: 2,343
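
To illustrate the point about several delimiters, here is a small sketch. The helper name count_fields is made up for this example; strtok() itself is standard C. Keep in mind that strtok() modifies the buffer it is given and treats a run of consecutive delimiters as one, so empty lines or empty fields are skipped.

```c
#include <string.h>

/* Count the tokens in buf, splitting on any character in delims.
** buf is modified in place because strtok() writes '\0' over delimiters. */
int count_fields(char *buf, const char *delims) {
	int n = 0;
	char *tok = strtok(buf, delims);

	while (tok) {
		n++;
		tok = strtok(NULL, delims);
	}
	return n;
}
```

Called on "one\n\nthree" with "\n" as the delimiter set, this counts 2 tokens rather than 3: the empty middle line disappears, whereas the position-based code in the original post keeps it as an empty string.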
 

strdup is nonstandard. Your realloc idiom is one to avoid.

Dave Sinkula
long time no c
Team Colleague
5,058 posts since Apr 2004
Reputation Points: 2,780
Solved Threads: 314
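
strdup() comes from POSIX rather than ISO C (it was only added to the standard in C23), so a fallback is easy to carry along. A minimal sketch, using the hypothetical name dup_string to avoid clashing with a library that already provides strdup():

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc()ed copy of s, or NULL if allocation fails.
** The caller owns the result and must free() it. */
char *dup_string(const char *s) {
	size_t n = strlen(s) + 1;	/* include the terminating '\0' */
	char *copy = malloc(n);

	if (copy)
		memcpy(copy, s, n);
	return copy;
}
```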
 

>>Your realloc idiom is one to avoid.
Just standard C. When the first parameter is NULL realloc() acts like malloc().

>>strdup is nonstandard
You might be right. That function can easily be re-implemented for any compiler.

Ancient Dragon
 
>>>>Your realloc idiom is one to avoid.
>>Just standard C. When the first parameter is NULL realloc() acts like malloc().

*sigh*

Dave Sinkula
 
lines = realloc(lines, (size += 1) * sizeof(char*));

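On failure realloc() returns NULL and leaves the original block allocated. Because the result is assigned straight back to lines, the only pointer to that block is overwritten, so the block is lost and never freed.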

Aia
Nearly a Posting Maven
2,392 posts since Dec 2006
Reputation Points: 2,224
Solved Threads: 218
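
The usual way around this is to keep the result of realloc() in a temporary and overwrite the real pointer only on success. A sketch, with append_line as a made-up helper name standing in for the realloc line in the strtok() version:

```c
#include <stdlib.h>

/* Grow the array *lines by one slot and store item in it.
** Returns 1 on success. On failure returns 0 and leaves *lines and
** *size untouched, so the caller can still free what it already has. */
int append_line(char ***lines, int *size, char *item) {
	char **tmp = realloc(*lines, ((size_t)*size + 1) * sizeof *tmp);

	if (!tmp)
		return 0;	/* the old block is still valid and still owned by the caller */

	tmp[*size] = item;
	*lines = tmp;
	*size += 1;
	return 1;
}
```

Inside the strtok() loop the call would look something like append_line(&lines, &size, copy), where copy is the duplicated token; when it returns 0 the caller can free the lines collected so far and report the failure instead of leaking them.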
 

>>On failure realloc() returns NULL and leaves the original block allocated. Because the result is assigned straight back to lines, the only pointer to that block is overwritten, so the block is lost and never freed.

If realloc() fails, then the program has much larger problems than a little memory leak: the entire program will crash, or is about to, for lack of memory. That could even extend to the entire operating system. Consequently, on modern computers with several gigabytes of RAM I don't even worry about realloc() failing. If you are working with embedded systems that have very limited RAM, that would be different, but not very many people are doing that.

Ancient Dragon
 

>>If realloc() fails, then the program has much larger problems than a little memory leak: the entire program will crash, or is about to, for lack of memory. That could even extend to the entire operating system. Consequently, on modern computers with several gigabytes of RAM I don't even worry about realloc() failing. If you are working with embedded systems that have very limited RAM, that would be different, but not very many people are doing that.

Regardless of your excuses, it is bad programming.

Aia
 

>>Regardless of your excuses, it is bad programming.
It's not an excuse -- it's a fact of reality. You can pretty-up the code all you want but when malloc() and realloc() fail the program's gonna crash, and good programming ain't going to help that one bit.

Ancient Dragon
 

I'd hate to be maintaining code you've written.

You're like the Herb Schildt of Daniweb. :(

Dave Sinkula
 

>>It's not an excuse -- it's a fact of reality. You can pretty-up the code all you want but when malloc() and realloc() fail the program's gonna crash, and good programming ain't going to help that one bit.

Memory management in C is of the utmost importance. Neglecting to check the return values of malloc() and realloc() is bad programming. Ignoring possible memory leaks is negligence. Failures of these standard functions are not always related to a general lack of memory, but rather to a failure to give you the particular block you asked for. If the program crashes, it is because you have created a bad program.

Aia
 

I don't want to continue this discussion here because it's hijacking the thread and is not relevant to its topic.

Ancient Dragon
 
