Hi all,
This program reads text from a file (via standard input) and counts how often each character occurs. It works fine as long as MAXSTRLEN is under about 610000.
However, I need it to read in whole books of text. Whenever I bump that number above 610000 or so, I get a segmentation fault and I'm not sure why. GDB gives me this error:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400727 in main () at charfreq.c:19
19 for (i = 0;((ch=getchar()) != EOF) && (i < MAXSTRLEN); i++)
Any advice would be appreciated.
Thanks,
bucwet

#include <stdio.h>
#include <ctype.h>
#include <string.h>

//#define MAXSTRLEN 610000
#define CHARS 128


int main(void)	{

	long int const MAXSTRLEN = 1000000;
	long int length,position,numArr[MAXSTRLEN],asciiArr[MAXSTRLEN],count,i;
	int j;
	char *str[MAXSTRLEN];
	char ch;

	count = 0;
	// get source text
	for (i = 0;((ch=getchar()) != EOF) && (i < MAXSTRLEN); i++)
	{
		str[i] = (int)ch;
		count++;
	}

	length = strlen(str);
	// print entered string
	printf("Entered string is:\n");
	for (i = 0; i < length; i++)
	{
		printf("%c",str[i]);
	}


	// convert string to ascii integer code
	for (i = 0; i < length; i++)
		{
			numArr[i] = (long int)str[i];
		}

	printf("\n");
	// assign asciiArr[] to zero
	for (i = 0; i < CHARS; i++)
	{
		asciiArr[i] = 0;
	}

	// and go through the numArr, increment asciiArr[]
	for (i = 0; i < length; i++)
	{
		position = numArr[i];
		asciiArr[position]++;
	}

	// print out formatted text
	for(j = 0; j < CHARS;j++)
	{
		if	((asciiArr[j] != 0) && (j != 10) && (j != 13))
		{
			printf("%4x: %4d: %3c %6ld  %8f\n",j,j,j,asciiArr[j],(((float)asciiArr[j])/length));
		}
		else if (j == 10)
		{
			printf("%4x: %4d:  \\n %6ld  %8f\n",j,j,asciiArr[j],(((float)asciiArr[j])/length));
		}
		else if (j == 13)
		{
			printf("%4x: %4d:  \\r %6ld  %8f\n",j,j,asciiArr[j],(((float)asciiArr[j])/length));
		}
	}
	printf("Amount of characters: %ld.\n",length);
	printf("\n");

	return(0);

}

line 14: That declares an array of MAXSTRLEN pointers to char. It's not a character array. Remove the *.

line 21: There is no need for the typecast.

line 22: There is no need to increment both i and count, because they always have the same value.

line 24: Make sure to null-terminate the string (and leave room in the array for the '\0'). The strlen() call on line 25 depends on it.

lines 28-31: Why print the string one character at a time? It's a waste of good CPU time and bandwidth. Just print it all in one shot: printf("%s\n", str);

lines 34-38: That does nothing useful. No conversion is needed to display either the character value or the decimal value.

lines 48-52: That could be done by using array str directly:

for (i = 0; str[i] != '\0'; i++)
   asciiArr[(unsigned char)str[i]]++;  /* cast: a plain char may be negative */

> numArr[MAXSTRLEN],asciiArr[MAXSTRLEN]
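A long int is 4 bytes on some platforms and 8 on others (including 64-bit Linux), so with MAXSTRLEN at 1000000 these two arrays alone take a whopping 8 to 16 MB of stack space, and char *str[MAXSTRLEN] adds another 4 to 8 MB of pointers.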

You're almost certainly blowing your stack limits with this code.

> for (i = 0; i < CHARS; i++)
Especially since this much smaller limit (CHARS, i.e. 128) is all that asciiArr actually needs!

Wow, so I took advice from both of you, got rid of one of the big arrays, and changed

	long int const MAXSTRLEN = 1000000;
	long int length,position,asciiArr[MAXSTRLEN],i; //taking out numArr
	int j;
	char str[MAXSTRLEN];
	int ch; // int, not char, so EOF is detected reliably

	// get source text (stop at MAXSTRLEN - 1 to leave room for '\0')
	for (i = 0;(i < MAXSTRLEN - 1) && ((ch=getchar()) != EOF); i++)
	{
		str[i] = ch;
	}
	str[i] = '\0';
	length = strlen(str);

and

// go through str, increment asciiArr[]
	for (i = 0; i < length; i++)
	{
		position = (unsigned char)str[i];  // changing numArr to str; the cast avoids negative indexes
		asciiArr[position]++;
	}

and got rid of the numArr conversion loop at lines 35-39.
No more segfault! Now it appears to work well. This had me tied up for hours. Thanks a bunch!
Bucwet
