I am writing a spellchecker program and have some issues. First thing is that I need to make better use of the memory. My code iterates through a large dictionary file repeatedly. I know its better to read this into memory first then iterate through the block but I really don't know how. The search should be optimized. The other problem is ignoring the non alphanumerical characters in the problem. Would really appreciate some help with this.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
/*Open files and test that they open*/
FILE *fp1;
FILE *fp2;
char fname[20];
char wordcheck[45];/*The longest word in the English Language is 45 letters long*/
char worddict[45];
char dummy;
int i;
int notfound;

fp1 = fopen("dictionary.txt","r");

if (fp1 == NULL)
{
printf("The dictionary file did not open.");
exit(0);
}

printf("Please enter the path of the file you wish to check:\n");
scanf("%s", fname);
scanf("%c", &dummy);

fp2 = fopen(fname, "r");
	if (fp2 == NULL)
		{
		printf("Your file did not open, please check your filepath and try again.\n");
	
		printf("Please enter path of file you wish to check: \n");
		scanf("%20s",fname);
	
		fp2 = fopen(fname, "r");
		}
	
	else
		{
		printf("Your file opened correctly\n");
		}

/*When files are open, read each word from the text file into an array:*/
	
	while(fscanf(fp2,"%s", wordcheck)!=EOF)//Reads word from text file into array//
	{

		for (i=0; wordcheck[i]; i++)
		{
			wordcheck[i] = tolower(wordcheck[i]);//makes all characters lower case//
		}
			fseek(fp1,0,SEEK_SET);

		/*printf("%s", wordcheck);//debugger*/
		
			while(fscanf(fp1,"%s", worddict)!=EOF)
			{	
				notfound = 1;
				
				if(strcmp(wordcheck, worddict)==0)//compare strings//
				{
				printf("This word: %s is in the dictionary\n", wordcheck);//debugger//
				notfound = 0;
				break;
				}
			}
			if(notfound == 1)
				{
				printf("%s is not in dictionary\n", wordcheck);
				}
	}
	printf("Your file has been checked. Please check any errors found");
	fclose(fp1);
	fclose(fp2);
			
return 0;
}

Depending on how much memory you have, you may or may not have enough memory to store all the strings in the dictionary in RAM. If you can end up with an alphabetically sorted array of type char* in RAM, that's ideal, at which point you can do quick binary searches for each word. That should be faster than searching a file, though you never know considering how the computer tends to cache disk reads. Anyway, go for a sorted array of char* for your dictionary.

If that's not possible, you can still improve your performance by creating a sorted FILE of your dictionary and then doing a binary search of the file for each word. Right now you're searching for "zebra", and comparing "zebra" to "aardvark", "able", "ace", "ack", etc. That's a heck of a lot of unnecessary comparisons. Instead seek to somewhere in the middle of the file and compare "zebra" to "mountain", then based on that result, compare it to "giraffe" or "swallow", etc. A binary search of a sorted dictionary file, but using the "seek" command rather than an array index.


From a programming standpoint, I imagine the second method might be easier and save you a bunch of calls to malloc and a bunch of pointer arithmetic that can go wrong. Regardless, the goal is to get a sorted dictionary and binary search it rather than iterate through every single element.

Edited 4 Years Ago by VernonDozier: n/a

>> I know its better to read this into memory first then iterate through the block but I really don't know how. The search should be optimized.

See above. You shouldn't iterate. You do a binary search. As far as how to read the file into RAM, again that depends on how much RAM you have. If you...

  1. Know how many letters are in the maximum length word(you do -- 45)
  2. Know the maximum number of words in the file
  3. Have enough memory to declare the array at compile time and can afford to reserve 45 characters per word

then your job is much easier.

You just create a huge 2-D array of, say, 100,000 words.

char dictionary [100000][46]; // add 1 to 45 for NULL terminator.  Holds 100,000 words

There's your array. Not a single malloc command in there. If you can do that, go for it. If you can't, try to allocate 4,600,000 bytes with malloc. If you can't do THAT, you'll have to try to allocate enough RAM for 100,000 char*, then do all of the fun memory management. If you can't malloc even that much in one shot, you have to get even more creative or again, not bother trying to read it all into RAM and just use the "seek" method as explained.

At any rate, I don't know your skill level, so I won't elaborate further. The explanation so far may have made you either more or less confused, so I'll wait for a follow-up from you.


At any rate, whatever you do, DON'T iterate through the dictionary and compare every single word.

Thanks for the replies guys. My thoughts were to either read the file into a binary tree(which I am seriously lost with) and do a binary search as was suggested in the first post. This would probably be the best method however that may be beyond my coding skills. The second method might be better...

As for my skill level, I have been doing C programming for 5 weeks so I'm very much a beginner!

Thanks for the response very much appreciated!

Looking at the code, I'm seeing comments that may give a clue about what method to use. However, I don't know who wrote them and I don't know whether you are a student. My guess from reading the comments is that you are a student in a course and these comments were left by the instructor. Or if you're self-teaching from a book, they're from the book author. I'm looking at the comment on line 45 in particular ("When files are open, read each word from the text file into an array"). However I try not to assume.

If I'm right, however, you've already been told what to do, so there's no choice. Line 45 clearly states to use an array. If you're a student, the assignment probably came with a spec sheet that will explain the requirements more fully. Regarding how to read things into an array, given that you have 5 weeks experience, I'm guessing that the word "malloc" might not be familiar to you (I can't see it being taught in the first 5 weeks), so that tells me you need to go this route, as suggested above.

char dictionary [100000][46]; // add 1 to 45 for NULL terminator.  Holds 100,000 words

As far as a "binary tree" goes, I'll make the point that I suggested a binary SEARCH, but not a binary TREE. A binary search on an array is easier than searching through a binary tree. Could you use a binary tree? Sure, but that requires malloc. The array seems easier regardless and again, it appears to be what is expected by the line 45 comment.


So how does one read from a file into an array. You have part of it.

while(fscanf(fp1,"%s", worddict)!=EOF)
{	
    // code here
}

Now we're reading into an array, so we'll be replacing worddict with dictionary.

while(fscanf(fp1,"%s", dictionary[?])!=EOF)
{	
    // code here
}

Now what goes where the question mark is? Well at some point we'll need to know how many words are in this dictionary, right? So create a variable called num_words_in_dict. At the beginning it will be equal to 0 (makes sense, right, we haven't read any words in yet).

int num_words_in_dict = 0;
while(fscanf(fp1,"%s", dictionary[num_words_in_dict])!=EOF)
{	
    // code here
}

Now what goes where "// code here" is? Well, think about it. If num_words_in_dict stays at 0, we're going to keep writing over the same element in the array and at the end we'll still supposedly have a dictionary size of 0. Clearly that's wrong, so something in "// code here" should change that.

Anything else? Well I'm assuming the dictionary is in alphabetical order already. If it isn't, you might want to implement an insertion sort inside of that loop.

You need to have, at the end of that loop, a sorted array of dictionary words. At that point, you'll never need the dictionary file again, so close it. Then you open the file you are checking against and go through a word at a time and check that word against the dictionary array using a binary SEARCH (again, there's no binary TREE).

Edited 4 Years Ago by VernonDozier: n/a

Thanks Vernon,

Those comments were inserted by myself to keep track of what I was doing! I don't have to use an array. However you are right I am a student working on a project. We weren't given any specific instructions only to ensure that the program runs fast and obviously mine doesnt! I'm familiar with malloc however I think I prefer the idea of reading into a large array and then doing a binary search. It's less time consuming for coding and easier to implement.

Going by your suggestions here is my assumption of how the program should go:

Open & read dictionary into large array
close dictionary and open user file.
Read user file one word at a time(as I have done in my original code)
Replace the loop through the dictionary with a binary search loop(?)
Output errors when they occur
Close file
Free Memory
End program

>> Free Memory

If you have the resources to do this...

char dictionary [100000][46]; // add 1 to 45 for NULL terminator.  Holds 100,000 words

then there is no memory to free. If you don't, you'll need to use malloc and the accompanying "free" command (or commands, plural, depending on how you allocate the array. Creating a 2-D array with malloc and getting all of the pointers right can be a challenge. It's more complicated than dealing with a one-dimensional array, so if the above works for you, use it. There will be now "malloc", there will be no "free", there will be no pointer arithmetic to make sure all of the array indexes line up. I would try that first. If it doesn't work, use a smaller dictionary file till it works, get the rest of the program to work, then come back and tackle the setting up of the array with malloc if required. The advantage of approaching it this way is 1) you may not have to use malloc and free at all if you have sufficient memory resources, and 2) even if you do, the program is easier to debug because you can approach it in stages. Get the file reading and binary search aspects right first. If you try to tackle it all at once, there are a whole bunch of things that could have gone wrong and you have to figure out which it is.


So I think your process is pretty much correct. See red comments.

Set up dictionary array(see comments above as to how to do this. Use malloc only if required. Try it without malloc first and see if it works).
Open & read dictionary into large array
If the dictionary file is not already in alphabetical order, sort the array.
close dictionary and open user file.
Read user file one word at a time(as I have done in my original code)
Replace the loop through the dictionary with a binary search loop(?) -- Yes. Use a binary search.
Output errors when they occur
Close file
Free Memory (Necessary only if you used malloc. See above red).
End program

This question has already been answered. Start a new discussion instead.