I have been asked to program a spellchecker in C for an assignment. I am quite new to C and programming in general, so I have decided to start by writing a program that does the following:

  • Reads words into an array from a dictionary text file.
  • Reads words into an array from a sample file that needs to be spellchecked.
  • Checks whether each word from the sample file is in the dictionary, using a binary search algorithm.

Here is my code so far:

#include <stdio.h>
#include <string.h>

int read_words(char *dict[20]);
int read_text(char *sample[3]);
int comparison(char *dict[20], char *sample[3]);

int main()
{
    char *dict[20];   //pointer to array 'dict'
    char *sample[3];  // pointer to array 'sample'
    
    read_words(dict);
    read_text(sample);
    comparison(dict, sample);
    
}

int read_words(char *dict[20])   //copies each word from the file 'words.txt' into array 'dict'
{
    FILE *words_ptr;    //pointer for words.txt
    int i;    
    char temp_word[20];  
    char *new_word;
    
    words_ptr = fopen( "words.txt", "r" );
    if( words_ptr != NULL )
    {
        printf( "File words.txt opened\n");
        i=0;
        while (fgets( temp_word, 20, words_ptr )) 
        {
              new_word = (char*)calloc(strlen(temp_word), sizeof(char)); //ensuring new_word will be the right size
              strcpy(new_word, temp_word);     //copy contents of temp_word to new_word
              dict[i] = new_word;               //copy contents of new_word to i'th element of dict array
              printf("printing out dict[%d]: %s\n", i, dict[i]); 
              i++;
        }  
        printf("printing out dictionary1: %s\n", dict[1]);
               
        fclose( words_ptr );
        return 0;
    }
    else {printf( "Unable to open file words.txt\n" ); return 1;}
    
}

int read_text(char *sample[3]) //copies each word from the file 'text.txt' into array 'sample'
{
    //this works exactly the same way as the read_words function
    FILE *text_ptr;
    int j;
    char temp_text[20];
    char *new_text;
    
        

    text_ptr = fopen( "text.txt", "r" );
    if( text_ptr != NULL )
    {
        printf( "File text.txt opened\n");
        j=0;
        while (fgets( temp_text, 20, text_ptr )) 
        {
              new_text = (char*)calloc(strlen(temp_text), sizeof(char));
              strcpy(new_text, temp_text);
              sample[j] = new_text;
              printf("printing out sample[%d]: %s\n", j, sample[j]);
              j++;
        }
        printf("printing out sampleee1: %s\n", sample[1]);  //testing that it prints out sample[1] and not whichever the last sample word was. Can be removed from final program.
           
        fclose( text_ptr );
        return 0;
    }
    else {printf( "Unable to open file text.txt\n" ); return 1;}
    
}

int comparison(char *dict[20], char *sample[3])  //comparing one word from each array with the other and checking if they are the same
{
 
   char *min, *max, *mid; //minimum value, maximum value, mid-point value
   min = dict[0];
   max = dict[20];
   mid = min +(max-min)/2;
   
   //performing the binary search
   while((min <= max) && (*mid != sample[0]))
   {
       if (sample[0] < *mid)
	   {
           max = mid -1;     
           mid = min +(max-min)/2;  
       }
       else
       {
           min = mid + 1;     
           mid = min +(max-min)/2;     
       }
  }
  
  if (*mid == sample[0]) 
    { printf("\n %d found!", sample[0]); }
  else  {printf("\n %d not found!", sample[0]); }
  
  return 0;
}

I get a few error messages at lines 89, 91 and 103 saying: "warning: comparison between pointer and integer".

I can see why I am getting these messages but I do not know how to change the code to do what I want it to do.

I can see a problem with this line (line 89):

while((min <= max) && (*mid != sample[0]))

I am not able to compare mid with sample[0] as they are different types. I want mid to be the middle word from the dict[]array so that I can compare the two values.

I have seen similar code working for when doing a binary search on integers, but am not sure if this is possible when doing a search on strings.

I would appreciate any advice. I have been told it is possible to do this using something like binary search or binary search trees, so I would be grateful for suggestions along those lines. I think a hash table might be a bit beyond me at this stage, although I am aware that this is another method.

Thanks

First of all, in C you cannot compare strings with comparison operators. They only compare pointers, not the string data. You must use strcmp, as in strcmp(*mid, sample[0]) != 0.
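To make the strcmp point concrete, here is a minimal sketch. The helper name same_word is made up for this illustration and does not appear in the original program.

```c
#include <string.h>

/* Hypothetical helper: returns 1 when both words contain the same
   characters, 0 otherwise. */
int same_word(const char *a, const char *b)
{
    /* Writing a == b would only test whether both pointers hold the
       same address. strcmp walks the characters and returns 0 when
       the two strings are equal. */
    return strcmp(a, b) == 0;
}
```

strcmp also returns a negative or positive value when the first string sorts before or after the second, which is the three-way answer a binary search needs.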

Next, I am afraid that the dict[i] are not what you think they are. They are dict array values, that is, character pointers, which point to more or less random locations. Each is valid and points to a particular word, but arithmetic on them makes no sense and will cause crashes. What you want is dict array locations, that is, &dict[i]. Now the warnings will go away.

Beware that your code may try to access dict[20], which is beyond the array.
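One way the advice above could look in code is sketched below. It assumes the dictionary is sorted in strcmp order and that the stored words carry no trailing newline. The signature (a count parameter, a char ** return value) is my own choice for the illustration, not something from the original program.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: binary search over a sorted array of word pointers.
   dict  - array of count pointers to strings, sorted in strcmp order
   word  - the single word to look for
   Returns the address of the matching slot in dict, or NULL. */
char **search(char **dict, size_t count, const char *word)
{
    char **min = dict;          /* array location &dict[0] */
    char **max = dict + count;  /* one past the last slot, never dereferenced */

    while (min < max) {
        char **mid = min + (max - min) / 2;  /* moves in whole slots */
        int cmp = strcmp(word, *mid);        /* *mid is a char *, i.e. a word */

        if (cmp == 0)
            return mid;
        if (cmp < 0)
            max = mid;       /* continue in the lower half */
        else
            min = mid + 1;   /* continue in the upper half */
    }
    return NULL;
}
```

Treating max as one past the end means the code never reads dict[20], and it never forms a pointer before the start of the array when the word sorts below the first entry.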

Finally, I would highly recommend restructuring your program a little.
A. read_words and read_text are in fact the same function; they just fill up different arrays. Get rid of one of them.
B. Do not pass the whole sample array to the comparison routine. It only needs one word at a time.
C. comparison is a misnomer. It should be called search.
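As a rough sketch of point A, a single reader could take the file name and the destination array as parameters. The name read_file, the 64-byte line buffer and the return convention are assumptions made for this example. It also allocates strlen + 1 bytes, because calloc(strlen(temp_word), ...) in the original leaves no room for the terminating '\0', and it strips the newline that fgets keeps, which would otherwise make every strcmp against a clean word fail.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical replacement for both read_words and read_text.
   Reads at most max_words lines from filename into words[].
   Returns the number of words stored, or -1 if the file cannot be opened. */
int read_file(const char *filename, char *words[], int max_words)
{
    char line[64];   /* assumed upper bound on the length of one line */
    int count = 0;
    FILE *fp = fopen(filename, "r");

    if (fp == NULL)
        return -1;

    while (count < max_words && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';       /* drop the line ending */
        words[count] = malloc(strlen(line) + 1);  /* +1 for the '\0' */
        if (words[count] == NULL)
            break;                                /* out of memory: stop reading */
        strcpy(words[count], line);
        count++;
    }

    fclose(fp);
    return count;
}
```

main could then call read_file("words.txt", dict, 20) and read_file("text.txt", sample, 3), and loop over the sample words, handing each one in turn to the search routine (point B).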

Edited 6 Years Ago by nezachem: n/a

First of all, in C you cannot compare strings with comparison operators. They only compare pointers, not the string data. You must use strcmp, as in strcmp(*mid, sample[0]) != 0.

That is very useful, I have only just learned about strcmp and keep forgetting that I can use it.

Next, I am afraid that the dict[i] are not what you think they are. They are dict array values, that is, character pointers, which point to more or less random locations. Each is valid and points to a particular word, but arithmetic on them makes no sense and will cause crashes. What you want is dict array locations, that is, &dict[i]. Now the warnings will go away.

Beware that your code may try to access dict[20], which is beyond the array.

I think I tried something like what you have suggested, but came unstuck with the arithmetic - trying to find the midpoint value between the addresses sometimes led to an address that was not the beginning of a string, if you understand what I mean.
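The symptom described here (a midpoint that lands in the middle of a string) is what happens when the arithmetic is done on char * values, which move one character at a time. Arithmetic on char ** values moves one array slot at a time. The two small functions below are only a demonstration of that difference; their names are invented for this sketch.

```c
#include <stddef.h>

/* Bytes moved when 1 is added to a char * (a pointer into a word). */
size_t char_step(char *p)
{
    return (size_t)((p + 1) - p) * sizeof *p;   /* sizeof(char) is 1 */
}

/* Bytes moved when 1 is added to a char ** (a pointer to an array slot). */
size_t slot_step(char **p)
{
    return (size_t)((p + 1) - p) * sizeof *p;   /* sizeof(char *) */
}
```

So a midpoint computed between two char ** values always points at a whole slot of dict, and dereferencing it gives the start of a word.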

I have actually come across a function called bsearch, declared in stdlib.h, which is a more convenient way of doing the search, and have written a new program that uses it:

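/* binary search for spellchecker */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* bsearch expects a comparator taking two const void * arguments.
   Wrapping strcmp avoids calling it through a cast function pointer. */
static int compare_words(const void *key, const void *entry)
{
    return strcmp((const char *)key, (const char *)entry);
}

int main(void)
{
    int i;
    char *string;
    /* the dictionary must be sorted for bsearch to work */
    char dict[20][20] = {"aardvark","bloomers","chopsticks","dinner","entry","figure","great","hand","igloo","journey","kennel","lemon","money","noon","open","pantry","queen","tomatoes","vinegar","xylophone"};
    char sample[3][20] = {"vinegar","bloomers","jaunty"};

    for (i = 0; i < 3; i++)
    {
        /* each element of dict is a 20-byte char array, so the comparator
           receives a pointer to the first character of a word */
        string = bsearch(sample[i], dict, 20, sizeof dict[0], compare_words);
        if (string != NULL)
            {printf("%s is in the dictionary\n", string);}
        else
            {printf("%s is not in the dictionary\n", sample[i]);}
    }
    return 0;
}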

However, I have not yet figured out a way to incorporate this code into my earlier program. I will be looking into this over the next few days.
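One possible way to combine bsearch with the earlier program, where dict is an array of char * instead of a 2D char array, is sketched here. The difference is that each array element is now a pointer, so the comparator receives a pointer to that pointer and has to dereference it once. The names compare_key_to_slot and in_dictionary are invented for this example, and the sketch assumes the words were loaded in sorted order with their trailing newlines removed.

```c
#include <stdlib.h>
#include <string.h>

/* bsearch calls the comparator with the key first and a pointer to an
   array element second. With char *dict[], an element is a char *, so
   the second argument is really a char **. */
static int compare_key_to_slot(const void *key, const void *slot)
{
    const char *word = key;                     /* key is passed through as given */
    const char *entry = *(char *const *)slot;   /* one dereference to reach the word */
    return strcmp(word, entry);
}

/* Hypothetical wrapper: returns 1 if word is in the sorted dict, else 0. */
int in_dictionary(const char *word, char *dict[], size_t count)
{
    return bsearch(word, dict, count, sizeof dict[0],
                   compare_key_to_slot) != NULL;
}
```

Passing the number of words actually read as count, instead of a fixed 20, keeps bsearch from looking at slots that were never filled.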

Finally, I would highly recommend restructuring your program a little.
A. read_words and read_text are in fact the same function; they just fill up different arrays. Get rid of one of them.
B. Do not pass the whole sample array to the comparison routine. It only needs one word at a time.
C. comparison is a misnomer. It should be called search.

Thank you for all these final suggestions, especially A. I did not realise that I could use the same function for the two different files.

You have given me a lot of things to think about - many thanks!
