I have to make an application in C that uses a dictionary. I have 54,000 words in a file, one per line, and I need a way to handle it.
- I first wanted to keep it in an array, but since the word count is bigger than the maximum int on my compiler (32,767), I cannot do that.
- The second option is to keep it all in one very long string, but it gives me errors and throws me out of the program. I tried to use a char pointer like this:

FILE *f=fopen("d.txt","r");
   char *line;
   while(fgets(line,sizeof(line),f)){
     strcat(str,line);//concatenate all strings to this one - does not work, the pointer does not contain that words...
     //  printf("%s\n",line);
   }
   fclose(f);

Is there a way to keep this dictionary in a variable in C? I don't want to search the file every time because it's time-consuming and unreliable.

Thanks!


Some questions:
A pointer needs to point at memory. In this case your pointer 'line' is not pointing at a valid location.
Also, sizeof(line) won't give you the size to read into; it only gives the size of the pointer itself.
And why do you need strcat at all, when fgets already gives you a whole line in one shot? [I am assuming you want one line per string.]

Hi,
I managed to read the whole file into one string -- try this:

#include <stdio.h>

int main()
{
   FILE *f = fopen("test.txt","r");
   char buffer[0xFFFF];

   fgets(buffer, sizeof(buffer), f);
   printf("\n%s", buffer);

   fclose(f);
   return 0;
}

>>I managed to read the whole file in one string

No you didn't. fgets() will not read the entire file into a single character array because fgets() stops reading when it encounters the first '\n' -- which is the character that terminates a line. So all your program will do is read the first word in the file.

>>Is there a way to keep this dictionary in a variable in C
Use a linked list of words. AFAIK there is no limit to the number of words that can be stored in memory that way, except of course the amount of available RAM and hard-drive swap space. Hopefully, you are not foolish enough to use a 16-bit compiler such as Turbo C.


Yeah, well spotted. The file I was reading contained data extracted from an HTML page, and it doesn't have any newline characters.

But to solve this problem he can put all the strings on one line, with each word separated by a delimiter, and then use strtok.

Hmm... keeping the dictionary in an external file seems like the best approach to me. Correct me if I'm wrong.

Regarding your first idea, it should work with a long int. Correct me if I'm wrong again.

>Hmm... keeping the dictionary in an external file seems like the best approach to me. Correct me if I'm wrong.
How do you suggest looking up a word? Search the file each time? Unless it's a properly designed external database, that's going to be very inefficient. The words will likely all fit in memory, so an internal data structure makes more sense here.

>But to solve this problem he can have all the strings on one line.
Once again, looking up a word is tedious and potentially inefficient. You'd have to parse out the words every time, or store them in a separate data structure. But if you're using a separate data structure anyway, just use that as the primary storage medium.

>Use a linked list of words.
I'd use a balanced binary search tree or a chained hash table. Since this is a dictionary, we can expect lookup to be the primary operation. Searching a linked list of 50k+ words may not be the best approach. Of course, you can try amortizing the performance with tricks like moving the most recent match to the front, but I think a data structure better suited to searching is the superior option.


I just tried this for fun in Turbo C, using an array of char pointers:

char *names[SIZE];

then getting the length of each name, malloc'ing memory for it, and putting it into the names[i++] position.

Using short names, TC was limited to less than 5k names (4,200 average).

Which means that if the OP is trying to do this with TC, no internal data structure will succeed, no matter which one he chooses.

Which brings me back to Creep's suggestion of simply keeping the names on the HD.
Although HDs can keep a lot of data in their cache, a better idea would be to use a virtual drive in memory - a RAM disk. I used one of these last year to avoid disk thrashing while running a big project, and it worked out very well.

It is a shame to keep using a 16-bit compiler for work like this, however, when you have a 32- or 64-bit OS with gigs of RAM available.

@Adak: I'm pretty sure you probably know this, but for those who don't: the reason it failed is that Turbo C is a 16-bit compiler, and the programs it produces are limited to 640 Meg RAM, minus the amount needed for the operating system and other drivers. In practice that winds up somewhere between 450 and 540 meg.

Oh, I know it well.

I put that up to help the OP see that he can't store 54k names in ANY internal data structure if he's using Turbo C. Not gonna happen. ;)

That leaves external storage as the only viable option.

>>...the programs it produces are limited to 640 Meg RAM, minus the amount needed for the operating system and other drivers.

AD, you meant kilobytes, right?

We both did.

After I fire up the Turbo C IDE, I show just 403K available - yeeeee! ;)

I tried to keep the words in a separate data structure for each letter (because one array for all of them would be too long). But I don't know why my compiler (Borland C) says that I have declared too much global data for that structure....

On top of that, I have another data structure that holds all those per-letter structures, as I described above.

Is there a better approach? If there is, I would thankfully use it :D. I already did this in PHP, but I want to do it in C because I need more speed. I have a backtracking procedure that searches the dictionary, which is why performance is so important.

If your Borland is like my Turbo C from Borland, then you can't do it. I don't care how many data structs you use; you don't get more memory by using more data structures. ;)

In Turbo C's default memory models, global data shares a single 64 KB data segment with the stack, so it is tightly limited. Try the heap instead; it's bigger. (malloc draws its memory from the heap, not the stack.)

The way to go is a newer compiler. MS Visual C++ Express, Pelles C, gcc, or Code::Blocks with MinGW are all compilers that will give you access to much more memory.

I love Turbo C, but for large amounts of memory - phffffftttt! :(

This is the program I wrote to test it:

/* Tries to load all 54,000 names from the names54k.txt file, into an
   array of pointers, where the memory for each name, is malloc'd.

  This was done on Turbo C, ver.1.01
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 4000
/* SIZE will vary depending on the length of the words
   that are being saved, from the file.
*/

int main() {
  unsigned int i, j;
  int len;
  char *pstr[SIZE];
  char buff[40]="";
  FILE *fp;

  printf("\n\n");
  fp=fopen("names54k.txt", "rt");
  if(fp==NULL) {
    printf("\nError opening names file");
    return 1;
  }
  i=0;
  while(fgets(buff, sizeof(buff), fp)) {
    len=strlen(buff);
    if(len>0 && buff[len-1]=='\n')
      buff[len-1]='\0';

    /* +1 leaves room for the terminating '\0' */
    if((pstr[i]=malloc(strlen(buff)+1))==NULL) {
      printf("\nError allocating memory: i==%u", i);
      return 1;
    }
    strcpy(pstr[i], buff);

    printf("\n%s", pstr[i]);
    ++i;
    if(i>=SIZE) break;  /* pstr[SIZE] would be out of bounds */
  }

  fclose(fp);
  for(j=0;j<i;j++)  /* exactly i pointers were allocated */
    free(pstr[j]);
  printf("\n\n\t\t\t     press enter when ready");

  getchar();
  return 0;
}