How to keep a word dictionary in one variable

I have to make an application in C that uses a dictionary. I have 54000 words in a file, one per line, and I need a way to handle them.
- I first wanted to keep them in an array, but as that's bigger than INT (32767) I cannot do that.
- The second option is to keep everything in one very long string, but that gives me errors and throws me out of the program. I tried to use a char pointer like this:

FILE *f=fopen("d.txt","r");
   char *line;
   while(fgets(line,sizeof(line),f)){
     strcat(str,line);//concatenate all strings to this one - does not work, the pointer does not contain that words...
     //  printf("%s\n",line);
   }
   fclose(f);

Is there a way to keep this dictionary in a variable in C? I don't want to search the file every time because that's time-consuming and not reliable.

Thanks!

Clawsy

Some questions..
A pointer needs to point to memory. In this case your pointer 'line' is uninitialized, so it is not pointing to a valid location.
Also, sizeof(line) won't give you the size to read. It only gives the size of the pointer itself.
Why do you need strcat at all when fgets gives you the whole line in one shot? [I am assuming you want a single line in a single string.]
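
To make those points concrete, here is a minimal sketch that reads one word per line into a real array, so that sizeof gives the buffer size instead of the pointer size. The names read_word, count_words and MAX_WORD are made up for illustration, and the 64-byte limit is an assumption about the longest word, not something from the original post.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD 64  /* assumed upper bound on word length (hypothetical) */

/* Reads one line from f into buf (capacity size) and strips the trailing
   newline. Returns 1 if a line was read, 0 at end of file or on error. */
int read_word(FILE *f, char *buf, size_t size)
{
    assert(f != NULL && buf != NULL && size > 0);
    if (fgets(buf, (int)size, f) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';  /* cut at the newline, if there is one */
    return 1;
}

/* Counts the remaining lines in an open file, one word per line. */
long count_words(FILE *f)
{
    char line[MAX_WORD];  /* a real array: sizeof(line) == MAX_WORD */
    long n = 0;

    while (read_word(f, line, sizeof(line)))
        n++;
    return n;
}
```

A caller would open the file with fopen, check the result for NULL, and pass the FILE pointer in. A line longer than MAX_WORD - 1 characters would be split across several reads, so the limit has to fit the data.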

sree_ec

hi,
I managed to read the whole file in one string, try this:

#include <stdio.h>


int main()
{
   FILE *f= fopen("test.txt","r");

   char buffer[0xFFFF];

   
   fgets(buffer,54000,f);

   printf("\n%s",buffer);
    
   fclose(f);

   return 0;
}
Software guy

>>I managed to read the whole file in one string

No you didn't. fgets() will not read the entire file into a single character array because fgets() stops reading when it encounters the first '\n' -- which is the character that terminates a line. So all your program will do is read the first word in the file.

>>Is there a way too keep this dictionary in a variable in C
Use a linked list of words. AFAIK there is no limit to the number of words that can be stored in memory that way, except of course the amount of available RAM and hard drive swap space. Hopefully, you are not foolish enough to use a 16-bit compiler such as Turbo C.
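
A rough sketch of what such a linked list could look like, assuming each word gets its own heap-allocated copy. The struct and function names (word_node, list_push, list_contains, list_free) are hypothetical and not from the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One node per word. */
struct word_node {
    char *word;
    struct word_node *next;
};

/* Pushes a copy of word onto the front of the list. Returns the new head,
   or NULL if memory ran out (the old list is left untouched). */
struct word_node *list_push(struct word_node *head, const char *word)
{
    struct word_node *node;

    assert(word != NULL);
    node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->word = malloc(strlen(word) + 1);  /* +1 for the '\0' */
    if (node->word == NULL) {
        free(node);
        return NULL;
    }
    strcpy(node->word, word);
    node->next = head;
    return node;
}

/* Linear search: returns 1 if word is in the list, 0 otherwise. */
int list_contains(const struct word_node *head, const char *word)
{
    for (; head != NULL; head = head->next)
        if (strcmp(head->word, word) == 0)
            return 1;
    return 0;
}

/* Frees every node and its word. */
void list_free(struct word_node *head)
{
    while (head != NULL) {
        struct word_node *next = head->next;
        free(head->word);
        free(head);
        head = next;
    }
}
```

Each lookup walks the list from the head, so the cost grows with the number of words stored.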

Ancient Dragon

Yeah, well spotted. The file I was reading contained data extracted from an HTML page, and it does not have any newline characters.

But to solve this problem he can put all the strings on one line, with each word separated by a delimiter, and then use strtok.
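
For illustration, a small sketch of the strtok idea, assuming the words are already in a writable buffer and separated by a known delimiter. The helper name split_words is made up here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits text in place on any character in delims and stores a pointer to
   each word in out[]. Returns the number of words stored (at most max).
   text is modified: strtok() overwrites each delimiter with '\0'. */
size_t split_words(char *text, const char *delims, char **out, size_t max)
{
    size_t n = 0;
    char *tok;

    assert(text != NULL && delims != NULL && out != NULL);
    for (tok = strtok(text, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims))
        out[n++] = tok;
    return n;
}
```

Note that the pointers in out[] point into the original buffer, so the buffer has to stay alive for as long as the words are used, and it cannot be a string literal.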

Software guy

Hmm... keeping the dictionary in an external file seems like the best approach to me. Correct me if I'm wrong.

Regarding your first idea, it should work with a long int index. Correct me if I'm wrong again.

creeps

>Hmm... keeping the dictionary in an external file seems like the best approach to me. Correct me if I'm wrong.
How do you suggest looking up a word? Search the file each time? Unless it's a properly designed external database, that's going to be very inefficient. The words will likely all fit in memory, so an internal data structure makes more sense here.

>But to solve this problem he can have all the strings on one line.
Once again, looking up a word is tedious and potentially inefficient. You'd have to parse out the words every time, or store them in a separate data structure. But if you're using a separate data structure anyway, just use that as the primary storage medium.

>Use a linked list of words.
I'd use a balanced binary search tree or a chained hash table. Since this is a dictionary, we can expect lookup to be the primary operation. Searching a linked list of 50k+ words may not be the best approach. Of course, you can try amortizing the performance with tricks like moving the most recent match to the front, but I think a data structure better suited to searching is the superior option.
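
As one possible shape for the chained hash table option, here is a minimal sketch. The bucket count, the djb2-style hash, and all names (dict, entry, dict_new, dict_add, dict_contains, dict_free) are assumptions chosen for illustration, not a tuned design.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 16384u  /* assumed bucket count, not tuned */

struct entry {
    char *word;
    struct entry *next;  /* next entry in the same bucket's chain */
};

struct dict {
    struct entry *buckets[NBUCKETS];
};

/* djb2-style string hash, reduced to a bucket index. */
static unsigned long hash_word(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Allocates an empty dictionary on the heap; returns NULL on failure. */
struct dict *dict_new(void)
{
    size_t i;
    struct dict *d = malloc(sizeof *d);
    if (d == NULL)
        return NULL;
    for (i = 0; i < NBUCKETS; i++)
        d->buckets[i] = NULL;
    return d;
}

/* Returns 1 if word is stored in d, 0 otherwise. */
int dict_contains(const struct dict *d, const char *word)
{
    const struct entry *e;
    assert(d != NULL && word != NULL);
    for (e = d->buckets[hash_word(word)]; e != NULL; e = e->next)
        if (strcmp(e->word, word) == 0)
            return 1;
    return 0;
}

/* Adds a copy of word. Returns 1 on success or if the word is already
   present, 0 if memory ran out. */
int dict_add(struct dict *d, const char *word)
{
    unsigned long i;
    struct entry *e;

    if (dict_contains(d, word))
        return 1;
    i = hash_word(word);
    e = malloc(sizeof *e);
    if (e == NULL)
        return 0;
    e->word = malloc(strlen(word) + 1);
    if (e->word == NULL) {
        free(e);
        return 0;
    }
    strcpy(e->word, word);
    e->next = d->buckets[i];  /* link in at the head of the chain */
    d->buckets[i] = e;
    return 1;
}

/* Frees every chain and the table itself. */
void dict_free(struct dict *d)
{
    size_t i;
    if (d == NULL)
        return;
    for (i = 0; i < NBUCKETS; i++) {
        struct entry *e = d->buckets[i];
        while (e != NULL) {
            struct entry *next = e->next;
            free(e->word);
            free(e);
            e = next;
        }
    }
    free(d);
}
```

With about 54000 words spread over 16384 buckets, a lookup only has to compare against the few words that share a bucket, provided the hash distributes them reasonably evenly.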

Narue

I just tried this for fun in Turbo C, using an array of char pointers:

char *names[SIZE];

then getting the length of each name, and malloc'ing the memory for it, and putting it into the names[i++] position.

Using short names, TC was limited to less than 5k names (4,200 average).

Which means if the OP is trying to do this with TC, no matter what data structure you choose, if it's internal, you won't be successful.

Which brings me back to creeps' suggestion of simply keeping the names on a HD.
Although HDs can keep a lot of data in their cache, a better idea would be to use a virtual drive in memory, like a RAM disk. I used one of these last year to avoid disk thrashing while running a big project, and it worked out very well.

It is a shame to keep using a 16 bit compiler for work like this, when you have a 32 or 64 bit OS, with Gigs of RAM available, however.

Adak

@Adak: I'm pretty sure you know this, but for those who don't: the reason it failed is that Turbo C is a 16-bit compiler and the programs it produces are limited to 640 Meg RAM, minus the amount needed for the operating system and other drivers. It actually winds up somewhere between 450-540 meg.

Ancient Dragon

Oh, I know it well.

I put that up to help the OP see that he can't store 54k names in ANY internal data structure, if he's using turbo C. Not gonna happen. ;)

That leaves external storage as the only viable option.

Adak

>@Adak: I'm pretty sure you know this, but for those who don't: the reason it failed is that Turbo C is a 16-bit compiler and the programs it produces are limited to 640 Meg RAM, minus the amount needed for the operating system and other drivers. It actually winds up somewhere between 450-540 meg.

AD, you meant kilobytes, right?

mvmalderen

We both did.

After I fire up the Turbo C IDE, I show just 403K available - yeeeee! ;)

Adak

I tried to keep the words in separate data structures for each letter (because one array for all of them would be too long). But I don't know why my compiler (Borland C) says that I have declared too much global data for that structure....

After that I have another data structure that holds all the per-letter data structures I described above.

Is there a better approach? If there is... I would gratefully use it :D. I have already done this in PHP, but I want to do it in C because I need more speed. I have a backtracking procedure that searches the dictionary. That's why performance is so important.

Clawsy

If your Borland is like my Turbo C from Borland, then you can't do it. I don't care how many data structs you use. You don't get more memory because you use more data structures. ;)

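Global data doesn't come off the heap. It goes into a fixed-size data segment, and on a 16-bit compiler that segment is small (typically limited to 64K), which is why the compiler complains about too much global data. Try the heap, it's bigger. (malloc uses the heap for its memory source, not the data segment or the stack.)

The way to go with it is to use a newer compiler. MS Visual Express, Pelles C, gcc, and Code::Blocks with MinGW are all compilers that will allow you to enjoy larger memory access.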

I love Turbo C, but for large amounts of memory - phffffftttt! :(

This is the program I wrote to test it:

/* Tries to load all 54,000 names from the names54k.txt file, into an
   array of pointers, where the memory for each name, is malloc'd.

  This was done on Turbo C, ver.1.01
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 4000
/* SIZE will vary depending on the length of the words
   that are being saved, from the file.
*/

int main(void) {
  unsigned int i, j;
  int rc = 0;
  size_t len;
  char *pstr[SIZE];
  char buff[40]="";
  FILE *fp;

  printf("\n\n");
  fp=fopen("names54k.txt", "rt");
  if(fp==NULL) {
    printf("\nError opening names file");
    return 1;
  }
  i=0;
  /* stop before writing past the end of pstr[] */
  while(i<SIZE && fgets(buff, sizeof(buff), fp)) {
    len=strlen(buff);
    if(len>0 && buff[len-1]=='\n')
      buff[--len]='\0';   /* strip the newline */

    /* len+1: room for the terminating '\0' */
    if((pstr[i]=malloc(len+1))==NULL) {
      printf("\nError allocating memory: i==%u", i);
      rc=1;
      break;   /* fall through to the cleanup below */
    }
    strcpy(pstr[i], buff);
    printf("\n%s", pstr[i]);
    ++i;
  }

  fclose(fp);
  for(j=0;j<i;j++)   /* only i entries were allocated */
    free(pstr[j]);
  printf("\n\n\t\t\t     press enter when ready");

  getchar();
  return rc;
}
Adak
