Hi all, well my problem isn't a practical problem. I have a project to do which is a word game.First of all i have a .txt (a dictionary) my program has to fetch a random word and shuffle it. With the combination of those letters the program has to find words and(sub-words/sub-strings) in the dictionary. The user tries the different combinations until there is a corresponding word in the dictionary.My problem is memory wise the program has to constantly search the .txt file. So i have been wondering how i can optimize that using Object-oriented programming..Would "stacking" my dictionary be a good idea? Or maybe linked lists? I really don't know what would be the most efficient way...

Thank you for your time.

Recommended Answers

All 16 Replies

Just read your words from the .txt file into an array of strings. Then next time you have to search through words just search through the array instead of .txt file. You can also organize them by alphabetical order like words[26][70] This will make 26 letters and 70 spaces for different words that start with the same letter. Also I don't think reading from a file would be that slow. I understand on computers in 80's it would save some time but hard disks now-a-days are pretty fast, and simple game should not take that much time

1) Make sure the file is in sorted order
2) Read the file into an array of strings (not a 2D-char array as suggested)
3) Use a Binary Search (google it) to find whatever the user enters.

If you are permitted to do so, you can perhaps load your text file into a std::set and use that as your dictionary.

Forget about reading it in an array. Use the correct data structure. Lookup hashtable, which is just created for situation like these, lookup. Using hashTable, you can lookup in the dictionary on average on constant time, as well as insert in constant time.

Basically, thats all the optimization you need. Rest is just implementation.

As suggested already, you should store all the strings from the file into a set. Now, you can choose:

- std::set: Uses a balanced-tree implementation to sort and search for words. Most operations are O(logN) time.

- std::unordered_set: This is going to be part of C++ standard library very soon, and most implementations support it already (Microsoft and GCC). This is a hash-table implementation, so most operations are in constant O(1) time (but your mileage may differ due to collisions problems).

The two options turn out to be the same coding-wise, and for performance, unordered_set is probably better, but that's always something you need to test. And if this is for an assignment, it's probably better to stick to (really) standard stuff like std::set (because you don't want to require to corrector to install anything additional if his implementation does not happen to have unordered_set).

>>So i have been wondering how i can optimize that using Object-oriented programming.
I don't think OOP is useful for this. And, on a side-note, things like std::set, #include <algorithm> and #include <functional>, almost the entire STL, are pieces of code that would be classified under "Generic Programming" not under object-oriented programming. C++ is a multi-paradigm language with strong synergy between the different styles. For instance, in the standard libraries only a few things like the IOStream library could be classified as more OOP than anything else, the rest is predominantly GP with some clever use of OOP and Procedural Programming (C-style). Sticking purely to one paradigm is almost never a good thing in C++ (which is why single-paradigm languages like Java or C are often criticized for a lack of options to the developers).

Member Avatar for iamthwee

Actually, picking a correct data structure for THIS problem is largely a red herring.

In fact, reading in the entire dictionary file for each go and using a letter frequency algo is sufficient and should cause no significant time lag - unless you're using a computer from the dark ages.

Typically, the student when faced with the age old problem, 'Given a set of scrambled letters, find all sub-words and exact words' tend to go down the permutation route.

This is most probably what the OP means by optimization. Anyone who has written a permute algo knows how expensive it is.

This is where the letter frequency algo wins on all counts. Of course, you can implement a std::set WITH the letter frequency algo which would make it all the faster. But in terms of optimization it is hardly noticeable.

Actually, picking a correct data structure for THIS problem is largely a red herring.

In fact, reading in the entire dictionary file for each go and using a letter frequency algo is sufficient and should cause no significant time lag - unless you're using a computer from the dark ages.

Typically, the student when faced with the age old problem, 'Given a set of scrambled letters, find all sub-words and exact words' tend to go down the permutation route.

This is most probably what the OP means by optimization. Anyone who has written a permute algo knows how expensive it is.

This is where the letter frequency algo wins on all counts. Of course, you can implement a std::set WITH the letter frequency algo which would make it all the faster. But in terms of optimization it is hardly noticeable.

I don't get why he would even want to consider permutation/combinations? The users, i.e the client provides the computer with the subword to check, all his program has to do is, check if that subword is part of the dictionary. The only optimization OP has to do is to pick the right data structure, and be at least decent with the coding.

Member Avatar for iamthwee

>I don't get why he would even want to consider permutation/combinations?

With the combination of those letters the program has to find words and(sub-words/sub-strings) in the dictionary

If you look at the OP's above definition it implies that given a word picked out of a dictionary file at random, it scrambles up the letters and then finds all words that are subwords within that word.

An example is perhaps easier to illustrate->

1.Pick word from dictionary [theres]
2.Scramble letters [threse]
3 Find all subword contained within the set 
  [the, here, rest, thee, ere...]

Now the question is, how does the OP efficiently build the list for #3? This is where his question of optimization arises I believe.

The point is... you don't even need to consider ANY data structures to efficiently solve this... A letter frequency algorithm will do the lot.

You could even build the dictionary file using the letter frequency algo to do exactly that.

And then use this new file as a lookup.

But from your given example, the subsetOf(theres) == subSet(threse).

I figured that was a miscommunication from the OP originally.

Member Avatar for iamthwee

But from your given example, the subsetOf(theres) == subSet(threse).

I figured that was a miscommunication from the OP originally.

threse is just 'theres' scrambled up.

This is part of the criteria for the game he is writing. So the human guessing not only has to guess subwords but the entire word as well, that's why the original is scrambled up.

For me hard disk is plenty fast enough. My Python code for candidates for multiword anagrams can be found in: http://www.daniweb.com/software-development/python/code/365540/1566911#post1566911, based on my only exact anagram words finding one.

Of course with many words the most simple approach takes little time, like with long names. It is highly usable though. With little more advanced approach and psyco JIT module, my full multiword anagram program manages to find anagrams from my name in English (4324 anagrams) in under 1.5 seconds using ready list of anagramwords in simple text file, quite reasonable.

threse is just 'theres' scrambled up.

This is part of the criteria for the game he is writing. So the human guessing not only has to guess subwords but the entire word as well, that's why the original is scrambled up.

Oh I get it. Computer picks a work, scrambles it up. The client tries to guess the maximum possible word from the given jumbled word. But I still don't see where he needs to generate any permutation.

Thanks everyone,@firstperson I need the permutations because the user has to know how many sub strings are available beforehand.This is what i mean http://tinypic.com/r/16m3cly/7

You would probably want to take each word that occurs, create a vector of the number of occurrences of each letter for that word, then create a std::multimap where the key value is this vector of letter occurrences and the value is the word. When the user inputs the scrambled letters, count the letter frequency and search for matches in the multimap. This is going to be faster than computing all the possible (variable-length) permutations of the input letters and searching for each one of them.

Member Avatar for iamthwee

But I still don't see where he needs to generate any permutation.

As I said generating permutations (as in the strict mathematical sense of the word) is a red herring.

Given my original code snippet you can adopt Mike's method to create a multimap or vector to push the words found and sort them by word lengths.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.