Hello Daniweb,
I am creating a project (using NetBeans for GUI) for my Data Structures and Algorithms course. The gist of it is as follows:
-Take 7 letters as input
-Display a valid word as output (And later, when this issue is hopefully resolved, several words)
Basically a software one could theoretically use to cheat in games like Scrabble. This is the procedure i am using:
-Make a hashed-table of linked lists of the words a provided dictionary, such that each linked list contains only words that start with a particular letter, ex:
A|bstract->abate->abbreviate->..
B|alloon->Barge->...
..
..
Z|inc->...

-Make a linked list of all permutations of the 7 letters (currently assuming no repitition).

-For every letter in the permutations, use the relevant row in the hash table to see if that permutation is a legal word
-if it is, print it and break.

This algorithm is working, but it is taking WAY to long. If i type in "pelicna", it returns "pelican" in a few seconds, but if i type somthing like "ncilepa", it could take up to 10 minutes and either print the word, or even return a NullPointerException. Could you guys help me out here?

Recommended Answers

All 31 Replies

Member Avatar for iamthwee

Ten minutes... pfft that's nothing.

At least it gives you time to make a tea. Seriously though, use a letter frequency counter algo.

Member Avatar for iamthwee

You would only use a hash map/ linked list if you want to create more than one word out of the jumbled set

I really have to lower the time to sayyyy under 1 minute bro. Or it wont be acceptable. What i'm doing is basically comparing every letter of every combination with every letter of the words in a row of a dictionary. There are 5040 combinations, and perhaps 20 words in the row of the dictionary. Should it really be takin 10 minutes? Or you think there is maybe another problem in the design/implementation?

There are 5040 combinations, and perhaps 20 words in the row of the dictionary. Should it really be takin 10 minutes? Or you think there is maybe another problem in the design/implementation?

Running 5K iterations against 20 data items should so so fast you can't measure it. I have a Java program to find anagrams and it searches an 18K word database for anagrams in real time faster than I can type the letters.

There's obviously somethiung very wrong with your code - post it here and let's see what it is.

Running 5K iterations against 20 data items should so so fast you can't measure it. I have a Java program to find anagrams and it searches an 18K word database for anagrams in real time faster than I can type the letters.
There's obviously somethiung very wrong with your code - post it here and let's see what it is.

Thank you for that, JamesCherrill :)
Unfortunately i dont think it would be very convenient for either of us if i were to post the entire code here, as it is a Netbeans project with a several classes. I am however posting the code that does the very comparing here, and i hope it is enough. In case it is not, please allow me to send you the project so you may see where the trouble is. I will be very grateful.

public class Matcher {

    protected List realWords;

    public void makeWords(){
        realWords = new List();

        //for all the combinations:
        bigLoop:
        /*
         * combos is a linked list of all possible permutations
         */
        for(int r=0; r<Scrabbulous.scrab.combos.getSize(); r++){
            System.out.println(r);
            //for all the letters in the current combination:
            for(int i=0; i<Scrabbulous.scrab.combos.Head.getWord().length(); i++){

                //the following constant is declared to make life easier for the programmer. It is the current row of the Dictionary (Linked list)
                final List currDictionWordList = Scrabbulous.scrab.Dictionary.getHeaders()[Scrabbulous.scrab.Dictionary.hasher(Scrabbulous.scrab.combos.getNode(i).getWord())];

                //for all the nodes in that List of the Dictionary which start with the same letter as the letter specified above:
                for(int j=0; j<currDictionWordList.getSize(); j++){
                    //if the length of the word in that location of the dictionary is the same as the length of the combination:
                    boolean isSame = true;
                    //Comparing each letter
                    if(currDictionWordList.getNode(j).getWord().length() == Scrabbulous.scrab.combos.Head.getWord().length()){
                        for(int k=0; k<Scrabbulous.scrab.combos.Head.getWord().length(); k++){
                            if(currDictionWordList.getNode(j).getWord().charAt(k) != Scrabbulous.scrab.combos.getNode(r).getWord().charAt(k)){
                                isSame = false;
                            }
                        }
                        if(isSame == true){
                            realWords.Add(Scrabbulous.scrab.combos.getNode(r).getWord());
                            break bigLoop;
                        }
                    }
                }
            }
        }
    }
}

Nothing there jumps out at me, but since you are usig Netbeans, why not run the profiler and see where your CPU time being consumed? It's the clock face with a green run arrow near the right-hand end of the standard toolbar....

You have 4 loops in your algorithm which means if we do a naive Big-O analysis the time complexity becomes O(n^3)/O(n^4) which doesn't scale well as the input size keeps growing.

Not sure why the algorithm has to be so complicated. How about something like:

  1. Accept user input and generate all permutations
  2. Create a set of dictionary entries
  3. Check if the given permutation is in the set and you are done...

The problem with your code is that it's doing a "linear" search for finding a word which is going to be painfully slow, especially given your algorithm. If you don't want to revise your algorithm (i.e. want to keep using the alphabet to word list mapping), go with a Set instead of a linked list and you should see visible speedup.

A few generic comments:

for(int r=0; r<Scrabbulous.scrab.combos.getSize(); r++){

This will basically recompute the size every iteration, not something you want. Instead use:

for(int r=0, len = Scrabbulous.scrab.combos.getSize(); r < len; r++){

Also, this piece of code:

for(int k=0; k<Scrabbulous.scrab.combos.Head.getWord().length(); k++){
    if(currDictionWordList.getNode(j).getWord().charAt(k) != Scrabbulous.scrab.combos.getNode(r).getWord().charAt(k)){
        isSame = false;
    }
}

is basically re-implementing equals method of the String class.

Member Avatar for iamthwee

Another ridiculously simple algorithm (based on my own) is to sort the user inputed string in alphabetical order. Then go through each word in the dictionary file and do the same. If a match is found you have an anagram.

E.g

User input.

'pielnac'

Sort in alphabetical order: aceilnp

Word in dictionary 'pelican' which when sorted is aceilnp

This here would find EXACT matches. Use the letter frequency counter to find all other subsets. It doesn't really get much easier than that. Low time complexity and doesn't use any hash maps.

You could use:

String s = "...";
int counter = 0;
for( int i=0; i<s.length(); i++ ) 
{
    if( s.charAt(i) == '$' ) 
    {
        counter++;
    } 
}

To split the string into individual characters. Looking at the code you have originally posted you should have NO issue implementing this.

Thank you Iamthewee, s.o.s, and JamesCherrill for your input, I shall go through it more thoroughly as soon as i am done with my last examination on Friday :)

Let me take this oppurtunity to add that the finished program will have to take into consideration the possibility that a 7 letter valid word does not exist in the permutations, in which case it will display the best less-than-7-letter word. The word "best" here means yielding the most face points in a game of Scrabble. I think this is why the algorithm is getting complicated, as we need the entire dictionary and the permutations inside the RAM throughout the process of matching.

Iamthewee, I just had a look at your last suggestion and i really like it, but I think if i implement it I will be doing away with a data structure, and use of data structures is the point of the course. Nontheless, its making me feel quite foolish to be lingering after a clumsy solution when such a neat one is available, and I shall use your one if/when i start running out of time.

There's no problem holding it all in RAM. I found a large Scrabble dictionary with about 180k words in it, but that's only a few MegaBytes stored in an ArrayList<String> and you can scan its entire contents in milliSeconds.

I did experiment at the time with holding a sorted version of every word as key in a TreeMap (the original words as values). That gives a guaranteed O(log n) time to find any anagram. That's surely got to be the the fastest algorithm, even after adding in the overhead of dealing with multiple words that have the same sorted letters. In the end, however, it was just as subjectively instantaneous, but easier code, to check anagrams on the fly using a method that matches letters one at a time

Member Avatar for iamthwee

A few thoughts...

Others here have posted that your hashed map solution should have a reasonable time complexity. JamesCherrill says his hashed maps solution takes a few seconds if that.

So clearly YOUR algorithm is flawed in some respect for it to take ten minutes.

Now in regards to the cleanest and fastest solution, clearly it must be the letter frequency counter as listed in my link. It uses no hashed maps and it finds ridiculously long words fast, and all the subsets of those letters as well.

However, let's look at your requirements. You are in a data structure course so clearly your instructor is expecting a hashed map type solution. With concise programming.

Your snippet looks well written. I guess it is one of those times where being 'smart' might not get you the best marks.So yes absolutely use your hashed map, you will be correctly accredited for this.

You could mention somewhere in the notes that if you didn't have to use datastructures you would opt for a simpler solution like the one I have mentioned.

In my opinion the best solution whilst STILL using a hashed map is:

http://stackoverflow.com/questions/18641243/how-can-i-optimize-permutations-of-words-in-a-scrabble-game

...but you may not have the time to rewrite your code??

Good luck.

JamesCherrill says his hashed maps solution takes a few seconds if that.

So now I had to log the actual elapsed times - they are actually about 60 microSeconds for 3 letter anagrams, up to 1.5 milliSecs for 9 letters. That's for the simple search of an ArrayList of Scrabble words matching letters in a small nested loop.
I didn't keep the TreeMap version, that was just an experiment, and I didn't finish all the tidying up around multiple words with the same anagrams. My gut feel is that it would be faster, but at a worst case thats one or two millisecs I decided that the simple approach was fast enough.

All this encouraged me to finish the map version, and it is insanely fast.

I built a map where the key is a String of all the letters in a word, sorted in to alpha order, and the value is the word or words that match those letters.

Then given a word I sort its letters into alpha order and use the resulting word to get the matches from the map.

Because you always know the word's length, I split the map into an array of 20 maps, one for each length.
The master word list was the 267,750 word master UK/US scrabble word list.

Loading the list from file and building the maps took 700mSec (one-time only, at startup).
Finding all the words for a given anagram takes a consistent 2 microSeconds if the Map is a HashMap, 4 microSeconds if it is a TreeMap. (The very first anagram takes about 4 times that).

I'm happy to share the relevant code if anyone is interested.

Member Avatar for iamthwee

^^

Probably not a good idea if this is his homework assignment. Perhaps maybe a good idea to do so for instructional purposes... But only much later.

Thanks for waiting, i now have my attention back on this project.
I just prepared, for clarity, a complete summary of the path my program takes. I did this while walking through my program code. So if you see any holes in the program structure, do let me know. I will begin to apply the advice given so far as soon as i iron out a recent bug. (I tried to convert the permutation data structure from a List to Hash table, and now i'm turning it back to List but for some reason the IDE refuses to recognise it as anything apart from a Table)

Also, if you require expansion of any of the pseudocode, or even the related code snippet, let me know and i will provide it.

->The Tiles array is an array of 26 Tile objects, where each Tile has a letter of the alphabet, and associated "points" integer.
->The combos List is a List of all permutations of the letters input.
->the Dictionary Table is a hash table of the words in the dictionary, semi-sorted in alphabetical order.
->The anagram class creates a Linked List of Permutations of the input letters
->The Matcher will compare the Link List with the Dictionary and output real words.
-----------------------

-Declare static Coordinator scrab (scrab contains several miscelenous global objects)
void main(){
    Initialize scrab
        scrab declares and initializes:
            The input window, the Tiles array, the combos List, and the Dictionary Table. 
    Back to void main.
    Fill up the Tile array
    Set visible the input window.
    Input Window:
        Declare the Output window, an array for the input letters and the anagram class.
        when the submit button is pressed:
            fill the "Head" of the combos list with the combination that was input.
            initialize and execute the scrambler. The combos List from scrab is now populated.
            Initialize and set visible the output class
            Output:
                Declare and initialize the Matcher
                Matcher:
                    Declare a realWords List
                execute the matching method
                Matching method:
                    Initialize the realWords List
                    perform the matching. (Code given in previous post)
                    add real words to the realWords List
                Display the realWords List
/end

I still can't see anything in there that could get you within orders of magnitude of 600,000,000 microSecs with such tiny data volumes.
Did you try the NetBeans profiler to where that CPU is being burned?

very nice info. Thanks for the post.

I have been trying to run the profiler, but I have never really used it before, and when I calibrate it, it gives a nullpoint exception.

I have however significantly improved the algorithm since I last posted here.I added a continue statement to break out of an inner loop when it was no longer neccessary to stay in it, and I slightly changed the iteration structure in terms of iterating variables so that one of the outer loops became redundant, and could be removed.
My Worst case running time is now approximately 1 minute 30 seconds. I still however want to make it smaller, specially since James Cherril seems to be convinced that it can be done in the blink of an eye.
I hope, however, that this has made it reasonably obvious that the CPU is indeed being burned somewhere in the iterations themselves, and it is not likely to be some other confounding variable.

Just a random thought - it's not possible that any of those get methods are causing the word list file to be re-loaded each time (that could explain the time)?

It's bizarre that your profiler is going NPE - the profiler is absolutely the most (only?) valuable tool to sort out performance problems, so it would be worth your time to try to sort that out. Can you skip the calibration and go on to profile regardless?

No I cant do that, it gives me an error saying the following:
"Error occurred during initialization of VM
Could not find agent library C:/Users/...../jdk16/windows-amd64/profilerinterface.dll in absolute path, with error: Can't find dependent libraries
Java Result: 1"

No no its not doing that. It was doing that in the very beginning, and that was horrendously slow, it took 7 minutes to do around 1000 iterations.

What it does now is to load the dictionary as a hash table the moment the the program is initiated.

The Linked List of combinations is stored in the same class as the dictionary is, though it is obviously populate later. And as far as i can tell (and intended), it is not reconstructing either the combinations list or the dictionary table with each iteration.

However, there is a chance that you may be right so i'll just do what normally works best: Take a pen and paper and work my way through the program to see if what i have intended is happening.

It also might be worth mentioning that I am "not" using the Java standard data structures. I have made my own Linked List class and Dictionary class, as we were required to. If you want, i can post them in the next post. They are not too big.

That message implies you are still on Java 1.6. If so an immediate upgrade to 1.7 is a top priority, noit just for the language/API enhancements buta also for security and perfomance fixes. Oracle officially stopped releaseing public updates for 1.6 in March 2013 (see http://www.oracle.com/technetwork/java/eol-135779.html )

Having done that, a clean install of the latest NetBeans 7.4 should fix the profiler problem.

I still find this bizzare. The only Java things I have ever seen taking more than a few seconds were either IO-bound or massively recursive.

Thanks, the profiler is working now. Also, since the update, the worst case running time is now about 50 seconds.

The profiler itself is showing a "hill" of processes. Going up the slope, there are mostly (numerous) java.awt.* and java.swing.* methods. Each of them take up, according to the profiler, 22,456ms. Finally, near the top of the hill, is my Entry submission method, which has a time of 22,456ms as well.
After that is the method that does the actual matching itself, and strangely enough, it is taking less time. It seems to be taking 21,730ms. Following that, the hill slopes down with a series of "Self-time"s, each taking 0ms. Isnt this odd?

I have now started looking ahead of this step and can see a large potential problem. Right now, I am only checking for 7 letter words. If I am to complete this program to perfection, it should be able to detect words of any length from 3 to 7.
I determined that even for 6 letter words, I will need another linked list of 5040 combinations, as we can arrange 6 letter words in 6! = 720 ways, and we can choose those 6 letters in 7 ways, hence 6! times 7 = 5040. Then i'll have to do the same for 5, 4 and 3 letter words. My program seems tipped to getting slower and slower from this point. How do I efficiently implement this functionality that I want? I had a discussion with my teacher and she told me to stop worrying about the length and just use a fewer number of letters, but I really want to complete it with 7 letters.

EDIT:~s.o.s~, the .equals method isnt exactly a suitable replacement for that snippet you picked, as .equals returns true if the combination of letters is the same in both. I used that for loop as it returns true only if the arrangement as well as combination is the same.

That profiler output is pretty typical.You cklick the start button so that triggers a whole raft of methods calling other methods until it gets down to your code, after which all the methods terminate. So they all take at least as long as your method. The "self-times" are just an artifact of the profiling. Just ignore the awt/swing stuff.
It looks like your entry submission method is taking 22 secs, of which 21.7 are spent in the matching method that it calls. That's where the problem lies.

How do I efficiently implement this functionality that I want?

My fastest version used a hashmap as per my previous posts.

Thank you everyone for the help, I learned a lot from you all :)
My Project got accepted and I am expecting an A.

Congratulations!

Member Avatar for iamthwee

Excellent news! Since your assignment has been processed can some test my snippet converted to java.

I don't have a compiler to hand:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class BufferedReaderExample 
{

    public static void main(String[] args) 
    {

      int vm;
      String testWord = "pelican";

      vm = testWord.length();

      String alphabet = "-abcdefghijklmnopqrstuvwxyz";
      int[] acnt = new int[27];
      int[] bcnt = new int[27];


      for ( int i = 1; i < 27; i++ )
      {
        acnt[i] = 0;
      }
      for (int a = 0; a < vm; a++ )
      {
        for ( int j = 1; j < 27; j++ )
        {
          if( testWord.charAt(a) == alphabet.charAt(j) ) 
          {
            acnt[j]++;
          }
        }
      }




      BufferedReader br = null;

      try 
      {

        String sCur;

        br = new BufferedReader(new FileReader("input.txt"));

        while ((sCur = br.readLine()) != null) 
        {

          int size;
          size = sCur.length();
          for ( int i = 1; i < 27; i++ )
          {
            bcnt[i] = 0;
          }
          for ( int i = 0; i < size; i++ )
          {
            for ( int j = 1; j < 27; j++ )
            {
              if( sCur.charAt(i) == alphabet.charAt(j) ) 
              {
                bcnt[j]++;
              }
            }
          }
          int counter = 0;
          int mx = 0;
          for ( int k = 1; k < 27; k++ )
          {
            if ( bcnt[k] == acnt[k] )
            {
              mx++;
            }
            if ( bcnt[k] <= acnt[k] )
            {
              counter++;
            }
          }

          if ( counter == 26 )
          { 

            System.out.println(sCur);

          }
        }

      } 

      catch (IOException e) 
      {
        e.printStackTrace();
      } finally {
        try {
          if (br != null)br.close();
        } catch (IOException ex) {
          ex.printStackTrace();
        }
      }

  }
}

WOW this is a superb algorithm! It is returning all possible words you can make of an input word available in the dictionary in the blink of an eye. If i could have come up with with something like this, I would have merely had to store the "sub-words" in an array and return the one with most points. But then, I may not have recieved points for using data structures.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.