java unscrambler

Please support our Java advertiser: Programming Forums - DaniWeb Sister Site
Reply

Join Date: May 2009
Posts: 5
Reputation: giri.pankaj is an unknown quantity at this point 
Solved Threads: 0
giri.pankaj giri.pankaj is offline Offline
Newbie Poster

java unscrambler

 
0
  #1
May 10th, 2009
Have written an unscrambler in java. This takes in a scrambled word for input and returns the unscrambled word. The way it works is that it creates all permutations of the provided word (I have borrowed the permutation generating code from daniweb i guess) and compares each word to an array formed by parsing a dictionary and reports all matches.
Please find attached the zip file(Unscrambler) containing all files. One problem that this one has is the fact that it tries to dump all permutations in an array, which results to an out of memory error when the word is sufficiently large. Any suggestions to overcome this problem would be appreciated.

Compilation process :
javac ParseWord.java StringPerm.java
javac -classpath . Main.java
java Main <scrambled word>
eg... java Main emit
emit
item
mite
time
Attached Files
File Type: zip unscrambler.zip (128.7 KB, 17 views)
Reply With Quote Quick reply to this message  
Join Date: Nov 2008
Posts: 823
Reputation: verruckt24 is a jewel in the rough verruckt24 is a jewel in the rough verruckt24 is a jewel in the rough verruckt24 is a jewel in the rough 
Solved Threads: 73
verruckt24's Avatar
verruckt24 verruckt24 is offline Offline
Practically a Posting Shark

Re: java unscrambler

 
1
  #2
May 10th, 2009
Firstly I guess you need not dump all generated permutations into an array. You should only dump those permutations which generate a word out of the unscrambling of the original word, which you can figure out by comparing each genarated word with the list of all dictionary words that you have gathered.

Searching each generated word with a long list of existing words (which might be in thousands) is itself a time consuming task, but you can have the dictionary words put into an ArrayList and then compare each generated word using the contains() method of that class, so I guess this could be much faster than iterating over the words.

Here in this algorithm another very important decision is to generate as few unscrambled words as possible. So in this sense you should try to able to discard a combination of the letters, which is not going to form a word, as early as possible. For eg. in the example you've given of 'time', you should be able to detect that there are no words of length four starting with 'tm' and hence shouldn't move further into generating the permutations starting with this combination. The more longer the word the earlier the better. So if suppose you have been given a scrambled combination of the letters of the word 'difficult' and you come to know that there are no nine lettered words starting with the two lettered combination 'df', you can eliminate the generation of all the arrangements of the rest of the seven letters where the word starts with 'df' then you would be able to save yourself the generation time of 5040 (7!) such words, which is a huge advantage. So work on these lines, and keep posting your enhancements this looks to be a promising task.
Get up every morning and take a look at the Forbes' list of richest people. If your name doesn't appear.... GET TO WORK !!!
Reply With Quote Quick reply to this message  
Join Date: Aug 2005
Posts: 5,273
Reputation: iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold 
Solved Threads: 378
Featured Poster
iamthwee's Avatar
iamthwee iamthwee is offline Offline
Posting Expert

Re: java unscrambler

 
0
  #3
May 10th, 2009
I disagree, no matter how many enhancements you make, going down the permutation road isn't going to yeild satisfactory results.

Think letter frequency comparisons.

http://www.daniweb.com/forums/post212129-2.html
*Voted best profile in the world*
Reply With Quote Quick reply to this message  
Join Date: Nov 2008
Posts: 823
Reputation: verruckt24 is a jewel in the rough verruckt24 is a jewel in the rough verruckt24 is a jewel in the rough verruckt24 is a jewel in the rough 
Solved Threads: 73
verruckt24's Avatar
verruckt24 verruckt24 is offline Offline
Practically a Posting Shark

Re: java unscrambler

 
0
  #4
May 10th, 2009
Alright that advice comes from someone who has done this program first hand, whereas I haven't had such experience, the optimizations I suggested were some of several that came to my mind upon an initial read of your problem statement, so you know who to bet your money on.
Get up every morning and take a look at the Forbes' list of richest people. If your name doesn't appear.... GET TO WORK !!!
Reply With Quote Quick reply to this message  
Join Date: Sep 2008
Posts: 1,629
Reputation: BestJewSinceJC is a splendid one to behold BestJewSinceJC is a splendid one to behold BestJewSinceJC is a splendid one to behold BestJewSinceJC is a splendid one to behold BestJewSinceJC is a splendid one to behold BestJewSinceJC is a splendid one to behold 
Solved Threads: 205
BestJewSinceJC BestJewSinceJC is offline Offline
Posting Virtuoso

Re: java unscrambler

 
1
  #5
May 10th, 2009
@OP ignore this, I am asking a question relevant to this thread rather than giving you advice here. .

Searching each generated word with a long list of existing words (which might be in thousands) is itself a time consuming task, but you can have the dictionary words put into an ArrayList and then compare each generated word using the contains() method of that class, so I guess this could be much faster than iterating over the words.

Don't you have a dictionary stored in alphabetical order? You don't need to compare anything to every possible word, just check alphabetically to see if it is there period. Right?
Last edited by BestJewSinceJC; May 10th, 2009 at 4:50 pm.
Reply With Quote Quick reply to this message  
Join Date: Jan 2008
Posts: 3,836
Reputation: VernonDozier has a reputation beyond repute VernonDozier has a reputation beyond repute VernonDozier has a reputation beyond repute VernonDozier has a reputation beyond repute VernonDozier has a reputation beyond repute VernonDozier has a reputation beyond repute VernonDozier has a reputation beyond repute VernonDozier has a reputation beyond repute VernonDozier has a reputation beyond repute VernonDozier has a reputation beyond repute VernonDozier has a reputation beyond repute 
Solved Threads: 503
Featured Poster
VernonDozier VernonDozier is offline Offline
Senior Poster

Re: java unscrambler

 
0
  #6
May 10th, 2009
For sure, don't generate every permutation. I am wondering if a hash function could be useful for this. Don't generate any permutations. Get a good hash function for the scrambled words. A hash function that could certainly be improved upon would be this: a= 0, z = 25, add the values of the letters, so

tae = 19 + 0 + 4 = 23

eat, ate, and tea also have this hash function. So when you calculate the hash (23 in this case), have a map where 23 is the key and it maps to the set of all words that also hash to 23. Then see if any of them have the same letters.

A word like "lie" also maps to 23, so the hash function can definitely be improved. The point is that there won't be the whole dictionary to search, just a much smaller set. It'll take a significant time to set up the map, but it only needs to be done once and after that, searches will be extremely fast, so this method is best for a program where you aren't generating the entire map every time you search for a word. If you can figure out a hash function where ONLY words with the same letters map to the same value, that's perfect. There will be nothing left to check.

Regardless, I agree with BestJewSinceJC; take advantage of searching an ORDERED (i.e. alphabetized) list and cut your search time down from O (n) to O (log n).
Last edited by VernonDozier; May 10th, 2009 at 8:10 pm.
Reply With Quote Quick reply to this message  
Join Date: May 2009
Posts: 5
Reputation: giri.pankaj is an unknown quantity at this point 
Solved Threads: 0
giri.pankaj giri.pankaj is offline Offline
Newbie Poster

Re: java unscrambler

 
0
  #7
May 11th, 2009
Well, the dictionary words are in a sorted order and I do a binary search for each word generated by the permutation, instead of a linear search. This has resulted in a significant improvement in the search times. The actual problem lies with the permutation generation algorithm which craps out with an "Out Of Memory Error" if a large word, say with 10 letters is passed in. I had initially considered comparison to be performed for each permutation generated, but this would result to an overhead of either a function call or an object interaction for every generated permutation. I guess the best way could be generate permutations and perform comparisons to collective chunks... say for eg, generate 10000 permutations and let the comparison function consume the generated permutations, after it is done, notify the permutation generator to generate the next 10000 permutations and so on. Please let me know any other suggestions.
Reply With Quote Quick reply to this message  
Join Date: May 2009
Posts: 5
Reputation: giri.pankaj is an unknown quantity at this point 
Solved Threads: 0
giri.pankaj giri.pankaj is offline Offline
Newbie Poster

Re: java unscrambler

 
0
  #8
May 11th, 2009
Vernon, your idea seems very good. Please correct me if I am wrong.
This would save me from generating any permutations at all.
Just need to have a hashmap with the simple hashing function and extract all words from the dictionary with same letters and length in the same bucket. Now, when I get the scrambled input, i just need to get the hash of the scrambled word, figure out which bucket the word falls in, and print out all the words appearing in that bucket. Have I understood u correctly?
Reply With Quote Quick reply to this message  
Join Date: Aug 2005
Posts: 5,273
Reputation: iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold 
Solved Threads: 378
Featured Poster
iamthwee's Avatar
iamthwee iamthwee is offline Offline
Posting Expert

Re: java unscrambler

 
0
  #9
May 11th, 2009
Not sure how quick Vernon's method would be but I know mine to be pretty fast.

Basically, you sort your scrambled word into alphabetical order.

For example "lorac" would be "aclor". We then search our dictionary for any five-letter words that also sorted to the pattern "aclor". (I used a list of about 100,000 words and found four solutions: "calor", "carlo", "carol", and "coral"). This is essentially, the same as using a letter frequency algo, like the one I posted. Which you can test by using (if you didn't know already):

  1. java -jar Yay.jar

You can shave off time if you group the dictionary by order of word length.

So, if you want to find a five letter scrambled word your program would just search for all the words that a five letters long and it would skip all the rest.

This algorithm has no space complexity issues. In other words, it would take no extra time to unscramble a 9 letter word as opposed to the 3 letter word.
Last edited by iamthwee; May 11th, 2009 at 8:15 am.
*Voted best profile in the world*
Reply With Quote Quick reply to this message  
Join Date: Nov 2008
Posts: 823
Reputation: verruckt24 is a jewel in the rough verruckt24 is a jewel in the rough verruckt24 is a jewel in the rough verruckt24 is a jewel in the rough 
Solved Threads: 73
verruckt24's Avatar
verruckt24 verruckt24 is offline Offline
Practically a Posting Shark

Re: java unscrambler

 
0
  #10
May 11th, 2009
Don't you have a dictionary stored in alphabetical order? You don't need to compare anything to every possible word, just check alphabetically to see if it is there period. Right?
But even then you are comparing the generated word with some words having the starting letters same as it has, this certainly takes up time more so if you are comparing a common start for e.g. 'co' there might be so many words in th dictionary count,common,complex,complexity,compare,comparing,complexion and so on, what then you are still making comparisions which are necessarily a waste of time. The contains method will be a sure way of being fast, with all left upto the implementation of the Collection class, which certainly is much faster.
Get up every morning and take a look at the Forbes' list of richest people. If your name doesn't appear.... GET TO WORK !!!
Reply With Quote Quick reply to this message  
Reply

This thread is more than three months old.
Perhaps start a new thread instead?
Message:



Similar Threads
Other Threads in the Java Forum
Thread Tools Search this Thread



Tag cloud for Java
About Us | Contact Us | Advertise | DaniWeb | Acceptable Use Policy | RSS Feed

©2003 - 2009 DaniWeb® LLC