| | |
java unscrambler
Please support our Java advertiser: Programming Forums - DaniWeb Sister Site
![]() |
•
•
Join Date: May 2009
Posts: 5
Reputation:
Solved Threads: 0
Have written an unscrambler in java. This takes in a scrambled word for input and returns the unscrambled word. The way it works is that it creates all permutations of the provided word (I have borrowed the permutation generating code from daniweb i guess) and compares each word to an array formed by parsing a dictionary and reports all matches.
Please find attached the zip file(Unscrambler) containing all files. One problem that this one has is the fact that it tries to dump all permutations in an array, which results to an out of memory error when the word is sufficiently large. Any suggestions to overcome this problem would be appreciated.
Compilation process :
javac ParseWord.java StringPerm.java
javac -classpath . Main.java
java Main <scrambled word>
eg... java Main emit
emit
item
mite
time
Please find attached the zip file(Unscrambler) containing all files. One problem that this one has is the fact that it tries to dump all permutations in an array, which results to an out of memory error when the word is sufficiently large. Any suggestions to overcome this problem would be appreciated.
Compilation process :
javac ParseWord.java StringPerm.java
javac -classpath . Main.java
java Main <scrambled word>
eg... java Main emit
emit
item
mite
time
Firstly I guess you need not dump all generated permutations into an array. You should only dump those permutations which generate a word out of the unscrambling of the original word, which you can figure out by comparing each genarated word with the list of all dictionary words that you have gathered.
Searching each generated word with a long list of existing words (which might be in thousands) is itself a time consuming task, but you can have the dictionary words put into an ArrayList and then compare each generated word using the
Here in this algorithm another very important decision is to generate as few unscrambled words as possible. So in this sense you should try to able to discard a combination of the letters, which is not going to form a word, as early as possible. For eg. in the example you've given of 'time', you should be able to detect that there are no words of length four starting with 'tm' and hence shouldn't move further into generating the permutations starting with this combination. The more longer the word the earlier the better. So if suppose you have been given a scrambled combination of the letters of the word 'difficult' and you come to know that there are no nine lettered words starting with the two lettered combination 'df', you can eliminate the generation of all the arrangements of the rest of the seven letters where the word starts with 'df' then you would be able to save yourself the generation time of 5040 (7!) such words, which is a huge advantage. So work on these lines, and keep posting your enhancements this looks to be a promising task.
Searching each generated word with a long list of existing words (which might be in thousands) is itself a time consuming task, but you can have the dictionary words put into an ArrayList and then compare each generated word using the
contains() method of that class, so I guess this could be much faster than iterating over the words.Here in this algorithm another very important decision is to generate as few unscrambled words as possible. So in this sense you should try to able to discard a combination of the letters, which is not going to form a word, as early as possible. For eg. in the example you've given of 'time', you should be able to detect that there are no words of length four starting with 'tm' and hence shouldn't move further into generating the permutations starting with this combination. The more longer the word the earlier the better. So if suppose you have been given a scrambled combination of the letters of the word 'difficult' and you come to know that there are no nine lettered words starting with the two lettered combination 'df', you can eliminate the generation of all the arrangements of the rest of the seven letters where the word starts with 'df' then you would be able to save yourself the generation time of 5040 (7!) such words, which is a huge advantage. So work on these lines, and keep posting your enhancements this looks to be a promising task.
Get up every morning and take a look at the Forbes' list of richest people. If your name doesn't appear.... GET TO WORK !!!
I disagree, no matter how many enhancements you make, going down the permutation road isn't going to yeild satisfactory results.
Think letter frequency comparisons.
http://www.daniweb.com/forums/post212129-2.html
Think letter frequency comparisons.
http://www.daniweb.com/forums/post212129-2.html
*Voted best profile in the world*
Alright that advice comes from someone who has done this program first hand, whereas I haven't had such experience, the optimizations I suggested were some of several that came to my mind upon an initial read of your problem statement, so you know who to bet your money on.
Get up every morning and take a look at the Forbes' list of richest people. If your name doesn't appear.... GET TO WORK !!!
•
•
Join Date: Sep 2008
Posts: 1,629
Reputation:
Solved Threads: 205
@OP ignore this, I am asking a question relevant to this thread rather than giving you advice here. .
Searching each generated word with a long list of existing words (which might be in thousands) is itself a time consuming task, but you can have the dictionary words put into an ArrayList and then compare each generated word using the contains() method of that class, so I guess this could be much faster than iterating over the words.
Don't you have a dictionary stored in alphabetical order? You don't need to compare anything to every possible word, just check alphabetically to see if it is there period. Right?
Searching each generated word with a long list of existing words (which might be in thousands) is itself a time consuming task, but you can have the dictionary words put into an ArrayList and then compare each generated word using the contains() method of that class, so I guess this could be much faster than iterating over the words.
Don't you have a dictionary stored in alphabetical order? You don't need to compare anything to every possible word, just check alphabetically to see if it is there period. Right?
Last edited by BestJewSinceJC; May 10th, 2009 at 4:50 pm.
•
•
Join Date: Jan 2008
Posts: 3,836
Reputation:
Solved Threads: 503
For sure, don't generate every permutation. I am wondering if a hash function could be useful for this. Don't generate any permutations. Get a good hash function for the scrambled words. A hash function that could certainly be improved upon would be this: a= 0, z = 25, add the values of the letters, so
tae = 19 + 0 + 4 = 23
eat, ate, and tea also have this hash function. So when you calculate the hash (23 in this case), have a map where 23 is the key and it maps to the set of all words that also hash to 23. Then see if any of them have the same letters.
A word like "lie" also maps to 23, so the hash function can definitely be improved. The point is that there won't be the whole dictionary to search, just a much smaller set. It'll take a significant time to set up the map, but it only needs to be done once and after that, searches will be extremely fast, so this method is best for a program where you aren't generating the entire map every time you search for a word. If you can figure out a hash function where ONLY words with the same letters map to the same value, that's perfect. There will be nothing left to check.
Regardless, I agree with BestJewSinceJC; take advantage of searching an ORDERED (i.e. alphabetized) list and cut your search time down from O (n) to O (log n).
tae = 19 + 0 + 4 = 23
eat, ate, and tea also have this hash function. So when you calculate the hash (23 in this case), have a map where 23 is the key and it maps to the set of all words that also hash to 23. Then see if any of them have the same letters.
A word like "lie" also maps to 23, so the hash function can definitely be improved. The point is that there won't be the whole dictionary to search, just a much smaller set. It'll take a significant time to set up the map, but it only needs to be done once and after that, searches will be extremely fast, so this method is best for a program where you aren't generating the entire map every time you search for a word. If you can figure out a hash function where ONLY words with the same letters map to the same value, that's perfect. There will be nothing left to check.
Regardless, I agree with BestJewSinceJC; take advantage of searching an ORDERED (i.e. alphabetized) list and cut your search time down from O (n) to O (log n).
Last edited by VernonDozier; May 10th, 2009 at 8:10 pm.
•
•
Join Date: May 2009
Posts: 5
Reputation:
Solved Threads: 0
Well, the dictionary words are in a sorted order and I do a binary search for each word generated by the permutation, instead of a linear search. This has resulted in a significant improvement in the search times. The actual problem lies with the permutation generation algorithm which craps out with an "Out Of Memory Error" if a large word, say with 10 letters is passed in. I had initially considered comparison to be performed for each permutation generated, but this would result to an overhead of either a function call or an object interaction for every generated permutation. I guess the best way could be generate permutations and perform comparisons to collective chunks... say for eg, generate 10000 permutations and let the comparison function consume the generated permutations, after it is done, notify the permutation generator to generate the next 10000 permutations and so on. Please let me know any other suggestions.
•
•
Join Date: May 2009
Posts: 5
Reputation:
Solved Threads: 0
Vernon, your idea seems very good. Please correct me if I am wrong.
This would save me from generating any permutations at all.
Just need to have a hashmap with the simple hashing function and extract all words from the dictionary with same letters and length in the same bucket. Now, when I get the scrambled input, i just need to get the hash of the scrambled word, figure out which bucket the word falls in, and print out all the words appearing in that bucket. Have I understood u correctly?
This would save me from generating any permutations at all.
Just need to have a hashmap with the simple hashing function and extract all words from the dictionary with same letters and length in the same bucket. Now, when I get the scrambled input, i just need to get the hash of the scrambled word, figure out which bucket the word falls in, and print out all the words appearing in that bucket. Have I understood u correctly?
Not sure how quick Vernon's method would be but I know mine to be pretty fast.
Basically, you sort your scrambled word into alphabetical order.
For example "lorac" would be "aclor". We then search our dictionary for any five-letter words that also sorted to the pattern "aclor". (I used a list of about 100,000 words and found four solutions: "calor", "carlo", "carol", and "coral"). This is essentially, the same as using a letter frequency algo, like the one I posted. Which you can test by using (if you didn't know already):
You can shave off time if you group the dictionary by order of word length.
So, if you want to find a five letter scrambled word your program would just search for all the words that a five letters long and it would skip all the rest.
This algorithm has no space complexity issues. In other words, it would take no extra time to unscramble a 9 letter word as opposed to the 3 letter word.
Basically, you sort your scrambled word into alphabetical order.
For example "lorac" would be "aclor". We then search our dictionary for any five-letter words that also sorted to the pattern "aclor". (I used a list of about 100,000 words and found four solutions: "calor", "carlo", "carol", and "coral"). This is essentially, the same as using a letter frequency algo, like the one I posted. Which you can test by using (if you didn't know already):
Java Syntax (Toggle Plain Text)
java -jar Yay.jar
You can shave off time if you group the dictionary by order of word length.
So, if you want to find a five letter scrambled word your program would just search for all the words that a five letters long and it would skip all the rest.
This algorithm has no space complexity issues. In other words, it would take no extra time to unscramble a 9 letter word as opposed to the 3 letter word.
Last edited by iamthwee; May 11th, 2009 at 8:15 am.
*Voted best profile in the world*
•
•
•
•
Don't you have a dictionary stored in alphabetical order? You don't need to compare anything to every possible word, just check alphabetically to see if it is there period. Right?
Get up every morning and take a look at the Forbes' list of richest people. If your name doesn't appear.... GET TO WORK !!!
![]() |
Similar Threads
- FT Junior Java Developer Needed for Major Media firm in NYC (Software Development Job Offers)
- Java/J2EE Senior Software Engineer (Software Development Job Offers)
- Front-end Java Software Engineer for Digital Video/Media Market Leader (Software Development Job Offers)
- Senior Software Engineer (Java) (Web Development Job Offers)
- Java Software Engineer (Software Development Job Offers)
- Java Front-end Developer Engineer for Stealth Media Start-up (Software Development Job Offers)
- Java Developer Required (Software Development Job Offers)
Other Threads in the Java Forum
- Previous Thread: Need help with my Java Programming
- Next Thread: Help on arrays!! Progress Made!!
| Thread Tools | Search this Thread |
Tag cloud for Java
addressbook android api apple applet application arguments array arrays automation binary bluetooth button calculator chat class classes client code columns component converter database draw eclipse error errors event exception fractal ftp game givemetehcodez graphics gridlayout gui helpwithhomework html ide image inetaddress input integer j2me japplet java javaprojects jme jmf jni jpanel jtextarea julia link linux list loop map method methods midlethttpconnection mobile netbeans newbie number objects openjavafx oracle php print problem program programming project projects recursion rim scanner screen server set signing size smart sms socket sort sql storm string support swing test threads time tree unlimited variablebinding webservices windows






