943,590 Members | Top Members by Rank

Ad:
  • Java Discussion Thread
  • Unsolved
  • Views: 2378
  • Java RSS
You are currently viewing page 1 of this multi-page discussion thread
May 10th, 2009
0

java unscrambler

Expand Post »
Have written an unscrambler in java. This takes in a scrambled word for input and returns the unscrambled word. The way it works is that it creates all permutations of the provided word (I have borrowed the permutation generating code from daniweb i guess) and compares each word to an array formed by parsing a dictionary and reports all matches.
Please find attached the zip file(Unscrambler) containing all files. One problem that this one has is the fact that it tries to dump all permutations in an array, which results to an out of memory error when the word is sufficiently large. Any suggestions to overcome this problem would be appreciated.

Compilation process :
javac ParseWord.java StringPerm.java
javac -classpath . Main.java
java Main <scrambled word>
eg... java Main emit
emit
item
mite
time
Attached Files
File Type: zip unscrambler.zip (128.7 KB, 219 views)
Reputation Points: 10
Solved Threads: 0
Newbie Poster
giri.pankaj is offline Offline
12 posts
since May 2009
May 10th, 2009
1

Re: java unscrambler

Firstly I guess you need not dump all generated permutations into an array. You should only dump those permutations which generate a word out of the unscrambling of the original word, which you can figure out by comparing each genarated word with the list of all dictionary words that you have gathered.

Searching each generated word with a long list of existing words (which might be in thousands) is itself a time consuming task, but you can have the dictionary words put into an ArrayList and then compare each generated word using the contains() method of that class, so I guess this could be much faster than iterating over the words.

Here in this algorithm another very important decision is to generate as few unscrambled words as possible. So in this sense you should try to able to discard a combination of the letters, which is not going to form a word, as early as possible. For eg. in the example you've given of 'time', you should be able to detect that there are no words of length four starting with 'tm' and hence shouldn't move further into generating the permutations starting with this combination. The more longer the word the earlier the better. So if suppose you have been given a scrambled combination of the letters of the word 'difficult' and you come to know that there are no nine lettered words starting with the two lettered combination 'df', you can eliminate the generation of all the arrangements of the rest of the seven letters where the word starts with 'df' then you would be able to save yourself the generation time of 5040 (7!) such words, which is a huge advantage. So work on these lines, and keep posting your enhancements this looks to be a promising task.
Reputation Points: 485
Solved Threads: 89
Posting Shark
verruckt24 is offline Offline
944 posts
since Nov 2008
May 10th, 2009
0

Re: java unscrambler

I disagree, no matter how many enhancements you make, going down the permutation road isn't going to yeild satisfactory results.

Think letter frequency comparisons.

http://www.daniweb.com/forums/post212129-2.html
Featured Poster
Reputation Points: 1536
Solved Threads: 431
Posting Expert
iamthwee is offline Offline
5,865 posts
since Aug 2005
May 10th, 2009
0

Re: java unscrambler

Alright that advice comes from someone who has done this program first hand, whereas I haven't had such experience, the optimizations I suggested were some of several that came to my mind upon an initial read of your problem statement, so you know who to bet your money on.
Reputation Points: 485
Solved Threads: 89
Posting Shark
verruckt24 is offline Offline
944 posts
since Nov 2008
May 10th, 2009
1

Re: java unscrambler

@OP ignore this, I am asking a question relevant to this thread rather than giving you advice here. .

Searching each generated word with a long list of existing words (which might be in thousands) is itself a time consuming task, but you can have the dictionary words put into an ArrayList and then compare each generated word using the contains() method of that class, so I guess this could be much faster than iterating over the words.

Don't you have a dictionary stored in alphabetical order? You don't need to compare anything to every possible word, just check alphabetically to see if it is there period. Right?
Last edited by BestJewSinceJC; May 10th, 2009 at 4:50 pm.
Reputation Points: 874
Solved Threads: 352
Posting Maven
BestJewSinceJC is offline Offline
2,758 posts
since Sep 2008
May 10th, 2009
0

Re: java unscrambler

For sure, don't generate every permutation. I am wondering if a hash function could be useful for this. Don't generate any permutations. Get a good hash function for the scrambled words. A hash function that could certainly be improved upon would be this: a= 0, z = 25, add the values of the letters, so

tae = 19 + 0 + 4 = 23

eat, ate, and tea also have this hash function. So when you calculate the hash (23 in this case), have a map where 23 is the key and it maps to the set of all words that also hash to 23. Then see if any of them have the same letters.

A word like "lie" also maps to 23, so the hash function can definitely be improved. The point is that there won't be the whole dictionary to search, just a much smaller set. It'll take a significant time to set up the map, but it only needs to be done once and after that, searches will be extremely fast, so this method is best for a program where you aren't generating the entire map every time you search for a word. If you can figure out a hash function where ONLY words with the same letters map to the same value, that's perfect. There will be nothing left to check.

Regardless, I agree with BestJewSinceJC; take advantage of searching an ORDERED (i.e. alphabetized) list and cut your search time down from O (n) to O (log n).
Last edited by VernonDozier; May 10th, 2009 at 8:10 pm.
Featured Poster
Reputation Points: 2614
Solved Threads: 687
Posting Expert
VernonDozier is offline Offline
5,372 posts
since Jan 2008
May 11th, 2009
0

Re: java unscrambler

Well, the dictionary words are in a sorted order and I do a binary search for each word generated by the permutation, instead of a linear search. This has resulted in a significant improvement in the search times. The actual problem lies with the permutation generation algorithm which craps out with an "Out Of Memory Error" if a large word, say with 10 letters is passed in. I had initially considered comparison to be performed for each permutation generated, but this would result to an overhead of either a function call or an object interaction for every generated permutation. I guess the best way could be generate permutations and perform comparisons to collective chunks... say for eg, generate 10000 permutations and let the comparison function consume the generated permutations, after it is done, notify the permutation generator to generate the next 10000 permutations and so on. Please let me know any other suggestions.
Reputation Points: 10
Solved Threads: 0
Newbie Poster
giri.pankaj is offline Offline
12 posts
since May 2009
May 11th, 2009
0

Re: java unscrambler

Vernon, your idea seems very good. Please correct me if I am wrong.
This would save me from generating any permutations at all.
Just need to have a hashmap with the simple hashing function and extract all words from the dictionary with same letters and length in the same bucket. Now, when I get the scrambled input, i just need to get the hash of the scrambled word, figure out which bucket the word falls in, and print out all the words appearing in that bucket. Have I understood u correctly?
Reputation Points: 10
Solved Threads: 0
Newbie Poster
giri.pankaj is offline Offline
12 posts
since May 2009
May 11th, 2009
0

Re: java unscrambler

Not sure how quick Vernon's method would be but I know mine to be pretty fast.

Basically, you sort your scrambled word into alphabetical order.

For example "lorac" would be "aclor". We then search our dictionary for any five-letter words that also sorted to the pattern "aclor". (I used a list of about 100,000 words and found four solutions: "calor", "carlo", "carol", and "coral"). This is essentially, the same as using a letter frequency algo, like the one I posted. Which you can test by using (if you didn't know already):

Java Syntax (Toggle Plain Text)
  1. java -jar Yay.jar

You can shave off time if you group the dictionary by order of word length.

So, if you want to find a five letter scrambled word your program would just search for all the words that a five letters long and it would skip all the rest.

This algorithm has no space complexity issues. In other words, it would take no extra time to unscramble a 9 letter word as opposed to the 3 letter word.
Last edited by iamthwee; May 11th, 2009 at 8:15 am.
Featured Poster
Reputation Points: 1536
Solved Threads: 431
Posting Expert
iamthwee is offline Offline
5,865 posts
since Aug 2005
May 11th, 2009
0

Re: java unscrambler

Quote ...
Don't you have a dictionary stored in alphabetical order? You don't need to compare anything to every possible word, just check alphabetically to see if it is there period. Right?
But even then you are comparing the generated word with some words having the starting letters same as it has, this certainly takes up time more so if you are comparing a common start for e.g. 'co' there might be so many words in th dictionary count,common,complex,complexity,compare,comparing,complexion and so on, what then you are still making comparisions which are necessarily a waste of time. The contains method will be a sure way of being fast, with all left upto the implementation of the Collection class, which certainly is much faster.
Reputation Points: 485
Solved Threads: 89
Posting Shark
verruckt24 is offline Offline
944 posts
since Nov 2008

This thread is more than three months old

No one has posted to this discussion for at least three months. Please let old threads die and do not reply to them unless you feel you have something new and valuable to contribute that absolutely must be added to make the discussion complete. Otherwise, please start a new thread in this forum instead.
Message:
Previous Thread in Java Forum Timeline: Need help with my Java Programming
Next Thread in Java Forum Timeline: Help on arrays!! Progress Made!!





About Us | Contact Us | Advertise | Acceptable Use Policy
Forum Index | Build Custom RSS Feed


Follow us on Twitter


© 2011 DaniWeb® LLC