Member Avatar for leegeorg07

Hi, a while back you guys helped me make a good spell checker. Now I want to advance it so that if the difference between two words is small, it will correct the word. I think I need to use difflib for it, but I don't know how to do it. My code is below, along with an attached dictionary file.

dict  = open("DictionaryE.txt", "r").readlines()
test= open("README.txt", "r").read()

print dict[0:5]

correct = []
unique = []
test = test.lower()
list_words = test.split(' ')
l = []


for line in list_words:
	if line in dict:
		correct.append(line)

for line in correct:
    l.append(line.strip())

correct = set(l)
		
bleh = str(correct)
open("cheese.txt", "w").write(bleh)

All 6 Replies

Member Avatar for leegeorg07

I now have this code, but it is taking too long. How could I speed it up or improve it?

import difflib
dict = open("DictionaryE.txt", "r").readlines()
test = "Hello, thsi is a test!"

print dict[0:5]

correct = []
unique = []
test = test.lower()
list_words = test.split(' ')
l = []


for line in list_words:
    if line in dict:
        correct.append(line)
    else:
                for word in dict:
                        line = difflib.get_close_matches(line, dict)

print test
raw_input()

Are you using tabs for indents? They are screwed up!
I normally don't even bother with code that has obvious signs of tabs.
Do not use 'dict' as an identifier; it is the name of a built-in Python type, and reusing it shadows the built-in.
Also, test your list, the punctuation marks are still attached:

test = "Hello, thsi is a test!"

test = test.lower()
list_words = test.split(' ')

# test the list
print(list_words)

"""
my output (notice that the punctuation marks are left in) -->
['hello,', 'thsi', 'is', 'a', 'test!']
"""
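
One possible way to get rid of the attached punctuation, as a minimal sketch: trim each token with `str.strip(string.punctuation)`. This only removes punctuation at the ends of a word, so an apostrophe inside a word like "don't" stays. The extra filter for empty tokens is my own addition, not something from the code above.

```python
import string

test = "Hello, thsi is a test!"

# Lowercase, split on any whitespace, then trim punctuation
# from both ends of each token.
list_words = [word.strip(string.punctuation) for word in test.lower().split()]

# Drop tokens that consisted of punctuation only.
list_words = [word for word in list_words if word]

print(list_words)
# prints ['hello', 'thsi', 'is', 'a', 'test']
```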
Member Avatar for leegeorg07

Oh yeah, sorry, I was just using IDLE and it does that automatically.

I've got some new code. If it works I'll post it, but I'm still not sure that mine is the best method to do this. Any improvements are welcome.

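The way you have this coded, you are calling difflib.get_close_matches(line, dict) not just once for every unknown word in your test string, but once more for every word in the dictionary list. Each call already scans the whole dictionary by itself, so the inner loop repeats the same work thousands of times. No wonder it is rather slow.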
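
A rough sketch of calling `difflib.get_close_matches()` only once per unknown word, with no inner loop. The short `words` list is a hypothetical stand-in for the lines of DictionaryE.txt, and `suggest` is a helper name made up for this example. Note that `readlines()` keeps the trailing newline on every line, so the real dictionary lines would need `strip()` before a membership test can succeed.

```python
import difflib

# Hypothetical stand-in for the stripped lines of DictionaryE.txt, e.g.
# words = [line.strip() for line in open("DictionaryE.txt")]
words = ["hello", "this", "is", "a", "test", "spell", "checker"]
known = set(words)  # a set makes the "is it spelled correctly" test fast


def suggest(word, cutoff=0.6):
    """Return word if it is known, else its closest match, else None."""
    if word in known:
        return word
    # One call scans the whole word list; 0.6 is difflib's default cutoff.
    matches = difflib.get_close_matches(word, words, n=1, cutoff=cutoff)
    return matches[0] if matches else None


checked = [suggest(w) for w in ["hello", "thsi", "is", "a", "test"]]
print(checked)
# prints ['hello', 'this', 'is', 'a', 'test']
```

With a large dictionary each call is still a full scan, so this helps most when the majority of words are already in the set and never reach difflib.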

Member Avatar for leegeorg07

Yeah, I wasn't too sure how to use it. What would be the best way to use it?

Member Avatar for leegeorg07

I now think that my best option is to use:

for line in list_words:
    if line in dicts:
        correct.append(line)
    elif difflib.SequenceMatcher(None, dicts, line) >= 0.8:
        line =

but I'm not sure what to put after the

line =

Do you have any idea?
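
A hedged sketch of one way this could go. `difflib.SequenceMatcher(None, a, b)` builds a matcher object for two sequences, so comparing that object with 0.8 does not test similarity; the score comes from its `ratio()` method, and it has to be computed against each dictionary word in turn rather than against the whole list. The `best_match` helper, the sample `dicts` list, and the 0.7 threshold are assumptions for illustration only. With 0.8, "thsi" would be left alone, because its ratio against "this" is 0.75.

```python
import difflib

# Hypothetical stand-in for the stripped dictionary lines.
dicts = ["hello", "this", "is", "a", "test"]


def best_match(word, candidates, threshold=0.7):
    """Return the most similar candidate if its ratio reaches threshold,
    otherwise return word unchanged."""
    best_word, best_ratio = word, 0.0
    for candidate in candidates:
        ratio = difflib.SequenceMatcher(None, candidate, word).ratio()
        if ratio > best_ratio:
            best_word, best_ratio = candidate, ratio
    return best_word if best_ratio >= threshold else word


print(best_match("thsi", dicts))
# prints this
```

Inside the loop from the post, the unfinished line could then read `line = best_match(line, dicts)`. This is a full scan per word, much like `get_close_matches()`, so it is no faster; it just makes the threshold logic explicit.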
