I just made a small program that spell-checks a sentence and points out the errors in it. The program builds a list of words by reading a text file of dictionary words, then reports whether each word entered is in the dictionary or not. I would like to extend it with a list of suggestions, so that for each incorrect word the user sees similar words and can fix the sentence accordingly. How can I suggest similar words?
Here is the code snippet:

def check():
    print '*'*8+" Program to check spelling errors in a sentence you entered "+'*'*8
    print "write some text in english"
    text=raw_input("Start: ")
    tex=text.lower()
    print tex
    # split() with no argument also copes with repeated spaces and tabs.
    textcheck=tex.split()
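    # Raw string so the backslashes in the Windows path are never read as escapes.
    dic=open(r'D:\Mee Utkarsh\Code\Python\DictionaryE.txt','r')
    origdic=dic.read()
    dic.close()
    # A set makes each "in" lookup fast, even for a large dictionary.
    origdicf=set(origdic.split('\n'))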
    errorlist=[]
    correctwordlist=[]
    for words in textcheck:
        # Numbers are accepted as correct without a dictionary lookup.
        if words.isdigit() or words in origdicf:
            correctwordlist.append(words)
        else:
            errorlist.append(words)
    print '-'*50
    print 'Error words list'
    if errorlist==[]:
        print 'No Error!'
    else:
        for x in errorlist:
            print '\b',x,'  '
    print '-'*50
    print 'Correct Words list'
    if correctwordlist==[]:
        print 'Sentence Full of Errors'
    else:
        for x in correctwordlist:
            print '\b',x,'  '

    print '-'*50

All 5 Replies

Here's something that I use for similar purposes. (Note: I cannot take credit for the longest_common_sequence function and am too lazy to look up who actually created it, sorry.)

Usage:

>>> find_likeness(what='ardvark', where=['aardvark', 'apple', 'acrobat'], threshold=80.0, case_sensitive=True, return_seq=True, bestonly=False)
[(93.33, 'aardvark', 0, 'ardvark')]

ardvark is a 93.33% match with aardvark, which was found at index 0. The two words have the letters 'ardvark' in common, in that order.
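
The 93.33 figure can be checked by hand: judging by the return line of longest_common_sequence, the score is 200 times the length of the longest common subsequence divided by the combined length of both words, here 200*7/(7+8). Below is a minimal sketch of just that arithmetic. The names `lcs_length` and `likeness` are made up for illustration and are not part of the posted code.

```python
def lcs_length(one, two):
    """Length of the longest common subsequence of two strings."""
    # prev[j] is the LCS length of the part of `one` seen so far and two[:j].
    prev = [0] * (len(two) + 1)
    for ch in one:
        cur = [0]
        for j, other in enumerate(two):
            if ch == other:
                cur.append(prev[j] + 1)
            else:
                cur.append(max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def likeness(one, two):
    """Percentage score: 200 * LCS length / combined length."""
    total = len(one) + len(two)
    if total == 0:
        return 100.0  # assumption: two empty strings count as identical
    return round(200.0 * lcs_length(one, two) / total, 2)


print(likeness('ardvark', 'aardvark'))  # 93.33
print(likeness('ardvark', 'apple'))
```

Identical words score 100 and words with no letters in common score 0, which is why a threshold such as 80 filters out 'apple' and 'acrobat' in the usage above.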

def find_likeness(what, where, threshold=0, case_sensitive=True, return_seq=False, bestonly=False):
   """Search the list `where` for words similar to `what` and return
   the matches sorted best first, as (match, otherWord, index, seq)
   tuples. seq is dropped unless return_seq is true."""
   if not case_sensitive:
      what = what.lower()
      where = [x.lower() for x in where]
   del case_sensitive


   word_len = len(what)

   minus1 = word_len - 1

   result = []

   for index, otherWord in enumerate(where):
     two, match = otherWord, 0
     total = len(otherWord)+word_len
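     # The score is 200*common/total, so at least total*threshold/200
     # letters must match before the threshold can be reached.
     req = total*threshold*.005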
     for n, x in enumerate(what):
         if x in two:
            two = two.replace(x, '', 1)
            match += 1
         elif match + (minus1-n) < req:
            break
     else:
         match, seq = longest_common_sequence(what, otherWord)
         if match >= threshold:
            result.append( (match, otherWord, index, seq) )
##            if match > best:
##                best = match

   result.sort(reverse=True)
   if not return_seq:
      result = [x[:-1] for x in result]

   if bestonly:
      # Only the top match, or None when nothing met the threshold.
      return result[0] if result else None
   return result




def longest_common_sequence(one, two):
   len_one, len_two = len(one), len(two)
   longestSequence = {}
   if not len_one+len_two: return 100.0, ''

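   # Row 0 and column 0 stand for an empty prefix: length 0, no sequence.
   # Tuples are used so every table entry has the same type for max().
   for two_index in xrange(len_two+1):
      longestSequence[two_index, 0] = (0, '')
   for one_index in xrange(len_one+1):
      longestSequence[0, one_index] = (0, '')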

   prev_two=0 ## j-1
   for two_index in xrange(1, len_two+1):
      prev_one = 0 ## i-1
      for one_index in xrange(1, len_one+1):
         if one[prev_one] == two[prev_two]:
            longestSequence[two_index, one_index] = (1 + longestSequence[prev_two, prev_one][0]), longestSequence[prev_two, prev_one][1]+one[prev_one]
         else:
            longestSequence[two_index, one_index] = max(longestSequence[prev_two, one_index], longestSequence[two_index, prev_one])
         prev_one = one_index
      prev_two = two_index
   seq_len, seq = longestSequence[len_two, len_one]
##   return round(200.0*seq_len/(len_one+len_two), 2)
   return (round(200.0*seq_len/(len_one+len_two), 2), seq)
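
For the original question, the standard library's difflib module may be a simpler starting point. Its get_close_matches function ranks candidates by SequenceMatcher.ratio(), which has the same "twice the matches over the combined length" shape as the score above, although it is computed from matching blocks rather than a true longest common subsequence, so the numbers can differ. A rough sketch follows. The function `suggest`, the list `sample_dictionary`, and the cutoff of 0.7 are illustrative assumptions, not anything from this thread.

```python
import difflib


def suggest(errorlist, dictionary_words, max_suggestions=3, cutoff=0.7):
    """Map each misspelled word to a list of similar dictionary words."""
    suggestions = {}
    for word in errorlist:
        # get_close_matches returns the best matches first and drops
        # anything scoring below `cutoff` (a value from 0.0 to 1.0).
        suggestions[word] = difflib.get_close_matches(
            word, dictionary_words, n=max_suggestions, cutoff=cutoff)
    return suggestions


# Tiny stand-in for the words read from the dictionary file.
sample_dictionary = ['aardvark', 'apple', 'acrobat', 'spelling', 'sentence']

found = suggest(['ardvark', 'sentense', 'zzz'], sample_dictionary)
for word in sorted(found):
    print('%s -> %s' % (word, ', '.join(found[word]) or 'no suggestion'))
```

In the spell checker, the list of error words and the dictionary word list would take the place of the hard-coded sample values.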

..... little similarity? It's copied from your post.

FYI, it fails when comparing an empty string:

>>> lcs_tuple('', 'fail')

Traceback (most recent call last):
  File "<pyshell#41>", line 1, in <module>
    lcs_tuple('', 'fail')
  File "test.py", line 19, in lcs_tuple
    return round(200.0*this[0]/(n1+n2),2),this[1]
UnboundLocalError: local variable 'this' referenced before assignment
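
The lcs_tuple function named in the traceback is not shown in this thread, so the following is only a sketch of one way to avoid this kind of failure: deal with empty input before any loop runs, so that no variable can be left unassigned. The name `lcs_score` and the choice to score an empty string against a non-empty one as 0.0 are assumptions.

```python
def lcs_score(one, two):
    """Return (percentage, common_subsequence) for two strings."""
    if not one and not two:
        return 100.0, ''   # assumption: two empty strings count as identical
    if not one or not two:
        return 0.0, ''     # nothing can be shared with an empty string

    # table[i][j] is a longest common subsequence of one[:i] and two[:j].
    table = [[''] * (len(two) + 1) for _ in range(len(one) + 1)]
    for i, a in enumerate(one, 1):
        for j, b in enumerate(two, 1):
            if a == b:
                table[i][j] = table[i - 1][j - 1] + a
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1], key=len)
    seq = table[-1][-1]
    return round(200.0 * len(seq) / (len(one) + len(two)), 2), seq


print(lcs_score('', 'fail'))
print(lcs_score('ardvark', 'aardvark'))
```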