Dear All,

I am a total newbie to Python and to programming in general. I know I would find more material for Python 2, but choosing Python 3 was a deliberate decision.

That said, I have gone through:

http://www.daniweb.com/forums/thread173960-2.html

and tried to assemble my spell checker, and ended up with the following code:

#!/usr/bin/python3
# Filename: spellcheck2.py

correct = []
unknown = []

dict_file = open("DictionaryE.txt", "r").readlines()

for i in range(len(dict_file)):
    dict_file[i] = dict_file[i][0:len(dict_file[i])-2] #eliminate \n, line characters in the dictionary

input_text = open("text.txt", "r").read()
input_text = input_text.lower() #avoid problems with CAPS

list_words = input_text.split(' ')

print(list_words)

for word in list_words:
    if word in dict_file:
        correct.append(word)
    else:
        unknown.append(word)

print()
print("Correct words are: ")
print()
for x in range(len(correct)):
    print(x+1, '\t', correct[x])
 
print()
print("Unknown words are: ")
print()
for z in range(len(unknown)):
    print(z+1, '\t', unknown[z])

Please find attached the files with the dictionary and the sample text.

I really cannot understand why some words that certainly are in the dictionary (like "very" and "among") end up in the unknown words list. Any help would be welcome.

As a secondary issue, I couldn't figure out how to use multiple separators (in addition to space, also have punctuation) with the "split" command for lists.

Thanks in advance for any help.


Yeti

Maybe better?

for i in range(len(dict_file)):
    #dict_file[i] = dict_file[i][0:len(dict_file[i])-2] #eliminate \n, line characters in the dictionary
    dict_file[i] = dict_file[i].strip()

list_words = input_text.strip().split(' ')

Output:

Unknown words are: 
()
(1, '\t', 'general-purpose')
(2, '\t', 'high-level')

(Tested with Python 2.5 on Linux, which is why print shows tuples.)

Thank you so much -ordi- (also for the speed in the reply!!).

That indeed did the trick. I will go through the documentation on "strip" to try to get a better handle on it. I presume that the problem was with strange characters in between the sample words, right?

Cheers,


Yeti

Yeah, strip() removes that.
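
To illustrate what was probably going on (a guess, since the attached files are not shown here): slicing off the last two characters assumes every line ends in "\r\n". If a line ends in a bare "\n", or is the last line with no newline at all, the slice eats real letters. The sample lines below are made up for the illustration.

```python
# Hypothetical dictionary lines with different endings (made-up sample data).
lines = ["very\r\n", "among\n", "python"]

# Original approach: always drop the last two characters of each line.
sliced = [line[0:len(line) - 2] for line in lines]

# strip() removes only leading/trailing whitespace, whatever the line ending is.
stripped = [line.strip() for line in lines]

print(sliced)    # ['very', 'amon', 'pyth']
print(stripped)  # ['very', 'among', 'python']
```

The same applies to the text file: split(' ') only splits on single spaces, so a word followed by a newline keeps that newline and will not match the dictionary entry.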

This is not a great way to do it, but it works:

input_text = input_text.replace('-', ' ')

list_words = input_text.strip().split()

Output:

Unknown words are: 
()

Maybe the approach described here is better: http://reliablybroken.com/b/2010/04/split-a-file-on-any-character-in-python/

or this thread: http://www.daniweb.com/forums/thread338875.html
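
For splitting on several separators at once, one option is re.split from the standard library. This is only a minimal sketch: it assumes that any run of characters other than lowercase letters, digits, and apostrophes counts as a separator. That rule and the sample sentence are my own assumptions, and note that it also breaks "high-level" into two words.

```python
import re

# Made-up sample sentence for the illustration.
sample = "Python is a general-purpose, high-level language; it's very popular!"

# Split on runs of anything that is not a lowercase letter, digit or apostrophe.
# The filter drops the empty string left over when the text ends with a separator.
words = [w for w in re.split(r"[^a-z0-9']+", sample.lower()) if w]

print(words)
```

Calling split() with no argument, as in the snippet above, already handles spaces, tabs, and newlines; re.split is only needed once punctuation has to act as a separator too.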

Well, I actually found your solution quite intelligent.

However, just to avoid writing a separate "replace" line for each punctuation character, I used "string.punctuation" like this:

import string

for i in string.punctuation:
    input_text = input_text.replace(i, ' ')

This is certainly not the best way to do it, but at least I understand it and, for the moment, that is better than simply copying this stuff:

http://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string-in-python

Thanks once again -ordi-
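
For later readers: the loop over string.punctuation above can also be written as a single pass with str.translate. A minimal Python 3 sketch, assuming each punctuation character should become a space so that the result matches the loop (the sample text is made up):

```python
import string

# Translation table: every punctuation character maps to a space.
table = str.maketrans(string.punctuation, " " * len(string.punctuation))

# Made-up sample text for the illustration.
input_text = "general-purpose, high-level; very among."
cleaned = input_text.translate(table)

print(cleaned.split())  # ['general', 'purpose', 'high', 'level', 'very', 'among']
```

Whether this is preferable to the explicit loop is mostly a matter of taste for small files; the loop is easier to read, while translate avoids building a new string for every punctuation character.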