What are you trying to do with 'the' at the end?
pyTony
pyMod
5,359 posts since Apr 2010
Reputation Points: 782
Solved Threads: 852
You are making some mistakes, and it can be done much more easily with some Python power.
You are counting with a dictionary twice; once is enough.
.strip(',:.;')
You should probably strip more punctuation than just these four characters.
An example:
>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> s = 'The quick:: brown! fox jumps? over+ the lazy?? dog.'
>>> ''.join(c for c in s if c not in string.punctuation)
'The quick brown fox jumps over the lazy dog'
>>>
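If you would rather keep the .strip() approach, a minimal sketch is to pass string.punctuation to it for every word. The helper name strip_punctuation is made up for illustration, not something from the original code.

```python
import string

def strip_punctuation(words):
    """Strip leading/trailing punctuation from each word and drop empty results."""
    cleaned = (w.strip(string.punctuation) for w in words)
    return [w for w in cleaned if w]

s = 'The quick:: brown! fox jumps? over+ the lazy?? dog.'
print(strip_punctuation(s.split()))
# ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```

Note that strip() only touches the ends of each word, so an apostrophe inside a word like "don't" survives, while the join version above removes it.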
Here is another way, this time as a complete script.
import re
from collections import Counter
with open('text.txt') as f:
    text = f.read().lower()
words = re.findall(r'\w+', text)
print(Counter(words).most_common(4))
This uses the regex \w+, which does much the same job as the code I showed above: it keeps only runs of word characters, so punctuation is dropped.
The counting is done by collections.Counter, new in Python 2.7.
Counter also has a most_common method, which does what the name says.
This will show the 4 most common words in text.txt.
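To try the same idea without a file, here is a small sketch that runs on a string. The function name most_common_words and the sample sentence are my own, just for illustration.

```python
import re
from collections import Counter

def most_common_words(text, n=4):
    """Return the n most frequent words in text as (word, count) pairs."""
    # \w+ matches runs of letters, digits and underscores, so punctuation is skipped
    words = re.findall(r'\w+', text.lower())
    return Counter(words).most_common(n)

sample = 'The cat and the dog and the bird.'
print(most_common_words(sample, 2))
# [('the', 3), ('and', 2)]
```

Each item in the result is a (word, count) tuple, so you can unpack it in a loop if you want nicer output.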
snippsat
Practically a Posting Shark
808 posts since Aug 2008
Reputation Points: 353
Solved Threads: 294
If you want to eliminate certain common words like "the", "and", etc., use a list.
omit_words = ["the", "a", "and", "but", "i"]
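# sorted_words is assumed to be the sorted word list built in your own code
for w in sorted_words[0:30]:
    if w not in omit_words:
        print(w)  # body assumed; the original snippet stops at the "if"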
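The omit-list idea can also be combined with Counter. This is only a sketch: the function name common_words_without and the sample text are hypothetical, and I swapped the list for a set, which gives faster membership tests but otherwise works the same way.

```python
import re
from collections import Counter

# Example stop words only; extend the set to suit your text.
OMIT_WORDS = {"the", "a", "and", "but", "i"}

def common_words_without(text, omit=OMIT_WORDS, n=4):
    """Count words in text, skipping any word found in omit."""
    words = re.findall(r'\w+', text.lower())
    counts = Counter(w for w in words if w not in omit)
    return counts.most_common(n)

sample = 'The fox and the hound, but the fox ran and I ran after the fox.'
print(common_words_without(sample, n=2))
# [('fox', 3), ('ran', 2)]
```

Because the text is lowercased before counting, the entries in the omit set should be lowercase too.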
woooee
Nearly a Posting Maven
2,454 posts since Dec 2006
Reputation Points: 777
Solved Threads: 714