So this is my first post, and I have only just begun using Python. One of my first assignments is to design a program that counts the most-used words in a given text file. In my case, we are using the Declaration of Independence.

Here is what I have so far. I think everything is fine up until the end, where I get confused. The problem I seem to have is with the my_dictionary statements at the bottom. Is there any way to sort it out?

Once again, I'm sorry if I sound terrible, but I've only just started this, so not everything is 100% accurate.

def word_freq(text_file):
    """ prints the most commonly used words in the given text file

        text_file: the name of a text file to analyze
        printing the most frequently used words in the file
    """
    f = open(text_file, 'r')
    contents = f.read()
    words = contents.split()
    for i in range(len(words)):
        words[i] = words[i].lower()
        words[i] = words[i].strip(',:.;')
    counter = dict()
    for i in range(len(words)):
        if words[i] not in counter:
            counter[words[i]] = 1
        else:
            counter[words[i]] += 1
    sorted_words = list(sorted(counter, key=counter.get, reverse=True))
    for w in sorted_words[0:30]:
        my_dictionary['the'] = 0
        my_dictionary['the'] += 1

Any helpful tips or solutions would be greatly appreciated.
Thanks a bunch.


You are making some mistakes, and it can be done much more easily with some Python power.
You are trying to count with a dictionary twice; once is enough.
You should also strip out more punctuation than just ',:.;'.
An example:

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> s = 'The quick:: brown! fox jumps? over+ the lazy?? dog.'
>>> ''.join(c for c in s if c not in string.punctuation)
'The quick brown fox jumps over the lazy dog'
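The same idea can be wrapped in a small reusable function. This is only a sketch: the name strip_punctuation is made up for illustration, and it assumes Python 3, where str.maketrans can build a table that deletes characters.

```python
import string

# Translation table that deletes every character found in string.punctuation.
# (Python 3 only: the third argument of str.maketrans lists characters to remove.)
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def strip_punctuation(text):
    """Return text with all ASCII punctuation characters removed."""
    return text.translate(_PUNCT_TABLE)


s = 'The quick:: brown! fox jumps? over+ the lazy?? dog.'
print(strip_punctuation(s))
```

Note that this also removes apostrophes and hyphens inside words, which may or may not be what you want for a word count.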

Here is another way, this time as a complete script.

import re
from collections import Counter

with open('text.txt') as f:
    text = f.read().lower()
words = re.findall(r'\w+', text)
print(Counter(words).most_common(4))

This uses the regex \w+, which does the same job as the code I showed above: it leaves out the special characters.
The counting is done by collections.Counter, which is new in Python 2.7.
Counter also has a most_common method, which does what the name says.
This will show the 4 most common words in text.txt.
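To try the regex and Counter approach without needing a file on disk, here is a minimal sketch that works on a string instead. The function name most_common_words and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def most_common_words(text, n=4):
    """Return the n most frequent words in text as (word, count) pairs."""
    # \w+ matches runs of letters, digits and underscores, so punctuation
    # is dropped and a word like "don't" is split into "don" and "t".
    words = re.findall(r'\w+', text.lower())
    return Counter(words).most_common(n)


sample = "The cat and the dog and the bird."
print(most_common_words(sample, 2))
# prints [('the', 3), ('and', 2)]
```

For a real file you would pass in the result of f.read(), as in the script above.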



If you want to eliminate certain common words like "the" and "and", etc., use a list.

omit_words = ["the", "a", "and", "but", "i"]
for w in sorted_words[0:30]:
    if w not in omit_words:
        print(w, counter[w])

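One variation on the omit list, as a rough sketch: if the unwanted words are dropped before counting rather than after slicing, you still get a full top-n list instead of fewer than 30 results. The function name top_words and the sample data are hypothetical, and a set is used only because membership tests on it are faster than on a list.

```python
from collections import Counter


def top_words(words, omit_words, n=30):
    """Count words, skipping any in omit_words, and return the n most common."""
    omit = set(omit_words)  # set lookup is faster than scanning a list
    counts = Counter(w for w in words if w not in omit)
    return counts.most_common(n)


words = ["the", "fox", "the", "dog", "fox", "a"]
omit_words = ["the", "a", "and", "but", "i"]
for word, count in top_words(words, omit_words, n=2):
    print(word, count)
```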
