So this is my first post and I have only begun using python. One of my first assignments is to design a program which will count the most used words in the given text file. In my case we are using the Declaration of Independence.

Here is what i have so far, I think everything is fine up until the end were i get confused.The Problems i seem to have is with the My dictionary statements at the bottom. Any way to sort it out?

Once again i'm sorry if i sound terrible but I've only just started this so not everything is 100% accurate.

def word_freq(text_file):
	""" prints the most commonly used words in the given text file

	    text_file: the name of a text file to analyze
	    printing the most frequently used words in the file
	f = open(text_file, 'r')
	contents =
	words = contents.split()
	for i in range(len(words)):
	    words[i] = words[i].lower()
	    words[i] = words[i].strip(',:.;')
	counter = dict()
	for i in range(len(words)):
	    if words[i] not in counter:
		    counter[words[i]] = 1
		    counter[words[i]] += 1
	sorted_words = list(sorted(counter, key=counter.get, reverse=True))
	for w in sorted_words[0:30]:
		    my_dictionary[‘the’] = 0
		    my_dictionary[‘the’] += 1

Any helpful tips or solutions would be greatly appreciated.
Thanks a bunch.

Recommended Answers

All 6 Replies

What are you trying to do with 'the' at the end?

You are making some mistake and it can be done much eaiser with some python power.
You are trying to count with dictionary 2 times 1 is enough.
You should take out more than this also ?!
An example.

>>> import string
>>> string.punctuation
>>> s = 'The quick:: brown! fox jumps? over+ the lazy?? dog.'
>>> ''.join(c for c in s if c not in string.punctuation)
'The quick brown fox jumps over the lazy dog'

So another way this time a complete script.

import re
from collections import Counter

with open('text.txt') as f:
    text =
words = re.findall(r'\w+', text)


This use regex \w+ that dos the same as i showed in code over remove special character.
And counting is done bye collections Counter new from 2.7-->
Counter also has a most_common function,that dos what name say.
This will show the 4 most common word in text.txt.

If you want to eliminate certain common words like "the and "and", etc. use a list.

omit_words = ["the", "a", "and", "but", "i"]
    for w in sorted_words[0:30]:
        if w not in omit_words:

Thanks for all the replies got it working.

CLOSE THAT FILE!!! Also, did you notice the stray else at the end?

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, learning, and sharing knowledge.