
Most Used Words In Text File Code Help

So this is my first post, and I have only just begun using Python. One of my first assignments is to design a program that counts the most used words in a given text file. In my case we are using the Declaration of Independence.

Here is what I have so far. I think everything is fine up until the end, where I get confused. The problem seems to be with the my_dictionary statements at the bottom. Is there any way to sort it out?

Once again, I'm sorry if this sounds terrible, but I've only just started, so not everything is 100% accurate.

def word_freq(text_file):
	""" prints the most commonly used words in the given text file
	author

	INPUT
	    text_file: the name of a text file to analyze
	OUTPUT
	    printing the most frequently used words in the file
	"""
	f = open(text_file, 'r')
	contents = f.read()
	words = contents.split()
	for i in range(len(words)):
	    words[i] = words[i].lower()
	    words[i] = words[i].strip(',:.;')
	counter = dict()
	for i in range(len(words)):
	    if words[i] not in counter:
		    counter[words[i]] = 1
	    else:
		    counter[words[i]] += 1
	sorted_words = list(sorted(counter, key=counter.get, reverse=True))
	for w in sorted_words[0:30]:
		    print('freq:',counter[w],'word',w)
		    my_dictionary
		    my_dictionary['the'] = 0
	    else:
		    my_dictionary['the'] += 1


Any helpful tips or solutions would be greatly appreciated.
Thanks a bunch.

Intrikate
 

What are you trying to do with 'the' at the end?

pyTony
 

You are making some mistakes, and this can be done much more easily with some Python power.
You are trying to count with a dictionary twice; once is enough.
.strip(',:.;')
You should strip out more punctuation than this.
An example:

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> s = 'The quick:: brown! fox jumps? over+ the lazy?? dog.'
>>> ''.join(c for c in s if c not in string.punctuation)
'The quick brown fox jumps over the lazy dog'
>>>
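To apply the same idea word by word, here is a minimal sketch. The helper name clean_words is made up for illustration, and it strips punctuation only from the ends of each token, so an apostrophe inside a word is kept.

```python
import string

def clean_words(text):
    """Lowercase each whitespace-separated token and strip punctuation from both ends."""
    # clean_words is a hypothetical helper, not part of the original post.
    cleaned = (token.strip(string.punctuation).lower() for token in text.split())
    # Tokens made only of punctuation (such as "--") become empty and are dropped.
    return [word for word in cleaned if word]

print(clean_words("The quick:: brown! fox's -- dog."))
```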

Here is another way, this time as a complete script.

import re
from collections import Counter

with open('text.txt') as f:
    text = f.read().lower()
words = re.findall(r'\w+', text)

print(Counter(words).most_common(4))

This uses the regex \w+, which does much the same job as the code I showed above: it leaves out special characters.
The counting is done by collections.Counter, new in Python 2.7 and later.
Counter also has a most_common method, which does what the name says.
This will show the 4 most common words in text.txt.
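To try the same approach without a file, here is a small sketch that works on a string. The function name top_words and the sample sentence are made up for illustration. Note that \w+ splits on apostrophes, so "don't" would be counted as "don" and "t".

```python
import re
from collections import Counter

def top_words(text, n=4):
    """Return the n most common words in text as (word, count) pairs."""
    # top_words is a hypothetical helper; \w+ matches runs of letters, digits and underscores.
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(n)

sample = "The cat and the hat and the bat"
print(top_words(sample, 2))  # [('the', 3), ('and', 2)]
```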

snippsat
 

A longer discussion of various methods of word extraction can be found in my code snippet thread: http://www.daniweb.com/software-development/python/code/321725

pyTony
 


If you want to eliminate certain common words like "the" and "and", etc., use a list.

omit_words = ["the", "a", "and", "but", "i"]
for w in sorted_words[0:30]:
    if w not in omit_words:
        print('freq:', counter[w], 'word', w)

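One thing to watch: filtering after taking the top 30 leaves you with fewer than 30 words. A possible variation, sketched here with made-up names (OMIT_WORDS, top_words_excluding), drops the omitted words before counting so you still get the full number of results.

```python
from collections import Counter

# Hypothetical stop-word set for illustration; a set gives fast membership tests.
OMIT_WORDS = {"the", "a", "and", "but", "i"}

def top_words_excluding(words, omit=OMIT_WORDS, n=30):
    """Count words that are not in omit and return the n most common (word, count) pairs."""
    counts = Counter(w for w in words if w not in omit)
    return counts.most_common(n)

print(top_words_excluding("the cat and the hat and a cat".split(), n=2))
```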
woooee
 

Thanks for all the replies, I got it working.

Intrikate
 

CLOSE THAT FILE!!! Also, did you notice the stray else at the end?
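For what it's worth, here is one way the original function might look with the file closed automatically and the stray else gone. This is only a sketch: it returns the result instead of printing it, and the top parameter is an addition that was not in the original post.

```python
from collections import Counter

def word_freq(text_file, top=30):
    """Return the most frequently used words in text_file as (word, count) pairs."""
    # The with block closes the file when it exits, even if read() raises.
    with open(text_file, 'r') as f:
        contents = f.read()
    # Same cleanup as the original post: lowercase and strip a few punctuation marks.
    words = [w.lower().strip(',:.;') for w in contents.split()]
    # Skip tokens that were punctuation only and are now empty.
    counter = Counter(w for w in words if w)
    return counter.most_common(top)
```

Calling it as word_freq('declaration.txt') (the file name is just a placeholder) gives a list you can loop over and print.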

pyguy62
 

This question has already been solved
