Most Used Words In Text File Code Help

Question

Intrikate 0 Newbie Poster

13 Years Ago

So this is my first post, and I have only just begun using Python. One of my first assignments is to design a program that counts the most used words in a given text file. In my case we are using the Declaration of Independence.

Here is what I have so far. I think everything is fine up until the end, where I get confused. The problem seems to be with the my_dictionary statements at the bottom. Any way to sort it out?

Once again, I'm sorry if I sound terrible, but I've only just started this, so not everything is 100% accurate.

def word_freq(text_file):
	""" prints the most commonly used words in the given text file
	author

	INPUT
	    text_file: the name of a text file to analyze
	OUTPUT
	    printing the most frequently used words in the file
	"""
	f = open(text_file, 'r')
	contents = f.read()
	words = contents.split()
	for i in range(len(words)):
	    words[i] = words[i].lower()
	    words[i] = words[i].strip(',:.;')
	counter = dict()
	for i in range(len(words)):
	    if words[i] not in counter:
		    counter[words[i]] = 1
	    else:
		    counter[words[i]] += 1
	sorted_words = list(sorted(counter, key=counter.get, reverse=True))
	for w in sorted_words[0:30]:
		    print('freq:',counter[w],'word',w)
		    my_dictionary
		    my_dictionary[‘the’] = 0
	    else:
		    my_dictionary[‘the’] += 1

Any helpful tips or solutions would be greatly appreciated.
Thanks a bunch.

file-system python

All 6 Replies

snippsat 661 Master Poster

13 Years Ago

You are making some mistakes, and it can be done much more easily with some Python power.
You are trying to count with a dictionary twice; once is enough.
.strip(',:.;')
You should strip out more punctuation than this.
An example:

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> s = 'The quick:: brown! fox jumps? over+ the lazy?? dog.'
>>> ''.join(c for c in s if c not in string.punctuation)
'The quick brown fox jumps over the lazy dog'
>>>
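To apply the same idea to the per-word cleanup in the original post, string.punctuation can be passed straight to str.strip. A minimal sketch, where the helper name clean_word is made up for illustration. Note that strip only trims the ends of a word, so an apostrophe inside a word such as "don't" is kept:

```python
import string

def clean_word(word):
    """Lower-case a word and trim punctuation from both ends (hypothetical helper)."""
    return word.lower().strip(string.punctuation)

sample = 'The quick:: brown! fox jumps? over+ the lazy?? dog.'
cleaned = [clean_word(w) for w in sample.split()]
print(cleaned)
# ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```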

Here is another way, this time as a complete script.

import re
from collections import Counter

with open('text.txt') as f:
    text = f.read().lower()
words = re.findall(r'\w+', text)

print(Counter(words).most_common(4))

This uses the regex \w+, which does much the same as the code I showed above: it drops the special characters.
The counting is done by collections.Counter, new in Python 2.7.
Counter also has a most_common method, which does what the name says.
This will show the 4 most common words in text.txt.
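To see what Counter and most_common return without needing a file, here is a small sketch on an in-memory string (the sample text is made up for illustration). One caveat worth knowing: \w+ also stops at apostrophes, so it is not quite identical to deleting punctuation characters.

```python
import re
from collections import Counter

text = "The cat and the hat and the bat."
words = re.findall(r'\w+', text.lower())

# most_common(n) returns (word, count) pairs, highest count first
top_two = Counter(words).most_common(2)
print(top_two)  # [('the', 3), ('and', 2)]

# \w+ stops at the apostrophe, so "don't" comes back as two pieces
pieces = re.findall(r'\w+', "don't stop")
print(pieces)  # ['don', 't', 'stop']
```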

woooee 814 Nearly a Posting Maven

13 Years Ago

If you want to eliminate certain common words like "the" and "and", etc., use a list.

omit_words = ["the", "a", "and", "but", "i"]
for w in sorted_words[0:30]:
    # only print words that are not on the omit list
    if w not in omit_words:
        print('freq:', counter[w], 'word', w)
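One thing to watch with this approach: filtering after slicing the top 30 can leave fewer than 30 words in the output. A possible sketch that filters first and then takes the top n, using collections.Counter from the earlier reply. The function name top_words and the sample list are assumptions for illustration only:

```python
from collections import Counter

omit_words = {"the", "a", "and", "but", "i"}

def top_words(words, n, omit=omit_words):
    """Return the n most frequent words, skipping anything in omit (hypothetical helper)."""
    counts = Counter(w for w in words if w not in omit)
    return counts.most_common(n)

sample = ["the", "people", "and", "the", "laws", "people", "a", "laws", "people"]
result = top_words(sample, 2)
print(result)  # [('people', 3), ('laws', 2)]
```

A set is used for omit_words here because membership tests on a set stay fast as the list of words to skip grows; a list works too for a handful of words.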

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 1 · 2011-09-15T12:34:37+00:00

What are you trying to do with 'the' at the end?

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 2 · 2011-09-15T23:33:02+00:00

A longer discussion of various methods of word extraction can be found in my code snippet thread: http://www.daniweb.com/software-development/python/code/321725

Intrikate 0 Newbie Poster · Answer 3 · 2011-09-16T22:34:46+00:00

Thanks for all the replies, I got it working.

JoshuaBurleson 23 Posting Whiz · Answer 4 · 2011-09-17T10:13:47+00:00

CLOSE THAT FILE!!! Also, did you notice the stray else at the end?
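
For reference, here is one way the original function might look with both points addressed: a with block closes the file automatically, and the leftover my_dictionary lines (including the stray else) are dropped. This is only a sketch. Stripping string.punctuation instead of ',:.;', the top_n parameter, and returning the counts are additions for illustration, not part of the original assignment.

```python
import string

def word_freq(text_file, top_n=30):
    """Print the top_n most commonly used words in the given text file."""
    # The with block closes the file even if reading fails
    with open(text_file, 'r') as f:
        contents = f.read()

    # Lower-case each word and trim punctuation from both ends
    words = [w.lower().strip(string.punctuation) for w in contents.split()]

    counter = {}
    for word in words:
        if not word:  # skip tokens that were nothing but punctuation
            continue
        counter[word] = counter.get(word, 0) + 1

    # Sort the words by their counts, highest first
    sorted_words = sorted(counter, key=counter.get, reverse=True)
    for w in sorted_words[:top_n]:
        print('freq:', counter[w], 'word', w)
    return counter  # returned only so the counts can be inspected (assumption)
```

Calling word_freq('declaration.txt') would print up to 30 lines, where 'declaration.txt' stands in for whatever the assignment's file is actually called.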
