This program uses the Python module re to split a text file into words and remove some common punctuation marks. The word:frequency dictionary is then built using try/except. In honor of the 4th of July, the text analyzed is the National Anthem of the USA (found via Google).

# another word frequency program, uses re
# tested with Python2.4.3   HAB

import re

# this text is in honor of the 4th of July, or pick any text file you have
filename = 'NationalAnthemUSA.txt'

# create list of lower case words, \s+ --> match any whitespace(s)
# you can replace open(filename).read() with a given string
word_list = re.split(r'\s+', open(filename).read().lower())
print 'Words in text:', len(word_list)

# create dictionary of word:frequency pairs
freq_dic = {}
# punctuation marks to be removed
punctuation = re.compile(r'[.?!,":;]') 
for word in word_list:
    # remove punctuation marks
    word = punctuation.sub("", word)
    # form dictionary: increment known words, start new words at 1
    try:
        freq_dic[word] += 1
    except KeyError:
        freq_dic[word] = 1

print 'Unique words:', len(freq_dic)

# create list of (key, val) tuple pairs
freq_list = freq_dic.items()
# sort by key or word
freq_list.sort()
# display result
for word, freq in freq_list:
    print word, freq

How would you take this and organize the words that appear in descending order, not alphabetical order?

Do you mean highest frequency first?

Simply add this to the end of the code:

print '-'*30

print "sorted by highest frequency first:"
# create list of (val, key) tuple pairs
freq_list2 = [(val, key) for key, val in freq_dic.items()]
# sort by val or frequency, highest first
freq_list2.sort(reverse=True)
# display result
for freq, word in freq_list2:
    print word, freq
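
One side effect of reverse-sorting (val, key) tuples is that words with the same frequency also come out in reverse alphabetical order. A minimal sketch of one way around that, assuming a small hand-made dictionary in place of the real freq_dic; the name freq_list3 is made up for this example, and the code is written to run on Python 3 as well:

```python
# hypothetical sample data standing in for freq_dic from the program above
freq_dic = {'the': 3, 'star': 2, 'banner': 2, 'wave': 1}

# sort by descending frequency, then alphabetically for equal counts
freq_list3 = sorted(freq_dic.items(), key=lambda pair: (-pair[1], pair[0]))

# display result
for word, freq in freq_list3:
    print("%s %d" % (word, freq))
```

Negating the count in the key gives highest frequency first while leaving the words themselves in normal alphabetical order.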


Line 21 onwards:

    # form dictionary
    try:
        freq_dic[word] += 1
    except KeyError:
        freq_dic[word] = 1

Could be replaced by:

freq_dic[word] = freq_dic.get(word,0) + 1

This gets rid of the try/except and makes things a little neater.
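
For anyone who wants to try the get() version on its own, here is a small self-contained sketch. The sample sentence is made up and stands in for the contents of the text file; it should run on Python 3 as well:

```python
import re

# hypothetical sample text standing in for the contents of the text file
text = "the flag, the stars and the stripes"

# same punctuation pattern as in the original program
punctuation = re.compile(r'[.?!,":;]')

freq_dic = {}
for word in re.split(r'\s+', text.lower()):
    # remove punctuation marks
    word = punctuation.sub("", word)
    # get() returns 0 for a word not seen yet, so no try/except is needed
    freq_dic[word] = freq_dic.get(word, 0) + 1

print(freq_dic)
```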

What can I add to this code to remove words listed in another file before doing the frequency listing?
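
One possible approach, sketched under some assumptions: the unwanted words sit in a second file, one per line, and are loaded into a set that is checked before counting. The file name stopwords.txt and the names stop_text and stop_words are made up for this example, and a string stands in for the file so the sketch runs by itself:

```python
import re

# hypothetical contents of the second file, one unwanted word per line;
# with a real file you could use open('stopwords.txt').read() instead
stop_text = "the\nand\nof\n"
stop_words = set(stop_text.lower().split())

# hypothetical sample text standing in for the file being analyzed
text = "The land of the free, and the home of the brave."
punctuation = re.compile(r'[.?!,":;]')

freq_dic = {}
for word in re.split(r'\s+', text.lower()):
    # remove punctuation marks before comparing with the stop words
    word = punctuation.sub("", word)
    # skip empty strings and any word listed in the second file
    if not word or word in stop_words:
        continue
    freq_dic[word] = freq_dic.get(word, 0) + 1

print(freq_dic)
```

Stripping the punctuation first matters here, otherwise 'the,' would not match 'the' in the set.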

This doesn't seem to remove any punctuation marks from my text file; it counts 'it' separately from 'it,'.

What might the problem be?
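
Hard to say without seeing the exact code that was run, but two things may be worth checking. This is only a sketch of possible causes, not a diagnosis: sub() returns a new string, so the result has to be assigned back to word, and the pattern only removes the characters listed inside the brackets. The variable names unchanged, cleaned and curly are made up for the example:

```python
import re

# same punctuation pattern as in the original program
punctuation = re.compile(r'[.?!,":;]')

# sub() returns a new string; the original is left unchanged
word = "it,"
punctuation.sub("", word)          # result discarded, word is still "it,"
unchanged = word
word = punctuation.sub("", word)   # result assigned back, word is now "it"
cleaned = word

# characters outside the class, such as a curly quote, are not removed
curly = punctuation.sub("", u"it\u201d")

print(repr(unchanged))
print(repr(cleaned))
print(repr(curly))
```

If the text file uses characters that are not in the pattern, they would need to be added inside the brackets.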
