Word Frequency Count Revisited (Python)

by vegaseat

This short snippet shows how to preprocess a text by removing punctuation marks and changing it to lower case, then
do a word frequency count using Counter() from the Python module collections. Finally, two sorts are applied, first
by word and then by frequency, so that words with matching frequency are displayed in alphabetical order.

''' Count_words101.py
experiments with string processing
preprocess the string then do a word frequency count
using Counter() from the Python module collections
display words with matching frequency in alphabetical order
tested with Python27 and Python33  by  vegaseat  06sep2013
'''

from string import punctuation
from collections import Counter

# sample text for testing (could come from a text file)
text = """\
If you see a turn signal blinking on a car with a southern license plate, 
you may rest assured that it was on when the car was purchased.
""" 

# preprocess text, remove punctuation marks and change to lower case    
text2 = ''.join(c for c in text.lower() if c not in punctuation)

# get word and frequency of the words in text2
# text2.split() splits text2 at whitespace
# most_common() gives a list of all (word, freq) tuples, highest count first
# e.g. most_common(10) will return a list of the 10 most common words
mc_cnt = Counter(text2.split()).most_common()

print("Words with matching frequency have no particular order:")
for word, freq in mc_cnt:
    # str.format() style, works with Python27 and higher
    print("{:3d}  {}".format(freq, word))

# clean it up, showing words with matching freq in alphabetical order
# sort by word first, then sort by frequency (highest first)
# sorted() is stable, so words with equal frequency keep their word order
mc_cnt_w = sorted(mc_cnt)
mc_cnt_fw = sorted(mc_cnt_w, key=lambda tup: tup[1], reverse=True)

'''
# optional testing ...
print(mc_cnt)
print(mc_cnt_w)
print(mc_cnt_fw)
'''

print("Words with matching frequency are in order:")
for word, freq in mc_cnt_fw:
    # str.format() style, works with Python27 and higher
    print("{:3d}  {}".format(freq, word))

''' result ...
Words with matching frequency have no particular order:
  3  a
  2  you
  2  on
  2  car
  2  was
  1  southern
  1  with
  1  see
  1  if
  1  plate
  1  may
  1  assured
  1  when
  1  it
  1  blinking
  1  purchased
  1  license
  1  the
  1  signal
  1  rest
  1  that
  1  turn
Words with matching frequency are in order:
  3  a
  2  car
  2  on
  2  was
  2  you
  1  assured
  1  blinking
  1  if
  1  it
  1  license
  1  may
  1  plate
  1  purchased
  1  rest
  1  see
  1  signal
  1  southern
  1  that
  1  the
  1  turn
  1  when
  1  with
'''
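
The two-step sort in the snippet works because sorted() is stable. As a possible alternative, a single sort with a compound key should give the same ordering. The following is only a minimal sketch and not part of the original snippet: the helper name sorted_word_counts and the sample sentence are made up for illustration.

```python
from collections import Counter
from string import punctuation


def sorted_word_counts(text):
    """Return (word, freq) tuples, highest count first, ties alphabetical.

    Hypothetical helper, named here for illustration only.
    """
    # same preprocessing as the snippet: lower case, drop punctuation
    cleaned = ''.join(c for c in text.lower() if c not in punctuation)
    counts = Counter(cleaned.split())
    # one sort with a compound key: the negated count gives descending
    # frequency, the word itself breaks ties in ascending order
    return sorted(counts.items(), key=lambda tup: (-tup[1], tup[0]))


# made-up sample sentence, not the text used in the snippet
sample = "The cat and the dog and the bird."
for word, freq in sorted_word_counts(sample):
    print("{:3d}  {}".format(freq, word))
```

Negating the count only works because the frequencies are numbers. For keys that cannot be negated, the two-pass stable sort used in the snippet remains the more general approach.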