Character Frequency using Python

bumsfeld

The program takes a text and builds a dictionary of character:frequency pairs. The dictionary is first presented sorted by character. Since the most frequent characters are usually the interesting ones, two methods are then given to present the dictionary sorted by frequency. It should all make good practice for learning Python.

# show the frequency of characters in a string
# Python 2.4 and higher   by HAB

text = """Turd floats:

A Charlotte, NC lawyer purchased a box of rare and expensive cigars,
then insured them against fire and theft. Within a month, having smoked
all those great cigars and without having made even his first premium
payment on the policy, the lawyer filed a claim against the insurance
company. In his claim, the lawyer stated the cigars were lost "in a
series of small fires."

The insurance company refused to pay, citing the obvious reason:
that the man had consumed the cigars in the normal fashion. The lawyer
sued and won!"""


# create a character:frequency dictionary
cf_dic = {}
for char in text.lower():
    cf_dic[char] = cf_dic.get(char, 0) + 1

# print(...) with a single argument works in both Python 2 and Python 3
print("Characters sorted by ASCII number:")
# create a sorted list of keys
key_list = sorted(cf_dic.keys())
for key in key_list:
    # don't show space and newline
    if key not in " \n":
        # look up the value that belongs to the key
        print("%2s  %3d" % (key, cf_dic[key]))

print("")

print("Characters sorted by frequency:")
# convert cf_dic to a list of (k, v) tuples with cf_dic.items()
# flip the tuple elements to (v, k) using a list comprehension
# then sort the list of tuples (order is v, k), highest v first;
# characters with equal counts come out in reverse character order
value_key = sorted([(v, k) for k, v in cf_dic.items()], reverse=True)
for vk in value_key:
    # don't show space and newline
    if vk[1] not in " \n":
        print("%2s  %3d" % (vk[1], vk[0]))

print("")

print("Characters sorted by frequency (method 2):")
# using a list of (k, v) tuples, operator.itemgetter() and sorted()
# establish the sort key: tuple index 0 is the key, index 1 is the value
import operator
index = operator.itemgetter(1)
# create a list of (k, v) tuples from the dictionary
key_value1 = cf_dic.items()
# sort the (k, v) tuples by v, highest v first
key_value2 = sorted(key_value1, key=index, reverse=True)
for kv in key_value2:
    # don't show space and newline
    if kv[0] not in " \n":
        print("%2s  %3d" % (kv[0], kv[1]))
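
On Python 2.7 or 3.x, the standard library's collections.Counter can take over the tallying loop. What follows is only a sketch of that alternative, not part of the snippet above: the helper name char_frequency, its skip parameter and the sample string are made up for illustration. Counter also offers most_common(), but an explicit sort key is used here so that characters with equal counts come out in character order.

```python
from collections import Counter


def char_frequency(text, skip=" \n"):
    """Return (char, count) pairs, most frequent first (hypothetical helper)."""
    # Counter tallies each character of the lowercased text,
    # leaving out the characters listed in skip
    counts = Counter(ch for ch in text.lower() if ch not in skip)
    # sort by count (highest first), then by character to order ties
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = "Hello World"  # made-up sample text
    for char, freq in char_frequency(sample):
        print("%2s  %3d" % (char, freq))
```

Negating the count in the sort key gives descending frequency while keeping the characters themselves ascending, which reverse=True alone cannot do.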