Word Frequency in a Text String (Python)
This program splits a text string into a list of words. A trailing punctuation mark (one of , . ! ? ;) is removed from each word, and the words are converted to lower case so they count and sort consistently. A dictionary then maps each word to its frequency, and the result is displayed by walking a sorted list of the dictionary keys. The code is commented throughout to make it easier to follow.
# word frequency in a text
# tested with Python24    vegaseat    25aug2005

# Chinese wisdom ...
str1 = """Man who run in front of car, get tired.
Man who run behind car, get exhausted."""

print "Original string:"
print str1

# create a list of words separated at whitespaces
wordList1 = str1.split(None)

# strip any punctuation marks and build modified word list
# start with an empty list
wordList2 = []
for word1 in wordList1:
    # last character of each word
    lastchar = word1[-1:]
    # use a list of punctuation marks
    if lastchar in [",", ".", "!", "?", ";"]:
        word2 = word1.rstrip(lastchar)
    else:
        word2 = word1
    # build a wordList of lower case modified words
    wordList2.append(word2.lower())

print "Word list created from modified string:"
print wordList2

# create a wordfrequency dictionary
# start with an empty dictionary
freqD2 = {}
for word2 in wordList2:
    freqD2[word2] = freqD2.get(word2, 0) + 1

# create a list of keys and sort the list
# all words are lower case already
keyList = freqD2.keys()
keyList.sort()

print "Frequency of each word in the word list (sorted):"
for key2 in keyList:
    print "%-10s %d" % (key2, freqD2[key2])
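The snippet above is written for Python 2 (print statements, and keys() returning a list that can be sorted in place). For readers on Python 3, the same steps can be sketched roughly as follows. This is not the original author's code: the function name word_frequencies is made up for illustration, and the sketch assumes it is acceptable to strip every ASCII punctuation character in string.punctuation from both ends of a word, which is broader than the five trailing marks the original checks for.

```python
import string
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lower-cased word to its count.

    Hypothetical helper, not part of the original snippet.
    """
    words = []
    for raw in text.split():
        # Assumption: strip all ASCII punctuation from both ends,
        # not only a single trailing , . ! ? ; as the original does.
        word = raw.strip(string.punctuation).lower()
        if word:  # skip tokens that consisted only of punctuation
            words.append(word)
    return Counter(words)


if __name__ == "__main__":
    sample = """Man who run in front of car, get tired.
Man who run behind car, get exhausted."""
    freq = word_frequencies(sample)
    print("Frequency of each word in the word list (sorted):")
    # sorted() over a Counter yields its keys in alphabetical order
    for word in sorted(freq):
        print("%-10s %d" % (word, freq[word]))
```

Counter is a dict subclass, so freq[word] works the same way as freqD2[word2] in the original, and sorted(freq) takes the place of the keys() and sort() pair.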



