Word Frequency in a Text String (Python)
This program takes a text string and splits it into a list of words. A trailing punctuation mark (comma, period, exclamation mark, question mark or semicolon) is removed from each word, and the words are converted to lower case so that they count and sort consistently. A dictionary is then built that maps each word to its frequency. The result is displayed by looping over a sorted list of the dictionary keys. The code is commented throughout to make it easier to follow.
# word frequency in a text
# tested with Python24    vegaseat    25aug2005

# Chinese wisdom ...
str1 = """Man who run in front of car, get tired.
Man who run behind car, get exhausted."""

print "Original string:"
print str1

# create a list of words separated at whitespaces
wordList1 = str1.split(None)

# strip any punctuation marks and build modified word list
# start with an empty list
wordList2 = []
for word1 in wordList1:
    # last character of each word
    lastchar = word1[-1:]
    # use a list of punctuation marks
    if lastchar in [",", ".", "!", "?", ";"]:
        word2 = word1.rstrip(lastchar)
    else:
        word2 = word1
    # build a wordList of lower case modified words
    wordList2.append(word2.lower())

print "Word list created from modified string:"
print wordList2

# create a wordfrequency dictionary
# start with an empty dictionary
freqD2 = {}
for word2 in wordList2:
    freqD2[word2] = freqD2.get(word2, 0) + 1

# create a list of keys and sort the list
# all words are lower case already
keyList = freqD2.keys()
keyList.sort()

print "Frequency of each word in the word list (sorted):"
for key2 in keyList:
    print "%-10s %d" % (key2, freqD2[key2])
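The snippet above is written for Python 2 (print statements, and keys() returning a list that can be sorted in place). For readers on Python 3, the same steps might be sketched as follows. This is only one possible port, not the original author's code: the function name word_frequencies is made up for illustration, and the use of collections.Counter and string.punctuation is an assumption that swaps in standard-library helpers for the hand-built dictionary and the five-character punctuation list. One behavioral difference to be aware of: this version strips punctuation from both ends of a word, not just a single trailing mark.

```python
import string
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lower-cased word to its count.

    Hypothetical helper, not part of the original snippet.
    """
    words = []
    for raw in text.split():
        # strip ASCII punctuation from both ends, then normalize case
        word = raw.strip(string.punctuation).lower()
        if word:  # skip tokens that consisted only of punctuation
            words.append(word)
    return Counter(words)


text = """Man who run in front of car, get tired.
Man who run behind car, get exhausted."""

freq = word_frequencies(text)

# sorted() works on any iterable of keys, so no separate key list is needed
for word in sorted(freq):
    print("%-10s %d" % (word, freq[word]))
```

Counter is a dict subclass, so freq[word] and freq.get(word, 0) behave the same way as the freqD2 dictionary in the original; freq.most_common() is available if the words should be listed by frequency instead of alphabetically.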



