Word Frequency in a Text String (Python)

vegaseat, Aug 25th, 2005, 4:13 pm
This program takes a text string and splits it into a list of words. Trailing punctuation marks (, . ! ? ;) are removed, and each word is converted to lower case so that "Man" and "man" are counted as the same word and sort together. The cleaned list is then used to build a dictionary that maps each word to its frequency. The result is printed in alphabetical order by looping over a sorted list of the dictionary keys. Comments have been added throughout to explain the code.
<snippet>
# word frequency in a text
# tested with Python24 vegaseat 25aug2005

# Chinese wisdom ...
str1 = """Man who run in front of car, get tired.
Man who run behind car, get exhausted."""
print("Original string:")
print(str1)

print("")

# create a list of words separated at whitespace
wordList1 = str1.split(None)

# strip trailing punctuation marks and build a modified word list
# start with an empty list
wordList2 = []
for word1 in wordList1:
    # last character of each word
    lastchar = word1[-1:]
    # use a list of punctuation marks
    if lastchar in [",", ".", "!", "?", ";"]:
        # rstrip removes every trailing copy of that one character
        word2 = word1.rstrip(lastchar)
    else:
        word2 = word1
    # build a word list of lower case modified words
    wordList2.append(word2.lower())

print("Word list created from modified string:")
print(wordList2)

print("")

# create a word frequency dictionary
# start with an empty dictionary
freqD2 = {}
for word2 in wordList2:
    # get() returns 0 the first time a word is seen
    freqD2[word2] = freqD2.get(word2, 0) + 1

# sorted() returns a new alphabetically sorted list of the keys
# all words are lower case already
keyList = sorted(freqD2)

print("Frequency of each word in the word list (sorted):")
for key2 in keyList:
    print("%-10s %d" % (key2, freqD2[key2]))
</snippet>
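For comparison, here is a sketch of the same steps for Python 3 using the standard library's `collections.Counter` and `string.punctuation`. These replace the hand-built dictionary and the fixed list of five punctuation marks. The function name `word_frequency` is a hypothetical name chosen for illustration. `string.punctuation` covers more characters than the original list, including quotes, parentheses and hyphens. In addition, `strip()` cleans both ends of each word, so results can differ from the original on other texts.

```python
import string
from collections import Counter


def word_frequency(text):
    """Hypothetical helper: map each lower-case word in text to its count."""
    # split at whitespace, strip punctuation from both ends, lower-case
    words = (word.strip(string.punctuation).lower() for word in text.split())
    # skip tokens that were pure punctuation and became empty strings
    return Counter(word for word in words if word)


str1 = """Man who run in front of car, get tired.
Man who run behind car, get exhausted."""

freq = word_frequency(str1)
# print in alphabetical order, same layout as the original snippet
for word in sorted(freq):
    print("%-10s %d" % (word, freq[word]))
```

A `Counter` is a dictionary subclass, so `freq[word]` and `sorted(freq)` behave as they do with `freqD2` in the original. A missing word returns 0 instead of raising `KeyError`.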
manpreets7, Jun 20th, 2009
The stripping of punctuation can be handled more simply by passing all the marks to rstrip() at once:

<snippet>
chars = ",.!?;"
# replaces the if/else test inside the for loop
word2 = word1.rstrip(chars)
</snippet>

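This strips any trailing combination of these marks, such as the "?!" in "really?!". The original code removes only repeated copies of the last character, so it would turn "really?!" into "really?". Words without trailing punctuation are returned unchanged, which means the if/else test is no longer needed.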
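To see the difference, you can run both approaches on a few sample words. The helper names `strip_last_char` and `strip_all` are hypothetical and written only for this comparison. The first mirrors the if/else test in the original loop. The second uses `rstrip()` with the whole set of marks, as suggested above.

```python
PUNCT = ",.!?;"


def strip_last_char(word):
    # original approach: look only at the final character
    lastchar = word[-1:]
    if lastchar in PUNCT:
        # removes every trailing copy of that one character only
        return word.rstrip(lastchar)
    return word


def strip_all(word):
    # suggested approach: remove any trailing run of the listed marks
    return word.rstrip(PUNCT)


for w in ["car,", "tired.", "really?!", "wait...", "word"]:
    print("%-10s %-10s %s" % (w, strip_last_char(w), strip_all(w)))
```

Both versions already handle "wait..." because the dots are all the same character. Only mixed endings like "?!" show the difference.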
©2003 - 2009 DaniWeb® LLC