DaniWeb IT Discussion Community

vegaseat Aug 25th, 2005 4:13 pm
Word Frequency in a Text String (Python)
 
This program takes a text string and splits it into a list of words. A trailing punctuation mark (one of , . ! ? ;) is removed from each word, and the words are converted to lower case so that "Man" and "man" are counted as the same word and sort consistently. From this list the program builds a dictionary mapping each word to its frequency. The result is printed in alphabetical order by looping over a sorted list of the dictionary keys. Comments have been added throughout to aid in understanding the code.

<snippet>
# word frequency in a text
# tested with Python24 vegaseat 25aug2005

# Chinese wisdom ...
str1 = """Man who run in front of car, get tired.
Man who run behind car, get exhausted."""
print("Original string:")
print(str1)

print("")

# create a list of words separated at whitespace
wordList1 = str1.split()

# strip any punctuation marks and build modified word list
# start with an empty list
wordList2 = []
for word1 in wordList1:
    # last character of each word
    lastchar = word1[-1:]
    # use a list of punctuation marks
    if lastchar in [",", ".", "!", "?", ";"]:
        word2 = word1.rstrip(lastchar)
    else:
        word2 = word1
    # build a wordList of lower case modified words
    wordList2.append(word2.lower())

print("Word list created from modified string:")
print(wordList2)

print("")

# create a word frequency dictionary
# start with an empty dictionary
freqD2 = {}
for word2 in wordList2:
    # get() returns 0 the first time a word is seen
    freqD2[word2] = freqD2.get(word2, 0) + 1

# create a sorted list of keys
# all words are lower case already
keyList = sorted(freqD2)

print("Frequency of each word in the word list (sorted):")
for key2 in keyList:
    print("%-10s %d" % (key2, freqD2[key2]))
</snippet>
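For comparison, here is a minimal sketch of the same steps in Python 3 using `collections.Counter` from the standard library (added in Python 2.7, so it was not available to the Python 2.4 original). The helper name `word_frequency` is hypothetical. As an assumption, it treats every character in `string.punctuation` as punctuation and strips it from both ends of a word, which covers more cases than the original five trailing marks, so results can differ on other texts.

```python
# A sketch, not the original program: count words with collections.Counter.
# Assumption: "punctuation" means any character in string.punctuation,
# stripped from both ends of each word.
import string
from collections import Counter


def word_frequency(text):
    """Return a Counter mapping lower-case words to their frequencies."""
    words = (word.strip(string.punctuation).lower() for word in text.split())
    # skip tokens that were pure punctuation (they become empty strings)
    return Counter(word for word in words if word)


str1 = """Man who run in front of car, get tired.
Man who run behind car, get exhausted."""

freq = word_frequency(str1)
print("Frequency of each word in the word list (sorted):")
for word in sorted(freq):
    print("%-10s %d" % (word, freq[word]))
```

`Counter` does the `get(word, 0) + 1` bookkeeping itself, and iterating over `sorted(freq)` gives the same alphabetical listing as the sorted key list above.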
manpreets7 Jun 20th, 2009 4:47 am
The stripping of characters can be better handled this way:

<snippet>
chars = ",.!?;"
word2 = word1.rstrip(chars)
</snippet>

This strips any run of these characters from the end of the word, not just the last one.
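A small comparison illustrates the difference. The helper names `strip_last_char` and `strip_trailing` are hypothetical; the first mirrors the original loop body and the second uses the set-based `rstrip` suggested in the reply. A word ending in two different marks keeps one of them with the original approach.

```python
# Compare the original single-character rstrip with the reply's set-based rstrip.
PUNCT = ",.!?;"  # the same five marks the original program checks


def strip_last_char(word):
    """Original approach: strip only the word's last character (and its repeats)."""
    lastchar = word[-1:]
    # list(PUNCT) mirrors the original list check; "" is not in that list
    if lastchar in list(PUNCT):
        return word.rstrip(lastchar)
    return word


def strip_trailing(word):
    """Reply's approach: strip any run of PUNCT characters from the end."""
    return word.rstrip(PUNCT)


for w in ["tired.", "wait...", "really?!", "car,"]:
    print("%-10s %-10s %s" % (w, strip_last_char(w), strip_trailing(w)))
```

`str.rstrip` treats its argument as a set of characters rather than a suffix, which is why both versions already remove repeated marks such as `...`; only the set-based call also removes a mix like `?!`.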
