Word Frequency in a Text String (Python)

by on Aug 25th, 2005
This program takes a text string and splits it into a list of words. Trailing punctuation marks are removed and the words are converted to lower case so that they sort consistently. A dictionary of the words and their frequencies is then built, and the result is displayed by walking a sorted list of the dictionary keys. A fair number of comments have been added to aid in understanding the code.
Python Code Snippet:

# word frequency in a text
# tested with Python24 vegaseat 25aug2005

# Chinese wisdom ...
str1 = """Man who run in front of car, get tired.
Man who run behind car, get exhausted."""
print "Original string:"
print str1

print

# create a list of words separated at whitespaces
wordList1 = str1.split(None)

# strip any punctuation marks and build modified word list
# start with an empty list
wordList2 = []
for word1 in wordList1:
    # last character of each word
    lastchar = word1[-1:]
    # use a list of punctuation marks
    if lastchar in [",", ".", "!", "?", ";"]:
        word2 = word1.rstrip(lastchar)
    else:
        word2 = word1
    # build a wordList of lower case modified words
    wordList2.append(word2.lower())

print "Word list created from modified string:"
print wordList2

print

# create a wordfrequency dictionary
# start with an empty dictionary
freqD2 = {}
for word2 in wordList2:
    freqD2[word2] = freqD2.get(word2, 0) + 1

# create a list of keys and sort the list
# all words are lower case already
keyList = freqD2.keys()
keyList.sort()

print "Frequency of each word in the word list (sorted):"
for key2 in keyList:
    print "%-10s %d" % (key2, freqD2[key2])

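
For readers on a current interpreter, the same steps (split, strip trailing punctuation, lower-case, count, print sorted) might look roughly like the sketch below. It is written for Python 3 rather than the Python 2.4 the snippet was tested with, it swaps the hand-built dictionary for collections.Counter, and the function name word_frequencies is a hypothetical name chosen for illustration, not part of the original snippet.

```python
# Python 3 sketch of the snippet's steps; not the original author's code.
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lower-cased word to its count.

    word_frequencies is a hypothetical helper name used for this example.
    """
    words = []
    for raw in text.split():
        # strip the same five trailing punctuation marks, then lower-case
        word = raw.rstrip(",.!?;").lower()
        if word:  # skip tokens that consisted of punctuation only
            words.append(word)
    return Counter(words)


text = """Man who run in front of car, get tired.
Man who run behind car, get exhausted."""

freq = word_frequencies(text)

# sorted() returns a new sorted list of the keys
for word in sorted(freq):
    print("%-10s %d" % (word, freq[word]))
```

Counter is a dictionary subclass, so the lookup freq[word] works the same way as freqD2[key2] in the snippet above.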
Comments on this Code Snippet
Jun 20th, 2009

Re: Word Frequency in a Text String (Python)

The stripping of characters can be better handled this way:

<snippet>
chars = ",.!?;"
word2 = word.rstrip(chars)
</snippet>

This strips more than one trailing character from the end of the word if needed.
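
A small comparison may make the difference concrete. The sketch below is written for Python 3, and strip_last_char_only is a hypothetical helper that mirrors the last-character check in the original snippet. The two approaches only differ when a word ends in a mix of punctuation marks.

```python
chars = ",.!?;"


def strip_last_char_only(word):
    """Mirror the original snippet: look only at the last character.

    strip_last_char_only is a hypothetical name used for this comparison.
    """
    lastchar = word[-1:]
    if lastchar in [",", ".", "!", "?", ";"]:
        # removes trailing repeats of that one character only
        return word.rstrip(lastchar)
    return word


word = "what?!"
print(strip_last_char_only(word))  # what?
print(word.rstrip(chars))          # what
```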
manpreets7, Newbie Poster (1 post since Jun 2009)
Dec 13th, 2009

Re: Word Frequency in a Text String (Python)

For building the wordlist, you could just use this:

import string  # needed for string.punctuation

# str1 is the text string from the snippet above
wordlist = str1.split(None)
wordlist2 = []
for word in wordlist:
    wordlist2.append(word.strip(string.punctuation).lower())
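
To illustrate what this variant changes, here is a short Python 3 sketch with made-up sample words. strip() with string.punctuation removes punctuation from both ends of a word, not just the end, and leaves punctuation inside the word alone.

```python
import string

# made-up sample words, chosen to show leading, trailing and inner punctuation
samples = ['"Hello,"', "(car)", "don't", "tired."]

cleaned = [s.strip(string.punctuation).lower() for s in samples]
print(cleaned)  # ['hello', 'car', "don't", 'tired']
```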
shiftlock, Newbie Poster (2 posts since Dec 2009)
Jan 10th, 2010

Re: Word Frequency in a Text String (Python)

Can you give a Unix script for this program?
ancs, Newbie Poster (1 post since Jan 2010)
Jan 11th, 2010

Re: Word Frequency in a Text String (Python)

Do you mean a shell script?
shiftlock, Newbie Poster (2 posts since Dec 2009)