
Word Frequency using Python

by on Jul 4th, 2006
This program uses Python's re module to split a text file into words and to remove some common punctuation marks. The word:frequency dictionary is then built using try/except. In honor of the 4th of July, the text analyzed is the National Anthem of the USA (found via Google).
Python Code Snippet
  1. # another word frequency program, uses re
  2. # tested with Python 2.4.3 HAB
  3.  
  4. import re
  5.  
  6. # in honor of the 4th of July, or pick any text file you have
  7. filename = 'NationalAnthemUSA.txt'
  8.  
  9. # create list of lower case words, \s+ matches any run of whitespace
  10. # you can replace file(filename).read() with a string of your own
  11. word_list = re.split(r'\s+', file(filename).read().lower().strip())
  12. print 'Words in text:', len(word_list)
  13.  
  14. # create dictionary of word:frequency pairs
  15. freq_dic = {}
  16. # punctuation marks to be removed
  17. punctuation = re.compile(r'[.?!,":;]')
  18. for word in word_list:
  19.     # remove punctuation marks
  20.     word = punctuation.sub("", word)
  21.     # form dictionary
  22.     try:
  23.         freq_dic[word] += 1
  24.     except KeyError:
  25.         freq_dic[word] = 1
  26.  
  27.  
  28. print 'Unique words:', len(freq_dic)
  29.  
  30. # create list of (key, val) tuple pairs
  31. freq_list = freq_dic.items()
  32. # sort by key, i.e. by word
  33. freq_list.sort()
  34. # display result
  35. for word, freq in freq_list:
  36.     print word, freq
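The listing above is Python 2 code (print statement, file()), so it will not run under Python 3 as-is. A minimal Python 3 sketch of the same steps follows; it is an illustration, not the original author's code, and the function name word_frequencies and the sample line are assumptions. It also skips tokens that end up empty after punctuation removal (for example a lone "!"), which the original would count as the word ''.

```python
import re

# punctuation marks to be removed, same character class as the original snippet
PUNCTUATION = re.compile(r'[.?!,":;]')


def word_frequencies(text):
    """Return a dict of word:frequency pairs (hypothetical helper name)."""
    freq_dic = {}
    # \s+ matches any run of whitespace; strip() avoids empty strings at the ends
    for word in re.split(r'\s+', text.lower().strip()):
        word = PUNCTUATION.sub("", word)
        if not word:
            # the token consisted only of punctuation (or the text was empty)
            continue
        try:
            freq_dic[word] += 1
        except KeyError:
            freq_dic[word] = 1
    return freq_dic


if __name__ == "__main__":
    # to read a file instead: text = open(filename, encoding="utf-8").read()
    sample = "Oh, say can you see, by the dawn's early light"
    freq_dic = word_frequencies(sample)
    print("Unique words:", len(freq_dic))
    # sort by key (the word) and display
    for word, freq in sorted(freq_dic.items()):
        print(word, freq)
```

Catching KeyError rather than using a bare except keeps unrelated errors from being silently swallowed.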
Comments on this Code Snippet
Oct 12th, 2009

Re: Word Frequency using Python

How would you take this and organize the words in descending order, not alphabetically?
kenmeck03 (Newbie Poster)
Oct 12th, 2009

Re: Word Frequency using Python

Do you mean highest frequency first?

Simply add this to the end of the code:
    print '-'*30

    print "sorted by highest frequency first:"
    # create list of (val, key) tuple pairs
    freq_list2 = [(val, key) for key, val in freq_dic.items()]
    # sort by val (the frequency), highest first
    freq_list2.sort(reverse=True)
    # display result
    for freq, word in freq_list2:
        print word, freq
Last edited by bumsfeld; Oct 12th, 2009 at 4:06 pm.
bumsfeld (Nearly a Posting Virtuoso)
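For comparison, here is a minimal Python 3 sketch of the same idea (not part of the reply above): sorted() with the key (-frequency, word) puts the highest frequency first and, unlike sort(reverse=True) on (val, key) pairs, which lists tied words in reverse alphabetical order, keeps ties alphabetical. The function name by_frequency and the sample counts are made up for illustration.

```python
def by_frequency(freq_dic):
    """Return (word, freq) pairs, highest frequency first, ties in alphabetical order."""
    # negating the count sorts it descending while the word still sorts ascending
    return sorted(freq_dic.items(), key=lambda pair: (-pair[1], pair[0]))


freq_dic = {"the": 3, "and": 2, "of": 2, "free": 1}  # hypothetical counts
print("-" * 30)
print("sorted by highest frequency first:")
for word, freq in by_frequency(freq_dic):
    print(word, freq)
```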
Oct 19th, 2009

Re: Word Frequency using Python

This code is useful.
bipratikgoswami (Newbie Poster)
Oct 19th, 2009

Re: Word Frequency using Python

Nice piece of code, useful indeed.
masterofpuppets (Posting Whiz in Training)
Nov 4th, 2009

Re: Word Frequency using Python

Lines 21 to 25 of the snippet:
    # form dictionary
    try:
        freq_dic[word] += 1
    except KeyError:
        freq_dic[word] = 1
Could be replaced by:
    freq_dic[word] = freq_dic.get(word, 0) + 1
This gets rid of the try/except and makes things a little neater.
mattp23 (Newbie Poster)
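A related option, which swaps in a different standard-library tool rather than anything suggested in the thread, is collections.Counter (Python 2.7 and later): it counts an iterable in one call, returns 0 for missing words instead of raising KeyError, and its most_common() method gives the highest-frequency-first listing. A minimal sketch, assuming a made-up word list that has already been lower-cased and stripped of punctuation:

```python
from collections import Counter

# made-up word list, assumed already lower-cased with punctuation removed
word_list = ["o", "say", "does", "that", "star", "spangled", "banner", "yet", "wave", "o"]

freq = Counter(word_list)   # word -> frequency, like freq_dic
print(freq["o"])
print(freq["rockets"])      # a word that never occurs gives 0 instead of raising KeyError
# the three most frequent words, highest count first
for word, count in freq.most_common(3):
    print(word, count)
```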
Nov 1st, 2010

Re: Word Frequency using Python

What can I add to this code to remove words listed in another file before doing the frequency listing?
nawaf_ali (Newbie Poster)
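One way to do this, sketched under assumptions in Python 3: the other file (called stopwords.txt here, a hypothetical name) holds words separated by whitespace, one or more per line. Its words are loaded into a set so each membership test is fast, and any word in that set is skipped before it is counted. The helper names load_stop_words and count_words are made up for this example, and str.split() is used, which also splits on runs of whitespace.

```python
import io
import re

# same punctuation class as the original snippet
PUNCTUATION = re.compile(r'[.?!,":;]')


def load_stop_words(lines):
    """Build a set of lower-case words from an iterable of lines (e.g. an open file)."""
    stop_words = set()
    for line in lines:
        stop_words.update(word.lower() for word in line.split())
    return stop_words


def count_words(text, stop_words):
    """Count words in text, skipping empty tokens and any word found in stop_words."""
    freq_dic = {}
    for word in text.lower().split():
        word = PUNCTUATION.sub("", word)
        if not word or word in stop_words:
            continue
        freq_dic[word] = freq_dic.get(word, 0) + 1
    return freq_dic


# with a real file: with open("stopwords.txt", encoding="utf-8") as f: stop_words = load_stop_words(f)
stop_words = load_stop_words(io.StringIO("the of\nand\n"))
print(count_words("The land of the free and the home of the brave", stop_words))
```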
Nov 24th, 2010

Re: Word Frequency using Python

This doesn't seem to remove any punctuation marks from the text file, and reads 'it' separately from 'it,'.

What might the problem be?
luisbeta04 (Newbie Poster)
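The poster's code and text file aren't shown, so the cause can't be determined from this thread. A quick way to check the cleanup step on its own is a small Python 3 function like the sketch below. It uses a different technique from the snippet's re.sub: str.strip() with string.punctuation removes any ASCII punctuation from both ends of each word, so "it," and "it" are counted as the same word while an inner apostrophe, as in "isn't", is kept. The function name clean_words is hypothetical.

```python
import string


def clean_words(text):
    """Lower-case text, split on whitespace, and strip punctuation from each word's ends."""
    words = []
    for word in text.lower().split():
        # strip() removes the given characters from both ends only,
        # so an inner apostrophe as in "isn't" is kept
        word = word.strip(string.punctuation)
        if word:
            words.append(word)
    return words


print(clean_words("It is it, isn't it? (It!)"))
```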
© 2011 DaniWeb® LLC