Count word frequency using dictionaries

Thread Solved

axa121 (post #1, Jan 29th, 2010)

from string import *

def removePunctuation(sentence):
    sentence = lower(sentence)
    new_sentence = ""
    for char in sentence:
        if char not in punctuation:
            new_sentence = new_sentence + char

    return new_sentence

def wordFrequences(sentence):
    wordCounts = {}
    split_sentence = new_sentence.split()
    print split_sentence
    for entry in split_sentence:
        for word in entry:
            wordCounts[entry] = wordCounts.get (entry,0) + 1
    wordCounts.items()
    return wordCounts

sentence = "This is a test sentence, to test the function."
new_sentence = removePunctuation(sentence)
wordFrequences(sentence)

Hi, I am trying to write a program that counts how many times each word appears in a string.

Could someone help me with this? In particular, is there something I can use instead of .get, which seems to be counting the characters?

At the moment I get this output:
{'a': 1, 'function': 8, 'sentence': 8, 'this': 4, 'is': 2, 'to': 2, 'test': 8, 'the': 3}

I am trying to get the following:
{'this': 1, 'a': 1, 'is': 1, 'test': 2, ...}
Last edited by axa121; Jan 29th, 2010 at 2:33 pm.
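
For what it's worth, the inflated counts in the output above line up with the inner loop in the posted code: `for word in entry` walks over the characters of each word, so every occurrence adds len(word) instead of 1, and .get itself is not the culprit. A minimal Python 3 sketch that reproduces both behaviours (the function names `count_with_inner_loop` and `count_once_per_word` are made up for illustration and do not appear in the thread):

```python
def count_with_inner_loop(words):
    """Reproduce the inflated counts: the inner loop runs once per character."""
    counts = {}
    for entry in words:
        for _char in entry:  # iterates over the characters of the word
            counts[entry] = counts.get(entry, 0) + 1
    return counts


def count_once_per_word(words):
    """Increment the count once for each occurrence of a word."""
    counts = {}
    for entry in words:
        counts[entry] = counts.get(entry, 0) + 1
    return counts


words = "this is a test sentence to test the function".split()
buggy = count_with_inner_loop(words)
fixed = count_once_per_word(words)
print(buggy["test"], fixed["test"])  # 8 2
```

"test" has four letters and appears twice, which is where the 8 comes from.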

axa121 (post #2, Jan 29th, 2010)

OK, no need for help.

I took a break, and when I came back I got the solution straight away.

lrh9 (post #3, Jan 29th, 2010)

If you find a solution to one of your own problems, it is considered polite to post the solution regardless because other people might also have the same problem.

As for my solution to the problem statement:

I'd probably strip the string of any non-alphabetic characters except spaces and newlines, replace all newlines with spaces, and split the resulting string on spaces. Then I'd iterate over the resulting sequence, using each word as a dictionary key: if the key is not yet in the dictionary, add it with a count of one; otherwise, increment its count.
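
The steps described above could be sketched roughly as follows in Python 3. The function name `word_frequencies` is hypothetical, and lowercasing the text is an assumption carried over from the original poster's code rather than something stated in this reply:

```python
def word_frequencies(text):
    # Keep letters and spaces, turn newlines into spaces, drop everything else.
    kept = []
    for char in text.lower():  # lowercasing is an assumption, see above
        if char.isalpha() or char == " ":
            kept.append(char)
        elif char == "\n":
            kept.append(" ")
    cleaned = "".join(kept)

    counts = {}
    for word in cleaned.split(" "):
        if not word:  # consecutive spaces produce empty strings; skip them
            continue
        if word in counts:
            counts[word] += 1  # key already present: increment
        else:
            counts[word] = 1   # first occurrence: add with a count of one
    return counts


result = word_frequencies("This is a test sentence,\nto test the function.")
print(result)
```

Calling split() with no argument would also skip the empty strings, which makes the explicit check unnecessary.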

lrh9 (post #4, Jan 29th, 2010)

Also, building up a string by repeatedly applying the '+' operator is generally discouraged. Strings are immutable objects, so appending one string to another creates a new string object each time, which costs time and memory.

It is better to store each piece in a list until the concatenated string is needed, then join the pieces by calling a string's "join" method on the list.

string1 = 'This'
string2 = 'is'
string3 = 'worse.'
final_string = string1 + ' ' + string2 + ' ' + string3
# final_string is 'This is worse.'

mylist = ['This', 'is', 'better.']
better_string = ' '.join(mylist)
# better_string is 'This is better.'
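
Applied to the punctuation-stripping function from the first post, the same idea might look like the sketch below (Python 3; `remove_punctuation` is a renamed, hypothetical variant of the original `removePunctuation`, and it uses `string.punctuation` explicitly instead of a star import):

```python
import string


def remove_punctuation(sentence):
    # Collect the characters to keep in a list, then join once at the end
    # instead of growing a string with '+' inside the loop.
    kept = [char for char in sentence.lower() if char not in string.punctuation]
    return "".join(kept)


cleaned = remove_punctuation("This is a test sentence, to test the function.")
print(cleaned)  # this is a test sentence to test the function
```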

axa121 (post #5, Jan 30th, 2010)

from string import *


def removePunctuation(sentence):
    sentence = lower(sentence)
    new_sentence = ""
    for char in sentence:
        if char not in punctuation:
            new_sentence = new_sentence + char

    return new_sentence

def wordFrequences(sentence):
    wordFreq = {}
    split_sentence = new_sentence.split()
    for word in split_sentence:
        wordFreq[word] = wordFreq.get(word,0) + 1
    wordFreq.items()
    print wordFreq

sentence = "The first test of the function"
new_sentence = removePunctuation(sentence)
wordFrequences(sentence)

Here is the corrected version.

woooee (post #6, Jan 30th, 2010)

I think you want to pass new_sentence to the function and use it there (catching this kind of thing is one of the positive results of posting code).
def wordFrequences(new_sentence):
    wordFreq = {}

    ## new_sentence was not defined inside the function; it is now the parameter
    split_sentence = new_sentence.split()

    for word in split_sentence:
        wordFreq[word] = wordFreq.get(word,0) + 1
    wordFreq.items()
    print wordFreq

sentence = "The first test of the function"
new_sentence = removePunctuation(sentence)
wordFrequences(new_sentence)
Last edited by woooee; Jan 30th, 2010 at 12:37 pm.
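
The snippets in this thread are Python 2 (print statement, `lower` pulled in through `from string import *`). As a hedged sketch only, a Python 3 version could lean on `collections.Counter` from the standard library, which does the same job as the .get(word, 0) + 1 pattern. The names `remove_punctuation` and `word_frequencies` are hypothetical renames, not code from any poster:

```python
import string
from collections import Counter


def remove_punctuation(sentence):
    # Lowercase, then keep only the characters that are not punctuation.
    return "".join(c for c in sentence.lower() if c not in string.punctuation)


def word_frequencies(sentence):
    # Counter tallies each word; the cleaned sentence is computed from the
    # argument, so the function no longer depends on a global variable.
    return Counter(remove_punctuation(sentence).split())


freq = word_frequencies("The first test of the function")
print(dict(freq))
```

A Counter is a dict subclass, so it can be indexed and iterated like the wordFreq dictionary above, and looking up a word that never appeared gives 0 instead of raising KeyError.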