Word Frequency using Python
This program uses the Python module re to split a text file into words and to remove some common punctuation marks. The word:frequency dictionary is then formed using try/except. In honor of the 4th of July, the text analyzed is the national anthem of the USA (found via Google).
    # another word frequency program, uses re
    # tested with Python2.4.3 HAB

    import re

    # this one in honor of 4th July, or pick text file you have!!!!!!!
    filename = 'NationalAnthemUSA.txt'

    # create list of lower case words, \s+ --> match any whitespace(s)
    # you can replace file(filename).read() with given string
    word_list = re.split('\s+', file(filename).read().lower())
    print 'Words in text:', len(word_list)

    # create dictionary of word:frequency pairs
    freq_dic = {}
    # punctuation marks to be removed
    punctuation = re.compile(r'[.?!,":;]')
    for word in word_list:
        # remove punctuation marks
        word = punctuation.sub("", word)
        # form dictionary
        try:
            freq_dic[word] += 1
        except:
            freq_dic[word] = 1

    print 'Unique words:', len(freq_dic)

    # create list of (key, val) tuple pairs
    freq_list = freq_dic.items()
    # sort by key or word
    freq_list.sort()
    # display result
    for word, freq in freq_list:
        print word, freq
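For anyone on Python 3, where print is a function and file() no longer exists, the same idea might look roughly like the sketch below. It is only an illustration, not the original poster's code: the function name word_frequencies and the sample string are made up for this example, the text comes from a string instead of NationalAnthemUSA.txt, and empty tokens (which re.split can produce at the start or end of the text, or from punctuation-only words) are skipped, which the original does not do.

```python
import re

# same punctuation set as the original program
PUNCTUATION = re.compile(r'[.?!,":;]')


def word_frequencies(text):
    """Return a dict mapping each lower-case word to its count.

    word_frequencies is a hypothetical helper name, not part of the thread.
    """
    freq_dic = {}
    # \s+ matches one or more whitespace characters
    for word in re.split(r'\s+', text.lower()):
        # remove punctuation marks
        word = PUNCTUATION.sub("", word)
        if not word:
            # skip empty tokens (assumption: they should not be counted)
            continue
        # form dictionary, catching only the missing-key case
        try:
            freq_dic[word] += 1
        except KeyError:
            freq_dic[word] = 1
    return freq_dic


if __name__ == "__main__":
    # made-up sample text standing in for the contents of the text file
    sample = "The flag was still there. The flag, the flag!"
    freq_dic = word_frequencies(sample)
    print("Unique words:", len(freq_dic))
    # sort by word and display result
    for word, freq in sorted(freq_dic.items()):
        print(word, freq)
```

To analyze a file instead, the sample string could be replaced by the result of open(filename).read().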
How would you take this and organize the words that appear in descending order, not alphabetically?
Do you mean highest frequency first?
Simply add this to the end of the code:
    print '-'*30
    print "sorted by highest frequency first:"
    # create list of (val, key) tuple pairs
    freq_list2 = [(val, key) for key, val in freq_dic.items()]
    # sort by val or frequency
    freq_list2.sort(reverse=True)
    # display result
    for freq, word in freq_list2:
        print word, freq
Last edited by bumsfeld; Oct 12th, 2009 at 4:06 pm.
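One thing to be aware of with the (val, key) approach: sorting the tuples with reverse=True also reverses the order of words that share the same frequency, so ties come out in reverse alphabetical order. If ties should stay alphabetical, a sort key is one option. The sketch below is written for Python 3; by_frequency is a hypothetical helper name and the counts are made up for illustration.

```python
def by_frequency(freq_dic):
    """Return (word, freq) pairs, highest frequency first.

    Words with equal frequency are ordered alphabetically.
    """
    # negate the count so that larger counts sort first,
    # then fall back to the word itself for ties
    return sorted(freq_dic.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # hypothetical counts, not taken from the anthem text
    freq_dic = {"the": 3, "flag": 3, "was": 1, "still": 1}
    print("-" * 30)
    print("sorted by highest frequency first:")
    for word, freq in by_frequency(freq_dic):
        print(word, freq)
```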
Line 21 onwards:

    # form dictionary
    try:
        freq_dic[word] += 1
    except:
        freq_dic[word] = 1

Could be replaced by:

    freq_dic[word] = freq_dic.get(word,0) + 1

Gets rid of the try except and just makes things a little neater.
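A quick way to convince yourself that the two forms count the same way is to run both over the same word list and compare the results. This is just a small Python 3 sketch; the function names count_with_try and count_with_get and the word list are invented for the comparison.

```python
def count_with_try(words):
    """Count words using try/except, as in the original program."""
    freq_dic = {}
    for word in words:
        try:
            freq_dic[word] += 1
        except KeyError:
            freq_dic[word] = 1
    return freq_dic


def count_with_get(words):
    """Count words using dict.get with a default of 0."""
    freq_dic = {}
    for word in words:
        # get() returns 0 when the word has not been seen yet
        freq_dic[word] = freq_dic.get(word, 0) + 1
    return freq_dic


if __name__ == "__main__":
    # hypothetical input, already split and stripped of punctuation
    words = ["the", "flag", "the", "flag", "was", "the"]
    print(count_with_get(words) == count_with_try(words))
```

The sketch catches KeyError rather than using a bare except, so that an unrelated error inside the loop would not be silently treated as a new word.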
useful indeed