I've been looking around for an answer to this but have had no luck. I need to take two files and print the most frequent words they have in common, along with their combined (summed) frequencies. This might be simple, but I'm pretty new to programming. Any help?

def mostFrequent(word,frequency,n):
   my_list = zip(word,frequency) #combine the two lists
   my_list.sort(key=lambda x:x[1],reverse=True) #sort by freq
   words,freqs = zip(*my_list[:n]) #take the top n entries and split back into separate lists
   return words, freqs #return our most frequent words in order

from wordFrequencies import * #gives both the word and its frequency in a file
L1 = wordFrequencies('file1.txt')
words1 = L1[0]
freqs1 = L1[1]
L2 = wordFrequencies('file2.txt')
words2 = L2[0]
freqs2 = L2[1]
print mostFrequent(words,freqs,20)

I've tried

L1 = wordFrequencies('file1.txt')
words1 = set(L1[0])
freqs1 = set(L1[1])
L2 = wordFrequencies('file2.txt')
words2 = set(L2[0])
freqs2 = set(L2[1])
words3 = words1.intersection(words2)
freqs3 = freqs1.intersection(freqs2)
print mostFrequent(words3,freqs3,20)

but it didn't work. It output the wrong words.


This may already be answered. Duplicate (cross-posted) at http://bytes.com/topic/python/answers/947596-how-can-i-find-top-words-frequencies-combined-files#post3743844 and http://forums.devshed.com/python-programming-11/return-common-words-in-two-files-941286.html


Use dictionaries

D = [dict(zip(*wordFrequencies(name))) for name in ['file1.txt', 'file2.txt']]
common_words = set(D[0]) & set(D[1])
L = [(w, D[0][w] + D[1][w]) for w in common_words]
# sort by decreasing frequencies, solve ties by increasing alphabetical order.
L.sort(key = lambda t: (-t[1], t[0]))
L = L[:20]
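
For anyone who wants to try the dictionary idea without the wordFrequencies module, here is a minimal sketch of the same steps wrapped in a function. The name most_frequent_common and the sample data are made up for illustration; the only assumption is that each input is a (words, freqs) pair of parallel lists, which is what the question's code indexes with [0] and [1].

```python
def most_frequent_common(pair1, pair2, n):
    """Return the top n (word, combined_frequency) tuples for words in both inputs.

    pair1 and pair2 are (words, freqs) pairs of parallel sequences.
    """
    d1 = dict(zip(*pair1))  # word -> frequency in the first file
    d2 = dict(zip(*pair2))  # word -> frequency in the second file
    common = set(d1) & set(d2)  # words present in both files
    totals = [(w, d1[w] + d2[w]) for w in common]
    # Highest combined frequency first; ties broken alphabetically.
    totals.sort(key=lambda t: (-t[1], t[0]))
    return totals[:n]

# Hypothetical sample data standing in for the two wordFrequencies() results.
sample1 = (['the', 'cat', 'sat'], [5, 2, 2])
sample2 = (['the', 'dog', 'sat'], [2, 3, 1])
result = most_frequent_common(sample1, sample2, 20)
```

With the sample data, only 'the' and 'sat' appear in both inputs, so result holds those two words with their summed frequencies.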

Edited by Gribouillis


Sets are unordered (iteration follows hash order) and drop duplicates, so you lose the relationship between word and frequency data.
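
A small sketch of why intersecting the word set and the frequency set separately gives wrong results. The sample lists are made up for illustration and stand in for what wordFrequencies returns.

```python
# Hypothetical sample data: parallel word and frequency lists for two files.
words1, freqs1 = ['the', 'cat', 'sat'], [5, 2, 2]
words2, freqs2 = ['the', 'dog', 'sat'], [2, 3, 1]

# Words that occur in both files.
common_words = set(words1) & set(words2)

# Frequency VALUES that occur in both files. This has nothing to do with
# which word each value belonged to.
common_freqs = set(freqs1) & set(freqs2)

# The set also collapses the repeated 2 in freqs1, so it no longer has
# one entry per word.
distinct_freqs1 = set(freqs1)
```

Here common_words has two entries but common_freqs has only one, and that one value (2) belonged to 'cat' and 'sat' in the first list and to 'the' in the second, so zipping the two sets together pairs words with unrelated numbers.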

It could be easier to use collections.Counter() on each of your text files and go from there.
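
A rough sketch of the Counter idea. It counts words directly from text instead of going through wordFrequencies; the helper names count_words and top_common, the regular expression used to split words, and the sample strings are all assumptions for illustration.

```python
import re
from collections import Counter

def count_words(text):
    """Count lowercase words in a string (simplified tokenizing: letters and apostrophes)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def top_common(counter_a, counter_b, n):
    """Return the top n (word, combined_count) tuples for words found in both counters."""
    common = set(counter_a) & set(counter_b)
    totals = [(w, counter_a[w] + counter_b[w]) for w in common]
    # Highest combined count first; ties broken alphabetically.
    totals.sort(key=lambda t: (-t[1], t[0]))
    return totals[:n]

# Hypothetical sample text standing in for the contents of the two files.
text_a = "the cat sat on the mat"
text_b = "the dog sat on the log near the cat"
top3 = top_common(count_words(text_a), count_words(text_b), 3)
```

For real files, the text would come from something like open('file1.txt').read() before being passed to count_words.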
