Top word frequencies of combined files

Question

otto531 0 Newbie Poster

12 Years Ago

I've been looking around for an answer to this but have had no luck. I need to take two files and print the most frequent words they have in common, along with their combined (summed) frequencies. This might be simple, but I'm pretty new to programming. Any help?

def mostFrequent(word, frequency, n):
    my_list = list(zip(word, frequency))  # pair each word with its frequency
    my_list.sort(key=lambda x: x[1], reverse=True)  # sort by frequency, highest first
    words, freqs = zip(*my_list[:n])  # take the top n entries and split back into separate lists
    return words, freqs  # the most frequent words, in order

from wordFrequencies import wordFrequencies  # returns (words, frequencies) for a file
L1 = wordFrequencies('file1.txt')
words1 = L1[0]
freqs1 = L1[1]
L2 = wordFrequencies('file2.txt')
words2 = L2[0]
freqs2 = L2[1]
print(mostFrequent(words1, freqs1, 20))

I've tried

L1 = wordFrequencies('file1.txt')
words1 = set(L1[0])
freqs1 = set(L1[1])
L2 = wordFrequencies('file2.txt')
words2 = set(L2[0])
freqs2 = set(L2[1])
words3 = words1.intersection(words2)
freqs3 = freqs1.intersection(freqs2)
print(mostFrequent(words3, freqs3, 20))

but it didn't work: it output the wrong words.


woooee 814 Nearly a Posting Maven

12 Years Ago

This may already be answered; it is a duplicate (cross-posted) at
http://bytes.com/topic/python/answers/947596-how-can-i-find-top-words-frequencies-combined-files#post3743844
and http://forums.devshed.com/python-programming-11/return-common-words-in-two-files-941286.html


Gribouillis 1,391 Programming Explorer Team Colleague · Answer 1 · 2013-03-08T13:40:14+00:00

Use dictionaries:

from wordFrequencies import wordFrequencies

# one {word: frequency} dict per file
D = [dict(zip(*wordFrequencies(name))) for name in ['file1.txt', 'file2.txt']]
common_words = set(D[0]) & set(D[1])  # words that appear in both files
L = [(w, D[0][w] + D[1][w]) for w in common_words]  # summed frequencies
# sort by decreasing frequency, breaking ties in alphabetical order
L.sort(key=lambda t: (-t[1], t[0]))
L = L[:20]
print(L)

bumsfeld 413 Nearly a Posting Virtuoso · Answer 2 · 2013-03-08T17:36:43+00:00

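Sets are unordered (they iterate in hash order), so turning the word list and the frequency list into two separate sets loses the pairing between each word and its frequency. Intersecting the frequency sets just finds numbers that happen to occur in both files, unrelated to any particular word.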
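To illustrate the point, here is a small sketch using made-up word lists and frequencies (hypothetical stand-ins for what wordFrequencies() returns). broken_common and paired_common are illustrative names, not part of the original code: the first mimics the attempted set approach, the second keeps each word tied to its frequency in a dict before intersecting.

```python
def broken_common(words1, freqs1, words2, freqs2):
    """The attempted approach: intersect words and frequencies independently.

    The two resulting sets have no connection to each other.
    """
    return set(words1) & set(words2), set(freqs1) & set(freqs2)


def paired_common(words1, freqs1, words2, freqs2):
    """Keep word/frequency pairs together in dicts, then intersect the keys."""
    d1 = dict(zip(words1, freqs1))
    d2 = dict(zip(words2, freqs2))
    # dict key views support set operations, so & gives the shared words
    return {w: d1[w] + d2[w] for w in d1.keys() & d2.keys()}
```

With words1 = ["the", "cat", "sat"], freqs1 = [3, 1, 2] and words2 = ["sat", "the", "dog"], freqs2 = [1, 5, 2], the set approach yields the frequencies {1, 2}, which belong to no particular word, while the dict approach gives {"the": 8, "sat": 3}.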

It may be easier to build a collections.Counter for each of your text files and go from there.
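A minimal Counter-based sketch of that idea, assuming a "word" is any run of lowercase letters or apostrophes (wordFrequencies() may tokenize differently, so adjust the regex to match). count_words and top_common_words are hypothetical helper names, not part of the original code. The & operator on two Counters keeps only keys present in both (with the smaller count), which is a convenient way to find the shared words before summing.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercase word in text to its frequency.

    Assumption: words are runs of a-z letters or apostrophes.
    """
    return Counter(re.findall(r"[a-z']+", text.lower()))


def top_common_words(counts1, counts2, n):
    """Top n words found in both Counters, paired with their summed frequencies."""
    shared = counts1 & counts2  # keys present in both (values are the minimum counts)
    combined = {w: counts1[w] + counts2[w] for w in shared}
    # Sort by decreasing total, breaking ties alphabetically for a stable result.
    return sorted(combined.items(), key=lambda t: (-t[1], t[0]))[:n]
```

With real files, something like top_common_words(count_words(Path('file1.txt').read_text()), count_words(Path('file2.txt').read_text()), 20), using pathlib.Path, would return the 20 most frequent shared words with their combined counts.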
