Hi All

Hope you are all OK.

I have a question about calculating a frequency distribution. I have seven files like the one I have attached, each containing pairs of items as in the example below. I want to know how many times each pair occurs in total across all of these files.

Example:
208015_at 207042_at
208015_at 213168_at
208015_at 204790_at
208015_at 204653_at
208015_at 205312_at
208015_at 1565703_at

The output would look like this:
208015_at 207042_at 5 times
208015_at 213168_at 3 times
208015_at 204790_at 2 times
208015_at 204653_at 5 times


Thanks

And your problem and code?

Hi tonijv

First I tried the code below. It works for a single file: I store the pairs in a list or set and run the counting code on it. I also have Perl code that does this and works, but I want to implement it in Python. The Perl version opens all the files by matching their file names against a regular expression, counts each pair in a hash, and at the end prints the count next to every pair, along with how many unique values are shared among the files.

counts = dict((v, 0) for v in set(s))
for element in s:
counts[element] += 1
print counts

It looks better if you push (CODE) first.

counts = dict((v, 0) for v in set(s))
for element in s:
    counts[element] += 1
print counts

This looks like quite a neat piece of code. I think I have coded something similar before; maybe I could (re)post it.

You can basically wrap this code in another for loop. Of course, if you check my 'generating lower case words' code snippet, you may see a cleaner way of counting the words (defaultdict(int) instead of your dict generator).
Also, you can do, for example:

counts[element] = 1 if element not in counts else counts[element]+1
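
For illustration, the defaultdict(int) idea mentioned above could look roughly like the sketch below. It assumes each line holds exactly one whitespace-separated pair; the function name count_pairs and the sample list are made up for this example and are not taken from the snippet referred to above.

```python
from collections import defaultdict

def count_pairs(lines):
    """Count how often each (first, second) pair occurs in an iterable of lines."""
    counts = defaultdict(int)  # a missing key starts at 0, so no pre-filling is needed
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # assumption: skip blank or malformed lines
        counts[tuple(fields)] += 1
    return counts

# Hypothetical sample data in the same format as the question
sample = ["208015_at 207042_at", "208015_at 213168_at", "208015_at 207042_at"]
for pair, n in sorted(count_pairs(sample).items()):
    print("%s %s %d times" % (pair[0], pair[1], n))
```

Since count_pairs accepts any iterable of lines, an open file object can be passed to it directly.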

Hi ToniJV, yes, I only got it from Google. I'm just wondering how I could scan multiple files at once and count the number of occurrences. Do you have any idea how to do this?

Show effort! Here is code for iterating over the files in the current directory, written in 'newbie' style:

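import os

myext = '.py'  # extension of the files to list
count = 0
# os.listdir returns the names of the entries in the given directory
for filename in os.listdir(os.curdir):
    basename, ext = os.path.splitext(filename)
    if ext == myext:
        count += 1
        print('%d : %s' % (count, filename))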
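
Putting the two pieces together (looping over files and counting) might look something like the sketch below. It is only one possible approach: it uses glob and collections.Counter rather than the os.listdir loop above, assumes every data line is one whitespace-separated pair, and the function name count_pairs_in_files and the '*.txt' pattern are made up for illustration.

```python
import glob
from collections import Counter

def count_pairs_in_files(pattern):
    """Return a Counter of pairs summed over every file matching a glob pattern."""
    totals = Counter()
    for filename in sorted(glob.glob(pattern)):
        with open(filename) as handle:
            for line in handle:
                fields = line.split()
                if len(fields) == 2:  # assumption: ignore blank or malformed lines
                    totals[tuple(fields)] += 1
    return totals
```

Calling count_pairs_in_files('*.txt') would then give the combined counts for all matching files in the current directory; looping over totals.most_common() lists the pairs from most to least frequent, and len(totals) is the number of distinct pairs seen.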