Hi all,
I have data that looks like this:
Input:

man1 car
man1 wallet
man1 shirt
man1 shirt
man2 truck
man2 house
man2 jacket
man2 jacket
man2 jacjet
man2 computer

I want to collect the items for each man, and count them while collecting.
So the output should look something like this.
Output:

man1 car(1),wallet(1),shirt(2)
man2 truck(1),house(1),jacket(3),computer(1)

I tried putting it in a dictionary, but then I lose the order of the data.
Please help me, I have been trying this for a long time...
/dnyansagar


You can use OrderedDict if you want to remember the order of entry. Is your data sorted, like it looks? Then you would end up with an OrderedDict of dicts/OrderedDicts/defaultdict(int) holding the counts of things.
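A minimal sketch of that idea, assuming the input is one "owner item" pair per line as in the question. The names lines, counts and format_counts are made up for illustration. An outer OrderedDict keeps the men in order of first appearance, and an inner OrderedDict per man keeps the items in order of first appearance together with their counts:

```python
from collections import OrderedDict

lines = """man1 car
man1 wallet
man1 shirt
man1 shirt
man2 truck
man2 house
man2 jacket
man2 jacket
man2 jacjet
man2 computer""".splitlines()

# Outer dict: owner -> inner dict; inner dict: item -> count.
# Both remember insertion order.
counts = OrderedDict()
for line in lines:
    owner, item = line.split()
    per_owner = counts.setdefault(owner, OrderedDict())
    per_owner[item] = per_owner.get(item, 0) + 1


def format_counts(counts):
    """Return one 'owner item(n),item(n),...' string per owner."""
    rows = []
    for owner, items in counts.items():
        joined = ",".join("%s(%d)" % (item, n) for item, n in items.items())
        rows.append("%s %s" % (owner, joined))
    return rows


for row in format_counts(counts):
    print(row)
```

Note that the misspelled "jacjet" in the sample data is counted as a separate item, so jacket comes out as 2 rather than 3 unless the input is cleaned first.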

You might also want to use a tuple as the key, e.g. ("man1", "car"), ("man1", "wallet"), etc., since a dictionary of dictionaries gets messy fast.
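The tuple-key variant could look roughly like this. It is only a sketch; pairs and pair_counts are illustrative names, and the data is a shortened copy of the sample input. A single flat OrderedDict is keyed by (owner, item), so there is no nesting to manage:

```python
from collections import OrderedDict

pairs = [
    ("man1", "car"),
    ("man1", "wallet"),
    ("man1", "shirt"),
    ("man1", "shirt"),
    ("man2", "truck"),
]

# One flat dict keyed by the (owner, item) tuple, in order of first appearance.
pair_counts = OrderedDict()
for pair in pairs:
    pair_counts[pair] = pair_counts.get(pair, 0) + 1

for (owner, item), n in pair_counts.items():
    print("%s %s(%d)" % (owner, item, n))
```

The trade-off is that printing one line per man then needs an extra grouping step over the keys.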

Yes, my data is ordered and I don't want to lose that order.
OrderedDict worked perfectly fine
Thank you...

If the thread is solved, it is your responsibility to close it. At the same time you can also up-vote or down-vote answers (optionally giving reputation) if you wish.

Hi
In a similar case, suppose the data looks like this:
Input:

man1 car
man1 wallet
man1 shirt
man1 shirt
man1 car
man1 car
man2 truck
man2 house
man2 jacket
man2 jacket
man2 jacjet
man2 computer

output:

man1    car(1),wallet(1),shirt(2),car(2)
man2    truck(1),house(1),jacket(3),computer(1)

That is, if car appears again later, it should be counted again as a separate entry.
For now I just make the man a key, with the other things as its values (a list),
and count the items in the list using the following function:

def getCount(iterable):
    counts = {}
    for x in iterable:
        counts[x] = counts.get(x, 0) + 1
    return counts
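The function above gives one total per item, so the second group of cars would be merged into the first. One possible way to count each consecutive run separately is itertools.groupby, which starts a new group every time the value changes. This is only a sketch; count_runs, records and rows are hypothetical names, and it assumes the lines for each man are already adjacent in the input:

```python
from itertools import groupby


def count_runs(items):
    """Return [(item, run_length), ...] for each run of equal consecutive items."""
    return [(item, sum(1 for _ in run)) for item, run in groupby(items)]


records = [line.split() for line in """man1 car
man1 wallet
man1 shirt
man1 shirt
man1 car
man1 car
man2 truck
man2 house
man2 jacket
man2 jacket
man2 jacjet
man2 computer""".splitlines()]

rows = []
# Outer groupby splits by owner; count_runs then splits each owner's items into runs.
for owner, group in groupby(records, key=lambda rec: rec[0]):
    runs = count_runs(item for _, item in group)
    rows.append("%s %s" % (owner, ",".join("%s(%d)" % run for run in runs)))

for row in rows:
    print(row)
```

With the sample input, man1 comes out as car(1),wallet(1),shirt(2),car(2). The misspelled "jacjet" forms its own run, so man2 shows jacket(2),jacjet(1) rather than jacket(3).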

As you got it working, here is my suggestion:

from operator import itemgetter
from itertools import groupby

data='''
man1 car
man1 wallet
man1 shirt
man1 shirt
man1 car
man1 car
man2 truck
man2 house
man2 jacket
man2 jacket
man2 jacjet
man2 computer'''.splitlines()

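# Split each non-empty line into an [owner, item] pair.
data = [d.split() for d in data if d]
# Group consecutive lines by owner (assumes the input is already ordered by owner).
data = [(key, [b for a,b in g]) for key, g in groupby(data, itemgetter(0))]
print(data)

# Total count of each item per owner; note that set() does not keep the input order.
counts = { k: {obj: g.count(obj) for obj in set(g) } for k, g in data}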

print(counts)
print('man2 has %i jackets' % counts['man2']['jacket'])

Thank you pyTony,
But let me explain.
I am working with protein domains.
A domain can be present at the start of the protein or at the end of the protein.
If I take a combined count, it would mean the occurrences are together, which is incorrect.
That is the reason I want to keep separate counts.
Is there any trick to do this?

What do you mean? The counts are already kept separately for each first item, each in its own count dictionary. See the last line, which looks up the 'jacket' count for 'man2' (2, as one was misspelled). I left formatting the output to you.

What information are you not getting from the dictionary of dictionaries? Maybe you want a list of dictionaries instead?
