I have a list of datasets that I need to shuffle randomly and then partition into n groups. I am storing the datasets in a dictionary, but I am not sure how to partition the items of a dictionary into n groups. The only approach I can think of is to divide the total number of keys by n, take 1/n of the keys at a time to build n new dictionaries, and work from there. Is this a practical method? Thanks a lot!
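
For reference, the approach described in the question could be sketched roughly as follows. This is only a minimal Python 3 sketch: the function name `split_by_slicing`, the `seed` parameter and the sample `datasets` dictionary are made up for illustration, and it uses strided slices (`keys[i::n]`) rather than contiguous 1/n chunks so that the group sizes never differ by more than one.

```python
import random

def split_by_slicing(data, n, seed=None):
    """Shuffle the keys, then deal them into n dictionaries of near-equal size."""
    rng = random.Random(seed)  # seed is only here to make runs repeatable
    keys = list(data)
    rng.shuffle(keys)
    # keys[i::n] takes every n-th key starting at offset i, so group sizes
    # differ by at most one even when len(data) is not a multiple of n.
    return [{k: data[k] for k in keys[i::n]} for i in range(n)]

datasets = {"set%d" % i: i for i in range(10)}  # hypothetical stand-in data
groups = split_by_slicing(datasets, 3, seed=0)
for g in groups:
    print(len(g), sorted(g))
```

Every key lands in exactly one group, so merging the groups gives back the original dictionary.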

All 9 Replies

It is possible, though it is simpler to build a list of the group numbers range(n) repeated key_num//n + 1 times, shuffle it, and then assign each key to a group by pairing them with zip(my_dict.keys(), index_list).

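I noticed it is better to repeat the range only key_num//n times and add a random sample of the range to fill the remaining positions; otherwise the lengths can end up rather unequal (zip drops the surplus indexes at the end of the shuffled list, and many of them may happen to be the same index):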

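import string
import random
from pprint import pprint

# Python 2 code (print statement, xrange, string.letters)
# sample dictionary: one entry per letter in string.letters
my_dict = dict((c,n) for c,n in zip(string.letters, xrange(53)))
parts = 4

print 'Length of dict', len(my_dict), 'parts', parts

# each group number repeated len//parts times, plus one extra slot for
# len % parts randomly chosen groups, so the sizes differ by at most one
index_list = (len(my_dict)//parts*range(parts) +
              random.sample(range(parts), len(my_dict) % parts))

print 'length of index_list:', len(index_list)

# shuffle the group numbers, then pair each key with one of them
random.shuffle(index_list)
mapping = zip(my_dict.keys(), index_list)
dict_list = [ {} for i in range(parts) ]
for key, index in mapping:
    dict_list[index][key] = my_dict[key]
for d in dict_list:
    print 'Length', len(d)
    pprint (d)

print 'Sum of lengths', sum(len(s) for s in dict_list)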
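
The code above is written for Python 2. For anyone on Python 3, the same index-list idea might look something like the sketch below. The function name `partition_dict` and the `seed` parameter are my own additions for illustration, not part of the original post, and 5 parts are used here so that there is a remainder to distribute.

```python
import random
import string

def partition_dict(data, parts, seed=None):
    """Randomly split a dict into `parts` dicts whose sizes differ by at most one."""
    rng = random.Random(seed)  # hypothetical seed parameter, for repeatable runs
    base, extra = divmod(len(data), parts)
    # Every group number appears `base` times; `extra` distinct groups,
    # picked at random, get one more slot each.
    index_list = base * list(range(parts)) + rng.sample(range(parts), extra)
    rng.shuffle(index_list)
    result = [{} for _ in range(parts)]
    for key, index in zip(data, index_list):
        result[index][key] = data[key]
    return result

letters = {c: i for i, c in enumerate(string.ascii_letters)}  # 52 entries
pieces = partition_dict(letters, 5, seed=1)
print([len(p) for p in pieces])
```

With 52 keys and 5 parts, two of the groups get 11 keys and the other three get 10; which groups are the larger ones depends on the random sample.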

The code works really well! I had been wondering about the uneven-split case myself. Much appreciated!
However, I don't quite understand this part:

index_list = (len(my_dict)//parts*range(parts) + random.sample(range(parts), len(my_dict) % parts))

which is critical for the random split of the keys into parts. Could you shed some light on that? Thanks!

I think I have some idea about index_list now: it divides the keys evenly among the parts (4) groups and assigns the remainder, if any, to randomly chosen groups. Is that right?

Why don't you print out each part of the expression? It might actually be cleaner to compute both the quotient and the remainder into variables before the expression, using the divmod function.
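
Following that suggestion, the expression could be taken apart along these lines. This is just a Python 3 sketch; the sizes (10 keys, 4 parts) and the variable names `base`, `extra`, `repeated` and `leftover` are assumptions picked so that there is a remainder to look at.

```python
import random

n_keys, parts = 10, 4  # hypothetical sizes, chosen so there is a remainder

# divmod returns the quotient and the remainder in one call
base, extra = divmod(n_keys, parts)
print(base, extra)     # 2 2

# every group number repeated `base` times
repeated = base * list(range(parts))
print(repeated)        # [0, 1, 2, 3, 0, 1, 2, 3]

# `extra` distinct group numbers chosen at random; these groups get one more key
leftover = random.sample(range(parts), extra)
print(leftover)

index_list = repeated + leftover
print(len(index_list)) # 10, one group number per key
```

After shuffling, index_list gives one group number per key, so no group ends up more than one key larger than another.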

I see. Using an index to assign the keys is a good idea, and the code is elegant. Thanks a lot!

If your problem is solved, it is good practice to mark the thread as solved. You can also upvote any posts you found helpful.
