Hi all, I've been stuck on this for over a week now and I can't get my head around it, so I hope someone can help me.

I build a dictionary by reading my file line by line and counting the occurrences of tuples. That part works fine.

Then, for every key in my dictionary (the keys are the tuples taken from each line), I need to take a random sample and print the randomly selected lines.
Confused?
Hopefully not. My problem is how to make a link between the lines and my dictionary.
For example:

{(4, 22, 5): 300}
If I wanted to print 100 randomly chosen lines out of the 300 counted lines, how would I do this?

I tried to use an array to find my tuples again in my lines, but I keep getting an error.

Please, can someone help? Thanks for reading this.

dict={}
for lines in file:
    match=(line[0], line[2], line[3])
    if not dict.has_key(match):
        dict[match] =0
    
    dict[match] +=1

This is the code I use to create my dictionary from the values in each line and count how often each match occurs.
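For what it's worth, the standard library's `collections.Counter` does the same counting without the `has_key` check. A minimal sketch, assuming the lines are already in a list; the three sample lines below are made up for illustration:

```python
from collections import Counter

# Hypothetical sample data standing in for the lines read from the file
file_lines = ["4x25 first line\n", "4y25 second line\n", "1a23 third line\n"]

# Count how often each (1st, 3rd, 4th character) tuple occurs
counts = Counter((line[0], line[2], line[3]) for line in file_lines)

print(counts[("4", "2", "5")])  # → 2
```

A `Counter` is a `dict` subclass, so it can be used anywhere the hand-built dictionary was.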

Off the top of my head, the loop variable is named lines but the body uses line, so it should be

for line in file:

instead of the plural form.

Is this basically the gist of your script?

# Read all the lines; 'with' closes the file automatically
with open("file.file") as handle:
    file = handle.readlines()

counts = {}  # avoid naming it 'dict', which hides the built-in type
for line in file:
    # Here we just get the first, 3rd, and 4th character?
    match = (line[0], line[2], line[3])
    if match not in counts:
        counts[match] = 0
    counts[match] += 1
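On the comment's question: yes, `line[0]`, `line[2]` and `line[3]` are single characters, not whitespace-separated numbers. If the lines actually look like `4 22 5 ...` (an assumption, since the file format isn't shown), a key such as `(4, 22, 5)` would come from splitting the line instead:

```python
line = "4 22 5 some more data\n"  # hypothetical line format

# Indexing a string picks out individual characters
char_key = (line[0], line[2], line[3])
print(char_key)  # → ('4', '2', '2')

# Splitting on whitespace picks out fields; int() turns them into numbers
fields = line.split()
field_key = (int(fields[0]), int(fields[1]), int(fields[2]))
print(field_key)  # → (4, 22, 5)
```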

Sorry, I must have made a typing error. Yes, that is the gist of my code. Once my dictionary is complete, for every match (my tuple of the 1st, 3rd and 4th characters of the line) I need to locate the full lines for that match key and randomly select a sample of a specific size from them.

So if I had, say, 20 keys (matches), I would want to randomly select 100 lines for each key and print them.
Any ideas, please? I would be grateful. As I mentioned, I tried using an array (a numpy array) and referring back to it, but it's not working.
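One way to link the lines to the dictionary is to store the lines themselves instead of only a count: map each match tuple to the list of full lines that produced it. The count is then just `len()` of that list, and `random.sample` picks lines without repeats. A minimal sketch, assuming the same 1st/3rd/4th-character key; `group_lines` and `sample_per_key` are hypothetical helper names:

```python
import random
from collections import defaultdict


def group_lines(lines):
    """Map each (1st, 3rd, 4th character) tuple to the list of full lines."""
    groups = defaultdict(list)
    for line in lines:
        match = (line[0], line[2], line[3])
        groups[match].append(line)
    return groups


def sample_per_key(groups, sample_size):
    """Pick up to sample_size distinct lines at random for every key."""
    samples = {}
    for match, matching_lines in groups.items():
        # random.sample raises ValueError if asked for more items than exist,
        # so cap the size at the number of lines available for this key
        k = min(sample_size, len(matching_lines))
        samples[match] = random.sample(matching_lines, k)
    return samples


if __name__ == "__main__":
    # Hypothetical data: 300 lines for one key, 50 for another
    lines = ["4x25 line %d\n" % i for i in range(300)]
    lines += ["1a23 line %d\n" % i for i in range(50)]

    groups = group_lines(lines)
    samples = sample_per_key(groups, 100)
    for match, picked in samples.items():
        print(match, len(groups[match]), "lines, sampled", len(picked))
```

This keeps every line in memory, which is the simplest approach but may be a problem for very large files.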

Any ideas are much appreciated.

I'm nervous I won't be able to get past this problem.

Can you post what you have already done? I might be able to help you once I get a clearer picture of what's going on here.

Here is a snippet. I think the longer way for me would be to write my original dict out to a new file and sample it that way, but that is long and messy, so I hope I can get some help. I'm not sure an array is a good idea either.

file = []
with open('myfile', 'r') as read:
    for line in read:          # iterate over lines; readline() returns one string
        # here I carry out some other functions
        changed_line = line    # placeholder for my changed line
        file.append(changed_line)

# my counting dict
counts = {}
for line in file:
    match = (line[0], line[2], line[3])
    if match not in counts:
        counts[match] = 0
    counts[match] += 1

# here I'm trying to find the full lines for each key again (my files are large!)
groups = {}
for line in file:
    find_these_lines = (line[0], line[2], line[3])
    groups.setdefault(find_these_lines, []).append(line)

for match in counts:
    group = groups[match]  # all full lines whose key is this match
    # here I would like to apply a random sample to get 100 samples at random of the lines
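Since the files are large, a variant that avoids holding every line in memory is reservoir sampling (Algorithm R), run once per key: keep at most `k` lines per key, and replace a kept line with decreasing probability as more lines for that key arrive. This is a swapped-in technique, not something from the thread; `reservoir_sample_by_key` is a hypothetical name, the file name in the usage comment is an assumption, and like the original code it assumes every line has at least four characters.

```python
import random


def reservoir_sample_by_key(lines, k):
    """Keep a uniform random sample of up to k lines for each match key.

    Works in one pass over any iterable of lines (e.g. an open file),
    storing at most k lines per key rather than the whole file.
    """
    reservoirs = {}  # match key -> list of up to k sampled lines
    seen = {}        # match key -> number of lines seen so far for that key
    for line in lines:
        match = (line[0], line[2], line[3])
        i = seen.get(match, 0)  # 0-based index of this line within its key
        seen[match] = i + 1
        reservoir = reservoirs.setdefault(match, [])
        if i < k:
            # The first k lines for a key are always kept
            reservoir.append(line)
        else:
            # Keep this line with probability k / (i + 1), replacing a random slot
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = line
    return reservoirs, seen


# Usage, assuming a file called 'myfile':
# with open('myfile') as handle:
#     samples, counts = reservoir_sample_by_key(handle, 100)
```

The returned `seen` dictionary doubles as the occurrence count, so the counting loop and the sampling can happen in the same pass.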

Please, has anyone got any suggestions? I'm still stuck.

Suggestions about what? If you want 100 random samples, then you can select the dictionary's keys randomly.

import random

##---  create dictionary with 1000 keys
test_data_d = {}
for j in range(1001, 2001):
    test_data_d[j] = "data for key %d" % (j)

##--- the order of keys_list doesn't matter: the random index does the choosing
##    (list() is needed because keys() can't be indexed in Python 3)
keys_list = list(test_data_d.keys())

##---  10 random keys is enough
stop = len(keys_list) - 1
for j in range(0, 10):
    random_num = random.randint(0, stop)  # randint includes both ends, so start at 0
    this_key = keys_list[random_num]
    print(j + 1, test_data_d[this_key])

Note that the same random number can come up more than once, so you might want to store the numbers already used in a list and, if a number has already been used, choose another one.
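The standard library can also do that "choose another one" bookkeeping: `random.sample` returns distinct elements, so no key is picked twice. A minimal sketch that rebuilds the same kind of 1000-key test dictionary:

```python
import random

# Test dictionary with 1000 keys, 1001..2000
test_data_d = {}
for j in range(1001, 2001):
    test_data_d[j] = "data for key %d" % j

# random.sample picks 10 distinct keys, so nothing is repeated
chosen_keys = random.sample(list(test_data_d), 10)
for n, this_key in enumerate(chosen_keys, start=1):
    print(n, test_data_d[this_key])
```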
