Hi all,

I'm still stuck. In my previous thread I wrote that I wanted to carry out a random sample function on an entire file.

Now what I wish to do is read a file, use a dictionary to search lines, and then get these lines and write them to a new output file. Is that possible?

Thanks for all the help so far.


Now what I wish to do is read a file, use a dictionary to search lines, and then get these lines and write them to a new output file. Is that possible?

That is how I interpret your question, so let me know if I read it wrong.

To read from or write to a file, use the built-in open() function; see the Python documentation for open().
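A minimal sketch of that read-then-write pattern, in Python 3 syntax. The file names, the sample rows, and the `wanted` filter are placeholders invented for illustration, not anything from the original question.

```python
import os
import tempfile

def copy_matching_lines(in_path, out_path, wanted):
    """Copy every line for which wanted(line) is true; return how many were copied."""
    copied = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if wanted(line):
                dst.write(line)
                copied += 1
    return copied

# Demo on a throwaway file so the sketch runs anywhere.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "input.txt")
out_path = os.path.join(workdir, "output.txt")
with open(in_path, "w") as f:
    f.write("4 27 22 5\n3 27 22 5\n4 28 22 23\n")

# Hypothetical filter: keep lines whose first field is "4".
n_copied = copy_matching_lines(in_path, out_path,
                               lambda line: line.split()[0] == "4")
with open(out_path) as f:
    result_lines = f.read().splitlines()
```

Opening both files in one `with` statement means they are closed even if the filter raises part-way through.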

I'm not sure what you mean by use a dictionary to search lines.


OK, so basically I did this.

I opened my file, etc.

match = {}
for values in lines:
    truple = (values[0], values[2], values[3])
    if truple not in match:
        match[truple] = 0

    match[truple] += 1

So now I want to use these tuples, so I copy the tuples to a blank dict.

Then I would like to go through the file and, for my first key, print all the lines in the file that have that tuple, then repeat for the next key, and so on. Thanks.



Store the record number in the dictionary (this code was not tested):

match = {}
for rec_num, values in enumerate(lines):
    truple = (values[0], values[2], values[3])
    if truple not in match:
        match[truple] = []
    # remember which record this key was found on
    match[truple].append(rec_num)

for key in match.keys():
    if len(match[key]) > 1:
        print "more than one record found for", key
        for rec_num in match[key]:
            print "     ", lines[rec_num]

Thank you so much, I'm going to try this out.
I did something completely different: I tried to make a counter and another dictionary and goodness knows what, as I am trying to avoid making files or printing lines. As I said, I'm new to programming, no experience :( all self-taught.

Thank you again so much. I will try this out and write again.


Hey, here is what I tried, but all I get is a bunch of 0s where I would like line numbers.


for rec_num, v in enumerate(a):
    for values in a:
        #print values
        triple=(values[0], values[2], values[3])
        if not match.has_key(triple):
            match[triple] =0


where my rec_num is meant to be counting my lines; however, all I seem to get is some 0s.

my output is as follows:

{(4, 22, 51): 0, (4, 22, 33): 0, (4, 22, 4): 0, (4, 22, 107): 0, (4, 22, 32): 0, (4, 22, 25): 0, (4, 22, 7): 0, (4, 22, 53): 0, (4, 22, 106): 0, (4, 22, 35): 0, (4, 22, 24): 0, (4, 22, 81): 0, (4, 22, 52): 0, (4, 22, 34): 0, (4, 22, 27): 0, (4, 22, 80): 0, (4, 22, 9): 0, (4, 22, 37): 0, (4, 22, 26): 0, (4, 22, 8): 0, (4, 22, 54): 0, (4, 22, 36): 0, (4, 22, 29): 0, (4, 22, 82): 0, (4, 22, 11): 0, (4, 22, 39): 0, (4, 22, 28): 0, (4, 22, 10): 0, (4, 22, 67): 0, (4, 22, 38): 0, (4, 22, 13): 0, (4, 22, 41): 0, (4, 22, 30): 0, (4, 22, 12): 0, (4, 22, 69): 0, (4, 22, 40): 0, (4, 22, 15): 0, (4, 22, 43): 0, (4, 22, 96): 0, (4, 22, 14): 0, (4, 22, 42): 0, (4, 22, 99): 0, (4, 22, 45): 0, (4, 22, 62): 0, (4, 22, 44): 0, (4, 22, 19): 0, (4, 22, 72): 0, (4, 22, 1): 0, (4, 22, 47): 0, (4, 22, 100): 0, (4, 22, 18): 0, (4, 22, 46): 0, (4, 22, 103): 0, (4, 22, 92): 0, (4, 22, 21): 0, (4, 22, 74): 0, (4, 22, 3): 0, (4, 22, 49): 0, (4, 22, 102): 0, (4, 22, 95): 0, (4, 22, 2): 0, (4, 22, 105): 0, (4, 22, 23): 0, (4, 22, 5): 0}



You can try this:

from collections import defaultdict

match = defaultdict(int)
for values in a:
    # key on fields 0, 2 and 3, the same fields used earlier in the thread
    match[(values[0], values[2], values[3])] += 1


Thanks Gribouillis, but what I want to achieve is to find out which lines have these tuples in them, so I can carry out a random function on them.
Any suggestions, please?


Try some print statements:


for rec_num, v in enumerate(a):
    print "\n Next rec: rec_num is now", rec_num
    for values in a:
        print "     values =", values[0], values[2], values[3]
        triple=(values[0], values[2], values[3])
        if not match.has_key(triple):
            match[triple] =0
        print "     triple is now", triple
    print "     second dictionary add", triple, rec_num

Since the dictionary contains zero for every key, it appears that this line is not working correctly. Are you getting error messages? Also, read up on dictionaries and decide whether you are going to use "if not triple in match" (which you define as a dictionary of one integer) or "setdefault" (which you define as a dictionary of lists). When you get a clean dictionary, you can use the rec_num associated with each key to access that record, although you might want to allow for duplicate keys, i.e. use a dictionary of lists instead of a dictionary of one integer. You appear to have just copied code without understanding what is happening. It is almost impossible to get something to work if you don't understand it, so add print statements and anything else that helps clear up the picture.
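To make the two dictionary shapes concrete, here is a small sketch in Python 3 syntax. The three sample rows are made up (loosely modelled on the output posted later in the thread), and the names `counts` and `positions` are illustrative only.

```python
# Made-up sample rows; fields 0, 2 and 3 form the key.
lines = [
    [4, 27, 22, 5],
    [4, 28, 22, 23],
    [4, 2739, 22, 5],
]

counts = {}      # key -> how many lines carry that key
positions = {}   # key -> list of line numbers carrying that key

for rec_num, values in enumerate(lines):
    triple = (values[0], values[2], values[3])

    # Shape 1: a dictionary of one integer per key.
    if triple not in counts:
        counts[triple] = 0
    counts[triple] += 1

    # Shape 2: a dictionary of lists; setdefault creates the
    # empty list the first time a key is seen.
    positions.setdefault(triple, []).append(rec_num)

print(counts)
print(positions)
```

The count dictionary can only say how many lines matched; the dictionary of lists also says which ones, which is what is needed to go back and fetch the records.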


Thanks woooeee, I'm going to try all this out, thanks for the advice. I want to have one key with numerous values, which are my record numbers. Ugh, I'm just getting myself in a muddle :( Thank you, I will post back soon.


I think it would be a good idea to make a post with
1) a part of your input file, say 20 lines
2) the precise output that you're expecting from your program or some of the functions that you're trying to write.


Thanks woooeee, I'm going to try all this out, thanks for the advice. I want to have one key with numerous values, which are my record numbers. Ugh, I'm just getting myself in a muddle :( Thank you, I will post back soon.

What are you going to try? Take Gribouillis' advice and use just a part of the file for testing. First, read the records one at a time and print each one so you know that part is correct, as you are only using 20 records or so.

Then, create the tuple and add it to a dictionary and print the dictionary after each add so you know that part is correct. Once that is done, post your code, and state what you want to do next for some more advice. You also have to define "random" in more concrete terms. What if you want the second record that contains the tuple (a, b, c) and there was only one record with that tuple? Or do you want to choose any random record? Why must it have a specific tuple and how do you decide which tuple to use?


No, OK, what I'm doing is:
1) reading my file and finding my tuple in every record (line)
2) building a dict which tells me how many lines have each tuple, e.g. tuple (4,22,51) = 1515, which means I have 1515 lines in my file that have the above tuple in them
3) then what I'm trying to do is locate my records with the above tuple and select x number of lines at random to print, and I want this to occur for every tuple (key in my dict)

I haven't got my code with me today, but what I have done is create a random function that selects 15 (k) from my population (dict[keys]).
I get a random selection of numbers, and I'm now trying to make a link between the values selected and the lines they came from. So for the first key in my dict,
if I get random sample values of 15, 69, 1515, etc., then it can print the records; so if count 15 in my dict was line number 2000, then line 2000 is printed.
I used an enumerate value, but I can't get that to work.
What I was trying to do with setdefault is add enumerate values, which would equal my line numbers, and then perhaps carry out a random.sample function on these values and use the result to print my records (lines).
But that doesn't work.
What I thought I could do is have my line tuples as keys and then, using the enumerate function, have multiple values per key. I'm really new to Python and programming, sorry for all the confusion I may be causing all of you.
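The three steps above could be sketched roughly as follows, in Python 3 syntax. This is only one possible reading of the description: the function name `sample_lines_per_key`, the `seed` parameter, and the generated test rows are all invented for illustration.

```python
import random
from collections import defaultdict

def sample_lines_per_key(lines, k, seed=None):
    """Return {key: up to k randomly chosen line numbers having that key}."""
    rng = random.Random(seed)

    # Steps 1 and 2: group line numbers by their (field 0, 2, 3) key.
    by_key = defaultdict(list)
    for rec_num, values in enumerate(lines):
        by_key[(values[0], values[2], values[3])].append(rec_num)

    # Step 3: pick at most k distinct line numbers for every key.
    picked = {}
    for key, rec_nums in by_key.items():
        picked[key] = rng.sample(rec_nums, min(k, len(rec_nums)))
    return picked

# Made-up data: 30 rows spread over three keys.
lines = [[4, n, 22, n % 3] for n in range(30)]
picked = sample_lines_per_key(lines, k=3, seed=0)

for key, rec_nums in picked.items():
    for rec_num in rec_nums:
        print(key, rec_num, lines[rec_num])
```

Sampling the stored line numbers, rather than sampling from a count, is what gives the link back to the actual records: each picked number is an index into `lines`.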


tuple (4,22,51) = 1515, which means I have 1515 lines in my file that have the above tuple in them

How do you know this is accurate? Have you counted them, or tested on a smaller sample and counted those to make sure it is correct? It is impossible to test with huge amounts of data. You have to create a small sub-set and work with that. The following code stores record numbers for each key tuple using a dictionary of lists, and then prints all of the associated records. You can, of course, pick and choose which ones you want to print using random numbers, etc.

tuple_test = (("a", "b", "c"), ("d", "e", "f"), ("g", "h", "i"))

##---  create some test data
#      (assumed here: each test tuple repeated three times)
records_test = []
for ctr in range(3):
   records_test.extend(tuple_test)

##--- a dictionary with the tuple as key, pointing to a
#     list of record numbers
record_num_dic = {}
for rec_num, tuple_rec in enumerate(records_test):
   print "processing", tuple_rec, "rec number =", rec_num
   if tuple_rec not in record_num_dic:
      ## add a new key pointing to an empty list
      record_num_dic[tuple_rec] = []
   ## store this record number under its key
   record_num_dic[tuple_rec].append(rec_num)

for key in record_num_dic:
   print key, record_num_dic[key]
   rec_num_list = record_num_dic[key]
   for rec_num in rec_num_list:
      print "      ", rec_num, records_test[rec_num]

Thanks for all the advice. I had no net over the weekend :(
Yes, I am working with a small sample set, which will print 3 for every tuple. If you like, I can post a sample onto the site?
PS: how do I donate on this site, and what is the minimum?


Hey guys, so here's what I've got so far, thanks to your help:

record_num_dic = {}
for rec_num, tuple_rec in enumerate(a):
    tuple=(tuple_rec[0], tuple_rec[2], tuple_rec[3])  # the parts of each line that are important to me
    if not record_num_dic.has_key(tuple):
        record_num_dic[tuple] = []

for tuple in record_num_dic:
    print tuple, record_num_dic[tuple]

# now i am trying to select some lines at random
for tuple in record_num_dic:
# because this is my temp data I have approx. 3 min and 6 max rec_num per tuple; that's why I'm using 2
    for tuple in sample:
        print tuple, sample[tuple]
        for i in sample[tuple]:
            if i== rec_num: # this is where i become stuck
                print tuple_rec

I'm not sure now how I can relocate the lines. I only get one line printed. I know it's something to do with the for i in sample ... loop. What am I doing wrong?
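One likely reason only a single line prints: once the first loop has finished, `rec_num` and `tuple_rec` still hold only the values from its last pass, so `i == rec_num` can match at most one record. A sketch of the alternative, in Python 3 syntax, is to use each sampled number directly as an index into the list of records. The rows and the pre-built `record_num_dic` below are made-up stand-ins for the real data.

```python
import random

# Made-up records; 'a' plays the role of the list of parsed lines.
a = [
    [4, 28, 22, 23],
    [4, 27, 22, 5],
    [4, 27, 22, 23],
    [4, 2739, 22, 5],
]
# Assume this was already built: key -> list of record numbers.
record_num_dic = {(4, 22, 23): [0, 2], (4, 22, 5): [1, 3]}

# Pick 2 record numbers at random for every key.
sample = {}
for key, rec_list in record_num_dic.items():
    sample[key] = random.sample(rec_list, 2)

# Each sampled number is already a position in 'a', so no
# comparison against rec_num is needed: just index with it.
chosen = []
for key, rec_list in sample.items():
    for rec_num in rec_list:
        chosen.append((key, rec_num, a[rec_num]))
        print(key, rec_num, a[rec_num])
```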


Also, I was using xrange() in my random.sample call, but I kept getting an error (something like an integer function getting a list), so I'm using range. Will this affect my output with larger data sizes?


OK, I have made progress:

        for rec_num in rec_list:
            print rec_num, a[rec_num]

gives me a printout like this (snippet):

(4, 22, 23) [160, 46]
160 [4, 28, 22, 23, 0.65145514000000004, 0.93927063, 1]
46 [4, 27, 22, 23, 0.75526892000000001, 1.0533032499999999, 1]
(4, 22, 5) [474, 8]
474 [4, 2739, 22, 5, 0.84721840000000004, 1.16921662, 2]
8 [4, 27, 22, 5, 0.70833743000000005, 1.00029039, 1]

So the 8 is my record number and the rest is my line. Thank you wooee so much. Does this look OK to you? Also, does anyone know whether using range is OK on large sets of data? I don't want repeats in my random sample function.


If you posted the function it would probably make more sense. Also, random, I believe, is in the eye of the beholder. If you want no repeats ever, you should probably keep your own log of recorded outputs; if you want it to be random, as in you may get the same results twice, random should be doing just fine.
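On the repeats question specifically: within a single call, `random.sample` chooses without replacement, so it never returns the same position of the population twice. Separate calls are independent of each other, though, so two calls may overlap. A quick check in Python 3 syntax (where `range` is lazy, much like Python 2's `xrange`, so a large population costs little memory); the population size here is arbitrary:

```python
import random

# A large population of record numbers; nothing is materialised in memory.
population = range(1000000)

# 15 distinct record numbers from one call.
picks = random.sample(population, 15)
print(picks)

# If two batches must not overlap, one option is to draw them
# together in a single call and split the result afterwards.
both = random.sample(population, 30)
first_batch, second_batch = both[:15], both[15:]
```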
