Hi folks - I think I'm almost there...

I have files with data in this format:

ID1 ID2 Dist
1 a 50
2 b 20
3 c 10
2 c 100
4 c 80
4 a 70
1 a 90
2 a 34
3 b 5
2 b 6
1 a 12
1 c 12
4 a 14

I need to find the minimum value for Dist based on ID2, i.e. what is the minimum Dist value for each of the sets a, b, c and so on.

I have the following, but I clearly need help...

1) I'm currently reading each line of an input file into an array, but in reality the input file is so huge that this would not be practical - I need to somehow process each line of the file without holding the entire file in temporary storage.

2) I'm trying to split each line of this array generated from readlines into separate columns, but I don't know how to do this properly.

3) I think I'm onto the ultimate solution with the line "value=...", but it's the preceding steps that are flummoxing me.

fin = open( "input.txt", "r" )
count=0
datalist=fin.readlines()

for line in datalist:
    count=count+1
    datalist[count] = line.split()
    
x=line[0]
y=line[1]
z=line[2]
value = min(z for z in datalist if y = a)
fout = open("output.txt", "w")

fout.writelines(value)

Thanks!

I just wanted to add that this is for my girlfriend, who is stressed out to the max trying to achieve this task manually in Excel. So by helping me you'll be helping me help her, and she is soooooo nice you definitely want to help her if you can!

If the data isn't too large, consider storing it in a dictionary. It could work like this:

s = '''ID1 ID2 Dist
1 a 50
2 b 20
3 c 10
2 c 100
4 c 80
4 a 70
1 a 90
2 a 34
3 b 5
2 b 6
1 a 12
1 c 12
4 a 14'''

datalist = s.split('\n')

# fileObj = open(file_name)
# if reading the file, use fileObj.readline()
headers = datalist.pop(0)

dd = {}

# if reading the file, iterate on fileObj instead of datalist
for line in datalist:
    linelist = line.split()
    dd.setdefault(linelist[1], []).append(int(linelist[2]))

keys = dd.keys()
keys.sort()
for key in keys:
    print "Minimum value for key '%s' = %s" % (key, min(dd[key]))

The output:

>>> Minimum value for key 'a' = 12
Minimum value for key 'b' = 5
Minimum value for key 'c' = 10
>>>
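For anyone on Python 3, where `dict.keys()` no longer returns a list you can sort in place and `print` is a function, the same idea could look roughly like this. It is only a sketch: `min_by_key` is a name I made up, and it assumes whitespace-separated columns with a single header line, as in the sample.

```python
from collections import defaultdict

def min_by_key(lines):
    """Return {ID2: smallest Dist} for an iterable of 'ID1 ID2 Dist' lines."""
    groups = defaultdict(list)          # ID2 -> list of every Dist seen
    rows = iter(lines)
    next(rows, None)                    # skip the header line, if any
    for line in rows:
        fields = line.split()
        if len(fields) < 3:
            continue                    # ignore blank or short lines
        groups[fields[1]].append(int(fields[2]))
    return {key: min(values) for key, values in groups.items()}

sample = """ID1 ID2 Dist
1 a 50
2 b 20
3 c 10
3 b 5
1 a 12"""

result = min_by_key(sample.split("\n"))
for key in sorted(result):
    print("Minimum value for key '%s' = %s" % (key, result[key]))
```

Like the version above, this keeps every Dist value in memory, so it only suits files that fit comfortably in RAM.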

The data is quite large, though for now I've broken it up into smaller pieces just to work with. Do you know what the maximum number of rows one might be able to work with in this way is?

THANK YOU! About to give it a try.

There is no limit that I am aware of except system memory.

If you want to process the file one line at a time, here's a loop that will do that for you.

fin = open("input.txt", "r")
myline = fin.readline()
while myline != "":
    # do whatever you want with myline, one line at a time
    myline = fin.readline()
fin.close()

Just handle each line as it comes along, and your memory footprint won't be nearly as high.

So I have made the following changes to read from an input file, but it is only processing the first line, so all I get for output is "Minimum value for key 'a' = 50".

fileObj = open('input.txt')
datalist = fileObj.readlines()
#datalist = s.split('\n')
#headers = datalist.pop(0)
 
dd = {}
 
# if reading the file, iterate on fileObj instead of datalist
for line in datalist:
    linelist = line.split()
    dd.setdefault(linelist[1], []).append(int(linelist[2]))
 
keys = dd.keys()
keys.sort()
for key in keys:
    print "Minimum value for key '%s' = %s" % (key, min(dd[key]))

I cannot use the last post to process the file one line at a time because I need to scan all the lines to see which associated value is the minimum.

So close...

You're thinking about it too hard. If all you care about is the minimum, you only need to store the minimum. This code has some issues as far as robustness goes, but this should work for the most part.

fileObj = open('input.txt')
 
dd = {}

#ignoring the header
mystring = fileObj.readline()

# The actual loop
mystring = fileObj.readline()
while mystring != "":
    try:
        codes = mystring.split()
        x = int(codes[2])
        if codes[1] not in dd or dd[codes[1]] > x:
            dd[codes[1]] = x
    except: pass
    mystring = fileObj.readline()
 
keys = dd.keys()
keys.sort()
for key in keys:
    print "Minimum value for key '%s' = %d" % (key, dd[key])

fileObj.close()
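On the robustness point: a bare `except: pass` also hides typos and real bugs. One way to tighten it up, as a rough Python 3 sketch (the name `running_min` and the choice to skip bad lines silently are my assumptions, not part of the code above), is to catch only the errors a malformed line can cause:

```python
def running_min(lines):
    """Track the smallest Dist per ID2 while holding only one line at a time."""
    smallest = {}
    for line in lines:
        fields = line.split()
        try:
            key, dist = fields[1], int(fields[2])
        except (IndexError, ValueError):
            continue    # header, blank, or malformed line: skip it
        if key not in smallest or dist < smallest[key]:
            smallest[key] = dist
    return smallest

# A small in-memory stand-in for the file, including some bad lines.
lines = ["ID1 ID2 Dist\n", "1 a 50\n", "\n", "3 b 5\n", "2 b oops\n", "1 a 12\n"]
result = running_min(lines)
print(result)
```

With a real file you would pass the open file object instead of a list, for example `with open('input.txt') as f: result = running_min(f)`.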

Thanks - that code makes very helpful sense to me :-)

Part of the problem I was having was that I had copied data into a text file for messing around with from Excel, which never had any end-of-line returns.

Thanks so much to all!!

To simplify the loop on fileObj:

for line in fileObj:

The for loop ends when the file object raises StopIteration at the end of the file, which is how iterators signal that they are exhausted.
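To see that behaviour in isolation, here is a small Python 3 sketch. `io.StringIO` is only a stand-in for a real file opened with `open()`, and the two data rows are made up for illustration.

```python
import io

# io.StringIO behaves like an open text file: it is an iterator over lines.
fake_file = io.StringIO("ID1 ID2 Dist\n1 a 50\n2 b 20\n")

header = next(fake_file)                      # consume the header line
rows = [line.split() for line in fake_file]   # the loop stops by itself at end of file

# Asking for one more line raises StopIteration, which the for loop handled silently.
try:
    next(fake_file)
    exhausted = False
except StopIteration:
    exhausted = True

print(header.strip(), rows, exhausted)
```

Calling `next()` once before the loop is also a tidy way to skip the header without a separate `readline()`.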

How do I get it to write the results to a file rather than to screen?

I tried the following (and similar variations) but nothing works.

fileObj = open('input.txt')
fout = open('output.txt')
 
dd = {}
 
mystring = fileObj.readline()

while mystring != "":
    codes = mystring.split()
    #x = int(codes[2])
    x = codes[2]
    if codes[1] not in dd or dd[codes[1]] > x:
        dd[codes[1]] = x
    mystring = fileObj.readline()
 
keys = dd.keys()
keys.sort()
for key in keys:
    #line = ("Minimum value for key '%s' = %s" % (key, dd[key]))
    fout.write("Minimum value for key '%s' = %s" % (key, dd[key]))
#print "Minimum value for key '%s' = %s" % (key, dd[key])
 
fileObj.close()

Using the code I posted:

f = open(file_name, 'w')
f.write('\n'.join(["Minimum value for key '%s' = %s" % (key, min(dd[key])) for key in keys]))
f.close()

This doesn't seem to print the actual result, only the text within the inverted commas and a final "."

Still unable to get this to print the result as well as the text - this is beginning to make me feel like an idiot!

The following prints the results twice:

s = '''ID1 ID2 Dist
1 a 50
2 b 20
3 c 10
2 c 100
4 c 80
4 a 70
1 a 90
2 a 34
3 b 5
2 b 6
1 a 12
1 c 12
4 a 14'''

datalist = s.split('\n')

headers = datalist.pop(0)
dd = {}
for line in datalist:
    linelist = line.split()
    dd.setdefault(linelist[1], []).append(int(linelist[2]))

keys = dd.keys()
keys.sort()
for key in keys:
    print "Minimum value for key '%s' = %s" % (key, min(dd[key]))

print '\n'.join(["Minimum value for key '%s' = %s" % (key, min(dd[key])) for key in keys])

Post the code you are trying to use if you are receiving an error or unexpected results.

This is what I'm using. It prints to screen correctly, but the print to output.txt does not - the text and variables are written to the file but not the actual value for x...

fileObj = open('input.txt')
f = open('output.txt', 'w')
 
dd = {}
 
mystring = fileObj.readline()

while mystring != "":
    codes = mystring.split()
    #x = int(codes[2])
    x = codes[2]
    if codes[1] not in dd or dd[codes[1]] > x:
        dd[codes[1]] = x
    mystring = fileObj.readline()
 
keys = dd.keys()
keys.sort()
for key in keys:
    print "Minimum value for key '%s' = %s" % (key, dd[key])
    
fileObj.close()

f.write('\n'.join(["Minimum value for key '%s' = %s" % (key, min(dd[key])) for key in keys]))

f.close()

I made some changes to your code. It's best to iterate on the file object. I skipped the first line in the file.

fileObj = open('input1.txt')
dd = {}
fileObj.readline()
for line in fileObj:
    codes = line.split()
    x = int(codes[2])
    if codes[1] not in dd or dd[codes[1]] > x:
        dd[codes[1]] = x
 
keys = dd.keys()
keys.sort()
for key in keys:
    print "Minimum value for key '%s' = %s" % (key, dd[key])
    
fileObj.close()

f = open('output1.txt', 'w')
f.write('\n'.join(["Minimum value for key '%s' = %s" % (key, dd[key]) for key in keys]))
f.close()

You mixed up some of what I posted with what IsharaComix posted.
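To spell out the mix-up as I understand it: in my version `dd[key]` is a list of ints, so `min(dd[key])` makes sense; in IsharaComix's version `dd[key]` is a single value, and with `int()` commented out it is a string. The snippet below is a small Python 3 illustration with made-up values, not the code from either post.

```python
# Storing every Dist as an int in a list: min() picks the smallest number.
as_list = {"b": [20, 5, 6]}
smallest_from_list = min(as_list["b"])

# Storing one value per key: if it is left as a string,
# min() walks the characters of that string instead.
as_string = {"a": "12"}
smallest_char = min(as_string["a"])

# Strings also compare character by character, not numerically.
string_compare = "5" > "20"
number_compare = 5 > 20

print(smallest_from_list, repr(smallest_char), string_compare, number_compare)
# prints: 5 '1' True False
```

So the `int()` conversion matters for the comparison as well as for the output.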

Thanks - worked like a charm. Looks like I had an extra parenthesis in the write line somewhere. I am very unfamiliar with Python so thanks for your patience - I've learned a lot.

How complicated would it be to repeat this filtering for each value in codes[0] as well? Basically I would now like the output to give me a separate minimum value for each value of codes[1], broken down by the sorting flags in codes[0]:

Minimum value for key1 '92' and key2 '1' = min1
Minimum value for key1 '92' and key2 '2' = min2

Key1 refers to the values in codes[1] and key2 to the values in codes[0].

Thank you, again!
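Probably not too complicated: one possible approach is to key the dictionary on a tuple of both columns instead of on one column alone. A rough Python 3 sketch, assuming the same three-column layout with one header line; `min_by_pair` is just an illustrative name and the sample rows are a subset of the data at the top of the thread:

```python
def min_by_pair(lines):
    """Return {(ID2, ID1): smallest Dist} for an iterable of 'ID1 ID2 Dist' lines."""
    smallest = {}
    rows = iter(lines)
    next(rows, None)                        # skip the header line
    for line in rows:
        codes = line.split()
        if len(codes) < 3:
            continue                        # ignore blank or short lines
        pair = (codes[1], codes[0])         # key1 = ID2 column, key2 = ID1 column
        dist = int(codes[2])
        if pair not in smallest or dist < smallest[pair]:
            smallest[pair] = dist
    return smallest

sample = ["ID1 ID2 Dist", "1 a 50", "2 b 20", "1 a 12", "2 b 6", "4 a 70", "4 a 14"]
result = min_by_pair(sample)
report = ["Minimum value for key1 '%s' and key2 '%s' = %s" % (k1, k2, result[(k1, k2)])
          for (k1, k2) in sorted(result)]
print("\n".join(report))
```

To get this into a file, open the output file with mode 'w' and write `'\n'.join(report)` the same way as in the earlier posts.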
