I want to iterate through the rows in two csv files and test values. For every row in file 1 that has the same value in Cell A as a row in file 2, I want to check to see if the file 2 value in Cell C is larger. If it is larger, then I want to print that whole row from file 2 to a new output file.

Example files
File 1
ID,Begin,End
2563,15,16
2580,27,30
2580,67,90

File 2
ID,Begin,End
2578,54,70
2580,102,104
2580,48,100

Output File
ID,Begin,End
2580,48,90


In the example, only one row in File 2 met all conditions. I’ve written a Python script to do this with two for-loops, one embedded within the other. However, while the second for-loop properly iterates through all rows in file2, the first for-loop does not iterate, it just looks at the first row and never moves on.

import string, sys, os
import csv

file_1 = csv.reader(open('file_1.csv', 'rb'))
file_2 = csv.reader(open('file_2.csv', 'rb'))

for ID1, begin1, end1 in file_1:
    for ID2, begin2, end2 in file_2:
        print 'ID1 = ', ID1
        print '  end2 = ', end2
        if ID1 == ID2:
            if end2 > end1:  print 'out'
        else: print 'nope'

This script produces the following output where ID1 is always the string "ID" (the first row of the file contains the names) and never iterates through the other rows. I assume that end1 is always the string "end", so it too never properly meets the end2 > end1 condition.
ID1 = ID
end2 = End
ID1 = ID
end2 = 70
nope
ID1 = ID
end2 = 104
nope
ID1 = ID
end2 = 100
nope

Any help would be appreciated. Thanks

Add some print statements to see what is going on and it should become obvious.

for ID1, begin1, end1 in file_1:
    print "file_1 =", ID1, begin1, end1
    for ID2, begin2, end2 in file_2:
        print "     file_2 =", ID2, begin2, end2

The problem is that after the first iteration through file_2 reader your iterator remains at the end, so for example, the next code works fine:

file_1 = csv.reader(open('file_1.csv', 'rb'))
    for (ID1, begin1, end1) in file_1:
        file_2 = csv.reader(open('file_2.csv', 'rb'))
        for (ID2, begin2, end2) in file_2:

Tudor

Comments
helpful

Thanks for the help. My final code does what I want and it looks like this:

import string, sys, os
import csv

file_1 = csv.reader(open('file_1.csv', 'rb'))
for (ID1, begin1, end1) in file_1:
    file_2 = csv.reader(open('file_2.csv', 'rb'))
    for (ID2, begin2, end2) in file_2:
        print '\nbegin1 =', begin1
        print 'begin2 =', begin2
        if (begin1 != 'Begin') and (begin2 != 'Begin'):
            if (ID1 != '_ID') and (ID2 != '_ID'):
                if int(ID1) == int(ID2):
                    print 'ID1 %d == ID2 %d' % (int(ID1),int(ID2))
                    if int(begin1) < int(begin2):
                        print '   -----begin1 %d < begin2 %d' % (int(begin1),int(begin2))
                    else: print ' begin1 is larger than begin2, not smaller'
                else: print ' ID1 (%d) is different than ID2 (%d): no compare' % (int(ID1),int(ID2))
            else: print ' at least one item is the string _ID: no compare'
        else: print ' at least one item is the string begin: no compare'

Output looks like this:

begin1 = Begin
begin2 = Begin
at least one item is the string begin: no compare

begin1 = Begin
begin2 = 54
at least one item is the string begin: no compare

begin1 = Begin
begin2 = 102
at least one item is the string begin: no compare

begin1 = Begin
begin2 = 48
at least one item is the string begin: no compare

begin1 = 15
begin2 = Begin
at least one item is the string begin: no compare

begin1 = 15
begin2 = 54
ID1 (2563) is different than ID2 (2578): no compare

begin1 = 15
begin2 = 102
ID1 (2563) is different than ID2 (2580): no compare

begin1 = 15
begin2 = 48
ID1 (2563) is different than ID2 (2580): no compare

begin1 = 27
begin2 = Begin
at least one item is the string begin: no compare

begin1 = 27
begin2 = 54
ID1 (2580) is different than ID2 (2578): no compare

begin1 = 27
begin2 = 102
ID1 2580 == ID2 2580
-----begin1 27 < begin2 102

begin1 = 27
begin2 = 48
ID1 2580 == ID2 2580
-----begin1 27 < begin2 48

begin1 = 67
begin2 = Begin
at least one item is the string begin: no compare

begin1 = 67
begin2 = 54
ID1 (2580) is different than ID2 (2578): no compare

begin1 = 67
begin2 = 102
ID1 2580 == ID2 2580
-----begin1 67 < begin2 102

begin1 = 67
begin2 = 48
ID1 2580 == ID2 2580
begin1 is larger than begin2, not smaller

You should be able to do something like the following so you don't have to re-read the file for every record in file_1.

file_1 = csv.reader(open('file_1.csv', 'rb'))
file_2_list = list(csv.reader(open('file_2.csv', 'rb')))
for (ID1, begin1, end1) in file_1:
    for (ID2, begin2, end2) in file_2_list:

Yep, both versions work. I think my original problem was my lack of parens around my column names. This seems to be the trick. Thanks!

This article has been dead for over six months. Start a new discussion instead.