Hello, I have a Python script that deletes a line of a text file when the number in the second column is greater than or equal to 5.

My input file has three columns:

1    3    aaa
2    3    aaa
3    6    aaaaaa
4    2    aaa

def filter(in_file, output):
    F_OUT = open(output, 'w')
    pos = []
    rd = []
    gc = []
    for line in in_file:
        number = line.split()
        pos.append(number[0])
        rd.append(number[1])
        gc.append(number[2])

        for i in number[1]:
            if int(i) >= 5:
                del number[0]
                del number[0]
                del number[0]

            F_OUT.write(str(number[0]).ljust(10) + str(number[1].ljust(10)) + str(number[2]))
            F_OUT.write('\n')

filter(in_file, results)

I want my output file to contain all the same data and setup, except for the deleted line like below.

1    3    aaa
2    3    aaa
4    2    aaa

But I get an error message:

Traceback (most recent call last):
    F_OUT.write(str(number[0]).ljust(10) + str(number[1].ljust(10)) + str(number[2]))
IndexError: list index out of range

Does anyone know how I can rectify the problem?
Thanks.

Some print statements should help:

for i in number[1]:
    print "test i =", i
    if int(i) >= 5:
        print "     deleting", number[0]
        del number[0]
        print "     deleting", number[0]
        del number[0]
        print "     deleting", number[0]
        del number[0]
    print "     after, number =", number, "len(number) =", len(number)

And __definitely__ test this with numbers greater than 10.
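
To see why numbers of 10 or more matter: number[1] is a string, so the for loop visits it one character at a time. A small sketch of the effect, written for Python 3 and using "12" as a made-up second-column value:

```python
# number[1] is a string such as "12"; iterating over it yields single characters.
field = "12"  # hypothetical second-column value, not from the original file

digits = [int(ch) for ch in field]   # what "for i in number[1]" actually tests
print(digits)                        # [1, 2]

# Testing digit by digit misses the value: 12 >= 5, but neither 1 nor 2 is.
digit_test = any(d >= 5 for d in digits)
whole_test = int(field) >= 5         # convert the whole field once instead
print(digit_test)                    # False
print(whole_test)                    # True
```

So a line with 12 in the second column would slip through the digit-by-digit test, and the write inside the loop would run once per digit.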

You never want to del part of a thing in the midst of a loop over the items in the thing: Think about what must be going on when you call del.
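
A small illustration of what can go wrong, using a plain list of made-up values rather than the poster's data: removing an item shifts the remaining ones left, so the loop silently skips the next element.

```python
values = [1, 6, 7, 2]          # made-up sample data
for v in values:
    if v >= 5:
        values.remove(v)       # shrinks the list while the loop is walking it
print(values)                  # [1, 7, 2]  (7 was never tested)

# Building a new list leaves the one being looped over alone.
kept = [v for v in [1, 6, 7, 2] if v < 5]
print(kept)                    # [1, 2]
```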

Instead, one of two options:

  1. write each line if it passes the test, as you see it
  2. save each successful line into a list of lines, then iterate over those lines, writing each.

I prefer option 1: it is more efficient.
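
A minimal sketch of option 1, assuming Python 3 and the three-column layout from the question. The name filter_rows, the limit parameter, and the choice to skip lines with fewer than three fields are all assumptions made for the example, not part of the original script.

```python
import io

def filter_rows(lines, out, limit=5):
    """Write every line whose second column is below limit to out."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue                     # assumption: skip blank or malformed lines
        if int(fields[1]) < limit:       # compare the whole number, not its digits
            out.write(fields[0].ljust(10) + fields[1].ljust(10) + fields[2] + "\n")

# Demo on in-memory data so nothing on disk is touched.
sample = io.StringIO("1    3    aaa\n2    3    aaa\n3    6    aaaaaa\n4    2    aaa\n")
result = io.StringIO()
filter_rows(sample, result)
kept_ids = [row.split()[0] for row in result.getvalue().splitlines()]
print(kept_ids)                          # ['1', '2', '4']
```

With real files, the same function could be called with two open file objects in place of the StringIO stand-ins. Option 2 would differ only in appending each passing line to a list and writing the list in a second loop.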


The only thing that made me lean toward Gris's option 2 is that the purpose is to modify an existing file. The natural way to do that is to read the whole file into a list of lines and close it, then reopen the original for writing and filter the lines, writing each one that passes straight back to the original file. This is a slightly risky undertaking, so it is best to test it well with toy data. If the input were really huge we would be in difficulty, but with modern computers it is often acceptable to keep the input file's contents in memory temporarily. If you want to avoid that, you must resort to a temporary file: write the filtered lines to it, delete the original, and rename (move) the temporary file.
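
A rough sketch of the temporary-file variant, assuming Python 3.3 or later (for os.replace). The function name filter_in_place and the limit parameter are made up for the example, and it assumes lines with fewer than two fields can simply be dropped.

```python
import os
import tempfile

def filter_in_place(path, limit=5):
    """Rewrite path, keeping only lines whose second column is below limit."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as tmp, open(path) as src:
            for line in src:
                fields = line.split()
                # assumption: lines with fewer than two fields are dropped
                if len(fields) >= 2 and int(fields[1]) < limit:
                    tmp.write(line)
        os.replace(tmp_path, path)       # swap the filtered copy into place
    except Exception:
        os.remove(tmp_path)              # leave the original untouched on failure
        raise

# Demo on a throwaway file in a fresh temporary directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "data.txt")
with open(demo_path, "w") as f:
    f.write("1    3    aaa\n2    3    aaa\n3    6    aaaaaa\n4    2    aaa\n")
filter_in_place(demo_path)
with open(demo_path) as f:
    remaining = f.read().splitlines()
print(remaining)
```

Because the original is only replaced after the filtered copy has been written and closed, a crash partway through should leave the input file as it was.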


I immediately keyed on the 'risky' in your post. In fact, it is much better to

  1. rename the old file to a backup-name,
  2. then be sure you can open the old filename for writing,
  3. then be sure you can open the backup name for reading
  4. then do the programming work
    • maybe read the backup file all at once, then write as needed (my option 2)
    • maybe read the backup file line by line and write each as needed (my option 1)
  5. then close the open files
  6. then double check that the new file (old name) is valid
  7. then finally unlink the backup file
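
The steps above can be sketched roughly as follows, assuming Python 3.3 or later and the line-by-line variant. The function name filter_with_backup, the ".bak" suffix, and the particular validity check are assumptions for illustration; a real script would pick its own backup name and its own check.

```python
import os
import tempfile

def filter_with_backup(path, limit=5):
    """Filter path via a backup copy; restore the backup if anything fails."""
    backup = path + ".bak"               # hypothetical backup name
    os.rename(path, backup)              # step 1: move the original aside
    try:
        # steps 2-4: old name opened for writing, backup opened for reading
        with open(path, "w") as out, open(backup) as src:
            for line in src:
                fields = line.split()
                if len(fields) >= 2 and int(fields[1]) < limit:
                    out.write(line)
        # step 5: both files are closed when the with block ends
        # step 6: re-read the new file and check every remaining line
        with open(path) as check:
            if not all(int(row.split()[1]) < limit for row in check):
                raise ValueError("filtered file failed validation")
    except Exception:
        os.replace(backup, path)         # put the original back
        raise
    os.remove(backup)                    # step 7: unlink the backup

# Demo on a throwaway file in a fresh temporary directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "data.txt")
with open(demo_path, "w") as f:
    f.write("1    3    aaa\n2    3    aaa\n3    6    aaaaaa\n4    2    aaa\n")
filter_with_backup(demo_path)
with open(demo_path) as f:
    remaining = f.read().splitlines()
print(remaining)
```

Note that os.rename raises an error on Windows if the backup name already exists, so a production version would need to decide how to handle a leftover backup.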

filter is a built-in function, so it is better to give your own function a different name.

Ah yes, I am aware of that. In my script I actually defined it as something else.

So I've actually ended up doing it a different way. To avoid the confusion of deleting something while in a loop, I just output the data if it meets the condition.

# Compare the whole second column once; looping over number[1] would test it digit by digit.
if int(number[1]) < 5:
    F_OUT.write(str(number[0]).ljust(10) + str(number[1]).ljust(10) + str(number[2]))
    F_OUT.write('\n')

Thanks for all your comments!
