Hi all! I was hoping I could get some help. I have an input file of data in 5 columns,

e.g.

A	1	10	B	0.7
B	1	203	A	0.98
C	2	805	C	0.99
...

So what I'm trying to do is this: wherever the item in the first column (A, B, C) of one line equals the item in the 4th column (B, A, C) of some line, I want to take the last-column value from that matching line, and then print columns 1, 2, 3, and 5 with the correct values. Like this below.

A	1	10	0.98
B	1	203	0.7
C	2	805	0.99

This is what I have been working with at the moment,

for line in in_file:
 number = line.split()
 for col in number[0]:
  if col == number[3]:
   print number[0], number[1], number[2], number[4]

It should be pretty easy, but I just can't get any of the things I've been doing to work out. So I would appreciate any suggestions as to how to go about it.

I suggest traversing the file and building a list and a dictionary like

thelist = [
    ('A', 1, 10),
    ('B', 1, 203),
    ('C', 2, 805),
]

thedict = {
    'B': 0.7,
    'A': 0.98,
    'C': 0.99,
}

Then traverse the list, look up each item's first value in the dict to get its last-column value, and write the result to the output file. I assume here that each item appears exactly once in the first and the 4th columns.
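As a rough sketch of that idea (the function name `rearrange` and the inline sample are only for illustration, and the code assumes whitespace-separated columns and that every column-1 item also appears in column 4):

```python
def rearrange(lines):
    """Pair columns 1-3 of each row with the column-5 value found on
    the row whose column 4 matches this row's column 1."""
    rows = []            # (col1, col2, col3) in input order
    value_by_key = {}    # column 4 -> column 5
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue     # skip blank or malformed lines
        rows.append((fields[0], fields[1], fields[2]))
        value_by_key[fields[3]] = fields[4]
    # A missing key raises KeyError here, which flags bad input early.
    return ["\t".join((key, c2, c3, value_by_key[key]))
            for key, c2, c3 in rows]


sample = ["A\t1\t10\tB\t0.7", "B\t1\t203\tA\t0.98", "C\t2\t805\tC\t0.99"]
for out_line in rearrange(sample):
    print(out_line)
```

Since a file object yields its lines when iterated, passing an open file instead of `sample` should work the same way.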

Adding print statements will tell you what is going on.

for line in in_file:
 print "line =", line
 number = line.split()
 print "number =", number
 for col in number[0]:
  print "comparing", col, number[3]
  if col == number[3]:
   print number[0], number[1], number[2], number[4]
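
What those prints should reveal is that looping over `number[0]` walks through the characters of a single string, not through the rows of the file. A small illustration, using a made-up multi-character key `AB` (not from the original data) to make the effect visible:

```python
fields = "AB\t1\t10\tB\t0.7".split()

# Iterating over a string yields its characters one at a time.
chars = [col for col in fields[0]]
print(chars)

# The comparison only ever sees values from the same line, so a row
# whose partner is on a different line can never be matched this way.
same_line_match = (fields[0] == fields[3])
print(same_line_match)
```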

Thanks for the suggestion. However, I don't understand what you mean by 'traversing' the file and I'm not sure how to go about that. And is it okay to use a dictionary if you have a really big file?

The items should appear only once in the 1st and 4th columns.

'Traverse' is a generic term for composite data structures; it usually means writing a loop which visits every item of the structure, like 'for line in the_file' or 'for key, value in the_dict.iteritems()'. About the big file, it all depends on your RAM. If you have 2GB of RAM, a 100MB file is not necessarily big. On the other hand, if the file's size is 20GB, it won't work. In this case, you could first write a second file containing the data (or just the 4th and 5th columns) sorted on the 4th column. This sorting phase would involve splitting the file and a merge sort.
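
To make 'traverse' concrete, here is a small sketch of both kinds of loop. The in-memory `io.StringIO` object only stands in for a real open file, and `.items()` is used instead of `.iteritems()` because it works in both Python 2 and Python 3:

```python
import io

# Stand-in for an open file (assumption: tab-separated sample data).
the_file = io.StringIO(u"A\t1\t10\tB\t0.7\nB\t1\t203\tA\t0.98\n")

first_columns = []
for line in the_file:                 # visits every line exactly once
    first_columns.append(line.split()[0])

the_dict = {"B": 0.7, "A": 0.98}
pairs = []
for key, value in sorted(the_dict.items()):   # visits every key/value pair
    pairs.append((key, value))

print(first_columns)
print(pairs)
```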

The solution is perhaps to write a first version where you save a sorted file simply by loading the initial file in memory and using the built-in sorted() function, and you produce your output file using the initial file and the sorted file. This will work for reasonably sized files. You could then handle large files simply by altering the sorting phase.
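
One possible reading of that first version, kept entirely in memory for brevity: sort the (4th column, 5th column) pairs with sorted(), then find each row's partner by binary search. The name `rearrange_sorted` and the use of `bisect` are my assumptions, not something prescribed above; for a huge file the sorted pairs would live in a second file instead.

```python
import bisect


def rearrange_sorted(lines):
    rows = [line.split() for line in lines if line.strip()]
    # Sorting phase: (column 4, column 5) pairs ordered by column 4.
    pairs = sorted((r[3], r[4]) for r in rows)
    keys = [k for k, _ in pairs]
    out = []
    for r in rows:
        i = bisect.bisect_left(keys, r[0])     # binary search on column 4
        if i < len(keys) and keys[i] == r[0]:  # skip rows with no partner
            out.append("\t".join((r[0], r[1], r[2], pairs[i][1])))
    return out


sample = ["A 1 10 B 0.7", "B 1 203 A 0.98", "C 2 805 C 0.99"]
for out_line in rearrange_sorted(sample):
    print(out_line)
```

Only the sorting phase would need to change (to a split-and-merge sort on disk) to cope with files that do not fit in RAM.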
