
How to read 2 files and find matching rows from them

Hi,

I am really new to Python and I am having real trouble with this new language.
This is what I want to do:
I have two sets of .txt files that contain numbers. They look like this:

FILE1
1028.085 283.795 2056.13 295.254 121912.4 11.346 0.004
147.932 780.677 966.771 289.326 34355.5 12.721 0.011
710.507 541.051 973.092 684.851 32184.8 12.792 0.016

FILE2
147.695 780.377 966.771 289.456 29963.6 12.870 0.013
716.658 546.237 938.72 653.857 22436.3 13.184 0.023
1028.385 283.495 2056.13 295.254 121912.4 11.346 0.004
1028.485 283.405 2056.13 295.254 121912.4 11.446 0.004


The first two columns are position x and y of some data. The rest is not that important for now.

What I want to do is
a) read in those files
b) find matching (x, y) coordinates between the files within some tolerance (let's say within 0.5, because the coordinates in file2 may not be exactly the same as in file1)
c) write the matching rows side by side in a new (third) file
d) if there is no match, still write the row to the output with a comment saying "there was no match" or something similar
e) if there is more than one match, put a comment saying "this pair had more than one match"
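
Step (b) leaves open what "within 0.5" means. Below is a minimal sketch of two possible readings: a per-axis test (x and y each within the tolerance) and a straight-line distance test. The function names `within_box` and `within_radius` are made up for illustration; the two points are taken from the sample data.

```python
import math


def within_box(p, q, tol=0.5):
    """True if the x values and the y values each differ by at most tol."""
    return abs(p[0] - q[0]) <= tol and abs(p[1] - q[1]) <= tol


def within_radius(p, q, tol=0.5):
    """True if the straight-line distance between p and q is at most tol."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= tol


a = (1028.085, 283.795)  # FILE1, first row
b = (1028.485, 283.405)  # FILE2, last row

print(within_box(a, b))     # True: dx is about 0.40 and dy about 0.39
print(within_radius(a, b))  # False: the distance is about 0.56
```

With the sample data the choice matters: the first row of FILE1 has two matches in FILE2 under the per-axis test but only one under the distance test.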

Thus, my final output should have the following columns:
x1, y1, col3_1, col4_1, col5_1, col6_1, col7_1, x2, y2, col3_2, col4_2, col5_2, col6_2, col7_2, comment
These column names should also be printed as the very first row of the output.

Any help would be greatly appreciated.

Thank you!

7sisters
Newbie Poster
2 posts since Jun 2011
Reputation Points: 10
Solved Threads: 0
 

Is the order of the lines significant? You could read the lines from both files into one list and convert the values to floats. Sorting by (x, y) brings nearby points next to each other, so you do not need to compare the distance between every pair of points. But what if the distance from p1 to p2 is 0.4, from p1 to p3 is 0.4, and from p2 to p3 is 0.7? Then the neighbours of a point are not necessarily neighbours of each other.
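
The sorting idea could look roughly like the sketch below. It is one possible reading of the suggestion, not a definitive implementation: the names `parse_rows` and `close_pairs` are hypothetical, the tolerance is applied per axis, and only pairs that come from different files are reported.

```python
def parse_rows(text):
    """Turn whitespace-separated lines into lists of floats, skipping blank lines."""
    return [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]


def close_pairs(rows1, rows2, tol=0.5):
    """Return sorted (i, j) pairs where rows1[i] and rows2[j] agree in x and y within tol."""
    # Tag every point with its source file and row index, then sort by x.
    tagged = [(r[0], r[1], 1, i) for i, r in enumerate(rows1)]
    tagged += [(r[0], r[1], 2, j) for j, r in enumerate(rows2)]
    tagged.sort()

    pairs = []
    for k in range(len(tagged)):
        x, y, src, idx = tagged[k]
        m = k + 1
        # Only look ahead while the x gap is still within tolerance.
        while m < len(tagged) and tagged[m][0] - x <= tol:
            x2, y2, src2, idx2 = tagged[m]
            if src2 != src and abs(y2 - y) <= tol:
                pairs.append((idx, idx2) if src == 1 else (idx2, idx))
            m += 1
    return sorted(pairs)


rows1 = parse_rows("""
1028.085 283.795 2056.13 295.254 121912.4 11.346 0.004
147.932 780.677 966.771 289.326 34355.5 12.721 0.011
710.507 541.051 973.092 684.851 32184.8 12.792 0.016
""")
rows2 = parse_rows("""
147.695 780.377 966.771 289.456 29963.6 12.870 0.013
716.658 546.237 938.72 653.857 22436.3 13.184 0.023
1028.385 283.495 2056.13 295.254 121912.4 11.346 0.004
1028.485 283.405 2056.13 295.254 121912.4 11.446 0.004
""")

print(close_pairs(rows1, rows2))
```

Because the function reports pairs rather than groups, the p1/p2/p3 situation does not break it: each pair is judged on its own, and a row that appears in several pairs simply has more than one match.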

pyTony
pyMod
Moderator
5,359 posts since Apr 2010
Reputation Points: 782
Solved Threads: 852
 

I think you could follow an algorithm similar to this one:

>>> print(A)
['h', 'e', 'l', 'l', 'o', ' ', 'd', 'a', 'n', 'i', 'w', 'e', 'b']
>>> print(B)
['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', 'n', ' ', 'e', 'x', 'a', 'm', 'p', 'l', 'e']
>>> for i, x in enumerate(A):
...     for j, y in enumerate(B):
...         if x == y:
...             print(i, j)
...
0 1
1 11
1 17
2 16
3 16
5 4
5 7
5 10
7 8
7 13
8 9
9 2
9 5
11 11
11 17
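
Applied to the coordinates from the question, the same nested loop might look like the sketch below. It assumes a per-axis tolerance of 0.5 in place of the `==` test; the names `file1`, `file2` and `tol` are made up for illustration and only the first two columns of the sample rows are used.

```python
# (x, y) pairs taken from the first two columns of the sample files
file1 = [(1028.085, 283.795), (147.932, 780.677), (710.507, 541.051)]
file2 = [(147.695, 780.377), (716.658, 546.237),
         (1028.385, 283.495), (1028.485, 283.405)]
tol = 0.5  # assumed tolerance, applied to x and y separately

matches = []
for i, (x1, y1) in enumerate(file1):
    for j, (x2, y2) in enumerate(file2):
        # "Close enough" replaces the exact equality test used for letters.
        if abs(x1 - x2) <= tol and abs(y1 - y2) <= tol:
            matches.append((i, j))

print(matches)  # [(0, 2), (0, 3), (1, 0)]
```

The indices play the same role as in the letter example: row 0 of the first file matches rows 2 and 3 of the second, row 1 matches row 0, and row 2 has no match.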

Gribouillis
Posting Maven
Moderator
2,786 posts since Jul 2008
Reputation Points: 1,044
Solved Threads: 691
 

What do I do if I want to print out the entire rows, not just the matching values?
If I do

>>> for i, x in enumerate(A):
...     for j, y in enumerate(B):
...         if x == y:
...             print(i, j)
...

or put

print(x, y)

on the last line instead, it only prints the indices or the matching letters, in this case.
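
One way to get whole rows is to loop over the rows themselves and compare only their first two fields. The sketch below is a rough illustration under several assumptions: a per-axis tolerance of 0.5, seven columns per row, comma-separated output, "-" as the filler for a missing match, and only FILE1 rows being checked for "no match". The names `read_rows`, `is_close` and `merge` are hypothetical.

```python
HEADER = ("x1, y1, col3_1, col4_1, col5_1, col6_1, col7_1, "
          "x2, y2, col3_2, col4_2, col5_2, col6_2, col7_2, comment")
TOL = 0.5   # assumed tolerance, applied to x and y separately
NCOLS = 7   # columns per input row in the sample data


def read_rows(path):
    """Return the non-blank lines of a file, each split into text fields."""
    with open(path) as handle:
        return [line.split() for line in handle if line.strip()]


def is_close(r1, r2, tol=TOL):
    """Compare only the first two fields (x and y) of two rows."""
    return (abs(float(r1[0]) - float(r2[0])) <= tol
            and abs(float(r1[1]) - float(r2[1])) <= tol)


def merge(rows1, rows2, tol=TOL):
    """Build output lines: each FILE1 row next to every FILE2 row it matches."""
    lines = [HEADER]
    for r1 in rows1:
        hits = [r2 for r2 in rows2 if is_close(r1, r2, tol)]
        if not hits:
            lines.append(", ".join(r1 + ["-"] * NCOLS + ["there was no match"]))
        for r2 in hits:
            comment = "this pair had more than one match" if len(hits) > 1 else ""
            lines.append(", ".join(r1 + r2 + [comment]))
    return lines


# Demo with the sample rows, kept as text so the original number
# formatting (for example 12.870) survives in the output.
rows1 = [s.split() for s in [
    "1028.085 283.795 2056.13 295.254 121912.4 11.346 0.004",
    "147.932 780.677 966.771 289.326 34355.5 12.721 0.011",
    "710.507 541.051 973.092 684.851 32184.8 12.792 0.016",
]]
rows2 = [s.split() for s in [
    "147.695 780.377 966.771 289.456 29963.6 12.870 0.013",
    "716.658 546.237 938.72 653.857 22436.3 13.184 0.023",
    "1028.385 283.495 2056.13 295.254 121912.4 11.346 0.004",
    "1028.485 283.405 2056.13 295.254 121912.4 11.446 0.004",
]]

lines = merge(rows1, rows2)
for line in lines:
    print(line)
```

With real files, `rows1` and `rows2` would come from `read_rows(...)` with your own file names, and the result could be written with `open(..., "w")` and `"\n".join(lines)` instead of printed.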

7sisters
Newbie Poster
2 posts since Jun 2011
Reputation Points: 10
Solved Threads: 0
 
