Hi,

I have the following files and can match the numbers between them, but I am having trouble writing the output in the data matrix format I need. I would appreciate any help getting the code right.

File 1 has a list of numbers, and Files 2, 3 and 4 have subsets of these numbers along with a second column of data. I need to match the numbers in column 1 of each subset file to File 1 and then write a data matrix (the output file). See below for file details.

File 1:

1000018
10004
1000422
1000428
1000540
1000545
1000548
1000551
1000635
1000707
1000827
1000854
1000875
1000967
1001036
1001094
1001096
1001113
1001331
1001355
1001487
1002454
1002817

File 2: (files 3,4,5.... same format, different subset of numbers and data)

number        data
1000540      x
1000545      a
1000548      w
1000551      t
1001094      a
1001096      a
1002817      w

output file:

(all numbers
from file 1)       file2    file3
1000018                       w
10004                           b
1000422                       c
1000428
1000540      x                x
1000545      a
1000548
1000551
1000635
1000707
1000827
1000854......

Here's the code I've been working with:

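import fnmatch
import os

SPs = {}  # number -> data value (later files overwrite earlier ones)
for fn in os.listdir('.'):
    if fnmatch.fnmatch(fn, '*.out'):
        Id = fn[0:3]  # first three characters of the file name (not used yet)
        for line in open(fn):
            x = line.rstrip().split("\t")
            if len(x) < 2:
                continue  # skip blank or one-column lines
            pos = x[0]
            alle = x[1]
            SPs[pos] = alle

for line in open("All.pos", 'r'):
    line = line.rstrip()
    vals = line.split("\t")
    pos1 = vals[0]
    Map = SPs.get(pos1)  # None when the number is in no .out file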


I would suggest using a dictionary with the individual records in File 1 as the keys, each pointing to a list of the items found in the other files.
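
A minimal sketch of that suggestion, not a definitive implementation. It assumes the columns are separated by tabs or spaces, that each subset file starts with a "number data" header row, and that an empty cell is wanted where a number is missing from a file. The function names (read_subset, build_matrix, write_matrix) are made up for illustration.

```python
import os


def read_subset(path):
    """Return {number: data} for one subset file, skipping the header row."""
    values = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()  # assumption: tab- or space-separated columns
            if len(fields) >= 2 and fields[0] != "number":
                values[fields[0]] = fields[1]
    return values


def build_matrix(master_path, subset_paths):
    """Return rows: a header, then one row per number in the master file."""
    with open(master_path) as handle:
        numbers = [line.strip() for line in handle if line.strip()]
    # Dictionary keyed by every number in File 1, each pointing to a list
    # that gets one entry per subset file (empty string when absent).
    matrix = {number: [] for number in numbers}
    for path in subset_paths:
        values = read_subset(path)
        for number in numbers:
            matrix[number].append(values.get(number, ""))
    header = ["number"] + [os.path.basename(p) for p in subset_paths]
    return [header] + [[n] + matrix[n] for n in numbers]


def write_matrix(rows, out_path):
    """Write the rows as tab-separated text, one row per line."""
    with open(out_path, "w") as out:
        for row in rows:
            out.write("\t".join(row) + "\n")
```

With the files from the question, something like write_matrix(build_matrix("All.pos", sorted(list_of_out_files)), "matrix.txt") would produce one column per .out file, where list_of_out_files is whatever list of paths you collect with os.listdir and fnmatch. Sorting the paths keeps the column order stable between runs.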
