Hi all,

I want to combine multiple files that are kept in a directory. Before producing the combined file, I want to edit each file first. For example,

I have two files, file1 and file2, in a directory, say TEST.

file1 contains

1       C       8.95377612903e-07
2       C       2.54310967742e-06
3       C       1.07986354839e-05
4       C       5.07842354839e-05
5       G       0.000244339548387
6       T       0.00117745145161
7       G       0.00199657145161
8       T       0.0059136516129
9       A       0.0093243416129
10      T       0.0122818903226
11      A       0.0148356919355
12      C       0.0170317919355
13      A       0.0273163903226
14      C       0.0360846919355
15      T       0.0767962919355
16      G       0.111205
17      A       0.269571983871
18      A       0.402242983871
19      A       0.512890032258
20      A       0.604756
21      A       0.680681
22      A       0.743145983871
23      A       0.794299

and file2 contains

1       G       4.14724e-08
2       T       7.38683612903e-08
3       G       7.33707806452e-08
4       T       1.27077546774e-07
5       A       1.37361132258e-07
6       G       1.37420164516e-07
7       A       1.59060645161e-07
8       A       1.59608032258e-07
9       A       1.55923274194e-07
10      C       9.81361774194e-08
11      C       9.78695322581e-08
12      G       1.00416609677e-07
13      G       1.12081406452e-07
14      G       1.50283725806e-07
15      G       2.55789580645e-07
16      G       5.5415083871e-07
17      G       1.36960016129e-06
18      A       3.62490290323e-06
19      T       4.50988016129e-06
20      T       4.84488935484e-06
21      G       4.94761693548e-06
22      A       5.31025516129e-06
23      C       5.42889516129e-06

Each file contains three columns (an integer, a character, a decimal number). Now, I want to combine these two files so that the first column keeps increasing continuously across the files, while the other two columns stay unchanged.

I would really appreciate your help. I look forward to hearing from you soon.

Best wishes
Sudipta

Write code to read each file in sequence and print each line first. We want to see Python code.

import os
import re
import string

path = '/home/sudipta/window'

for itemName in os.listdir(path):
    #Loops over each itemName in the path. Joins the path and the itemName
    #and assigns the value to itemName.
    itemName = os.path.join(path, itemName)   
    if os.path.isfile(itemName):
        lines= file(itemName, 'r').readlines()
        for i in range(0,len(lines)):
            data=lines[i].split(',')

This is my initial code. After that I can't proceed.

The lines don't look separated by commas, but by tabs or spaces. Line 14 in your code probably won't work as expected. You can try something like

import os
import re
import string

path = '/home/sudipta/window'

count = 0
for itemName in os.listdir(path):
    #Loops over each itemName in the path. Joins the path and the itemName
    #and assigns the value to itemName.
    itemName = os.path.join(path, itemName)   
    if os.path.isfile(itemName):
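        # open() works in both Python 2 and 3; file() exists only in Python 2
        with open(itemName) as infile:
            for line in infile:
                if not line.strip():
                    continue  # skip blank lines
                # split() with no argument splits on any run of tabs or spaces
                lineno, nucleotide, value = line.split()
                count += 1
                print("\t".join((str(count), nucleotide, value)))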

Thank you very much for the reply. It edits and combines the files, but the order in which the files are combined is not correct. My directory has two files named winjob_1_s and winjob_2_s. I want to combine them in this order, but the program does it in reverse order. How can I solve this?

The solution is to sort the filenames. The most obvious way to do this is

filenames = os.listdir(path)
filenames = sorted(filenames)

However, this code sorts filenames alphabetically, which is not necessarily what you want, for example

>>> L = ["winjob_1_s", "winjob_2_s", "winjob_12_s"]
>>> sorted(L)
['winjob_12_s', 'winjob_1_s', 'winjob_2_s']

winjob_12_s comes first because 2 is before _ in lexicographic order. What you can do is define a function which returns a numeric value for every filename and sort according to this value. For example

def score(filename):
    try:
        L = filename.split("_") # winjob_2_s --> ['winjob', '2', 's']
        value = int(L[1])
    except (IndexError, ValueError):
        # L has fewer than 2 items or L[1] is not an integer
        value = -1
    return value

filenames = sorted(filenames, key = score)

This sorts the filenames according to our score function.

>>> L = ["winjob_1_s", "winjob_2_s", "winjob_12_s"]
>>> sorted(L, key = score)
['winjob_1_s', 'winjob_2_s', 'winjob_12_s']
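
Putting the pieces of this thread together, here is a rough sketch of the whole job: sort the names with the score function, then read the files in that order while renumbering the first column. The name combine and the choice to skip lines that don't have exactly three fields are my own assumptions, not something you asked for, so adapt them as needed.

```python
import os

def score(filename):
    """Return the number embedded in names like winjob_2_s, or -1."""
    try:
        return int(filename.split("_")[1])
    except (IndexError, ValueError):
        return -1

def combine(path, filenames):
    """Yield renumbered lines from the given files, in the given order.

    combine is a hypothetical helper name; lines without exactly three
    whitespace-separated fields are skipped (an assumption).
    """
    count = 0
    for name in filenames:
        with open(os.path.join(path, name)) as infile:
            for line in infile:
                fields = line.split()
                if len(fields) != 3:
                    continue  # blank or malformed line
                count += 1
                # keep columns 2 and 3 as text so the numbers are not reformatted
                yield "\t".join((str(count), fields[1], fields[2]))
```

You would call it with something like combine(path, sorted(os.listdir(path), key=score)) and print or write each line it yields.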

Thank you very much for the reply. It works, but I have another problem. The directory contains different kinds of files, and I want to combine only one kind at a time.

For example:
The /home/sudipta/window directory contains winjob_1_s, winjob_2_s, winjob_3_s and another kind of file named winjob_1_s_z, winjob_2_s_z, winjob_3_s_z. I want to combine these two kinds of files separately. How can I do that?

Write code to produce two lists of file names, then handle each list separately.

How do I produce two lists of file names? Can you help me with this?
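
One possible way, as a sketch only: match each file name against a pattern for each kind and collect the matches in separate lists. The two patterns below assume the names look exactly like the examples in this thread (winjob_<number>_s and winjob_<number>_s_z); the function name split_by_kind is made up for illustration. Anything that matches neither pattern is ignored.

```python
import re

# Assumed naming scheme, taken from the examples in this thread
PLAIN = re.compile(r"^winjob_(\d+)_s$")
WITH_Z = re.compile(r"^winjob_(\d+)_s_z$")

def split_by_kind(filenames):
    """Return (plain, with_z), each list sorted by the embedded number."""
    plain, with_z = [], []
    for name in filenames:
        m = PLAIN.match(name)
        if m:
            plain.append((int(m.group(1)), name))
            continue
        m = WITH_Z.match(name)
        if m:
            with_z.append((int(m.group(1)), name))
    # sort the (number, name) pairs numerically, then keep only the names
    return ([n for _, n in sorted(plain)],
            [n for _, n in sorted(with_z)])
```

Calling split_by_kind(os.listdir(path)) gives the two lists, and each list can then go through the same read-and-renumber loop shown earlier in the thread.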
