I have to collect daily rainfall data from a model's output. The model writes each station's daily data to a separate file, and I have to combine these files for further analysis.

I have tried the following. It works, but my data consists of thousands of pcp_#.txt files, which makes the process very cumbersome.

ifh1=open("pcp_1.txt")
ifh2=open("pcp_2.txt")
ifh3=open("pcp_3.txt")
ifh4=open("pcp_4.txt")
ifh5=open("pcp_5.txt")
ifh6=open("pcp_6.txt")
ifh7=open("pcp_7.txt")
ifh8=open("pcp_8.txt")
ifh9=open("pcp_9.txt")
ifh10=open("pcp_10.txt")

ofh = open("pcp_total5.txt", "w")

line1=ifh1.readline()
line2=ifh2.readline()
line3=ifh3.readline()
line4=ifh4.readline()
line5=ifh5.readline()
line6=ifh6.readline()
line7=ifh7.readline()
line8=ifh8.readline()
line9=ifh9.readline()
line10=ifh10.readline()

# run through line by line and extract the data from the different txt files
while line1 and line2 and line3 and line4 and line5 and line6 and line7 and line8 and line9 and line10:
 
    fields1=line1.split()
    fields2=line2.split()
    fields3=line3.split()
    fields4=line4.split()
    fields5=line5.split()
    fields6=line6.split()
    fields7=line7.split()
    fields8=line8.split()
    fields9=line9.split()
    fields10=line10.split()

    ID = fields1[0]
    ID1 = fields1[1]
    ID2 = fields1[2]

    r1=fields1[3]
    r2=fields2[3]
    r3=fields3[3]
    r4=fields4[3]
    r5=fields5[3]
    r6=fields6[3]
    r7=fields7[3]
    r8=fields8[3]
    r9=fields9[3]
    r10=fields10[3]
 
    #writing the heading on the file
    print >> ofh, ID,ID1, ID2,
            
    #writing the result on the file    
    print >>ofh, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10

    # running over the files

    line1=ifh1.readline()
    line2=ifh2.readline()
    line3=ifh3.readline()
    line4=ifh4.readline()
    line5=ifh5.readline()
    line6=ifh6.readline()
    line7=ifh7.readline()
    line8=ifh8.readline()
    line9=ifh9.readline()
    line10=ifh10.readline()
    
# closing the files
ofh.close()
ifh1.close()
ifh2.close()
ifh3.close()
ifh4.close()
ifh5.close()
ifh6.close()
ifh7.close()
ifh8.close()
ifh9.close()
ifh10.close()

Does anyone have a good suggestion for how to read the files automatically and put them together? Please, I need your help!

chebude

This code will let you open ALL files that start with pcp and then have a for loop that lets you do whatever you want with the files.

import os

fileNames = []

for f in os.listdir(os.getcwd()):
    if f.startswith('pcp'):
        fileNames.append(f)

for fileName in fileNames:
    f = open(fileName)
    #do what you need to.

Hope that's handy!
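
One thing to watch with os.listdir() is that it returns names in arbitrary order, and a plain alphabetical sort would put pcp_10.txt before pcp_2.txt. Here is a minimal sketch (written for Python 3) that picks up the files in numeric order. The helper name list_pcp_files and the exact pattern are my own assumptions, based on the pcp_1.txt ... pcp_N.txt naming in the question.

```python
import os
import re

# Assumed naming scheme: pcp_<number>.txt, as in the question.
PCP_NAME = re.compile(r"^pcp_(\d+)\.txt$")

def list_pcp_files(directory="."):
    """Return the pcp_<number>.txt names found in `directory`, in numeric order."""
    numbered = []
    for name in os.listdir(directory):
        match = PCP_NAME.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    # Sort on the integer so pcp_2.txt comes before pcp_10.txt.
    return [name for _, name in sorted(numbered)]
```

Because the pattern requires a number after "pcp_", an output file such as pcp_total5.txt sitting in the same directory is not picked up as input.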

Also, if you need to open a lot of files concurrently, you can use some code like this:

import os
import sys

file_names = []

file_names.append("test1.txt")
file_names.append("test2.txt")

file_handles = []

# create a file handle for every file in file_names
for x in file_names:
    file_handles.append(open(x, 'r'))

# print first line from each file
for x in file_handles:
    print x.readline()

# clean up and close all file handles
for x in file_handles:
    x.close()
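
If you are on Python 3, the same idea can be sketched with contextlib.ExitStack, which closes every handle for you even when one of the open() calls fails part-way through. The function name first_lines is just an illustration, not something from the posts above.

```python
from contextlib import ExitStack

def first_lines(file_names):
    """Open every file in file_names at once and return each file's first line."""
    with ExitStack() as stack:
        # enter_context registers each handle so it is closed when the
        # with-block ends, even if a later open() raises.
        handles = [stack.enter_context(open(name, "r")) for name in file_names]
        return [handle.readline().rstrip("\n") for handle in handles]
```

Keep in mind that operating systems limit how many files one process may hold open at the same time, so with thousands of input files it may be safer to process them one after another.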

Thanks, Paulthom and int3grate. Your suggestions helped with opening the files, but I am still not able to write the contents to the output file. It writes only the first file's contents:

import os


fileNames = []

for f in os.listdir(os.getcwd()):
    if f.startswith('pcp'):
        fileNames.append(f)
        
ofh = open("pre5.txt", "w")

for fileName in fileNames:
    f = open(fileName)
    #do what you need to.

    for line in f.readlines():
    
        print >>ofh, line.rstrip(),
        
ofh.close()
f.close()

Any suggestions?

I reproduced your code, changing only the deprecated (and, IMHO, fuzzy) file read and write syntax.
It works as expected.

I think you consume the content of f in the "do what you need to" phase.

Either read in the whole content of f (f.read()), manipulate it, and write it out, or manipulate the line variable inside the "for line in f" loop.

import os

open("pcp1.txt","w").write("pcp11\npcp12") # testfile1
open("pcp2.txt","w").write("pcp21\npcp22") #testfile2

fileNames = []

for f in os.listdir(os.getcwd()):
    if f.startswith('pcp'):
        fileNames.append(f)
print(fileNames)

ofh = open("pre5.txt", "w")
for fileName in fileNames:
    f = open(fileName)
    #do what you need to.

    for line in f:
        ofh.write(line.strip())
    f.close()    

ofh.close()


# prints: ['pcp1.txt', 'pcp2.txt']
# pre5.txt is created and contains:pcp11pcp12pcp21pcp22

It might help if you could write (in your own words) what it is that you think you need the program to do.

The first example looks something like:

There are 10 (or N?) input files with line-oriented records.
Each input line contains 4 fields.
Generate an output file that aggregates the data from the input files.
Each output line will contain:

  • the first 3 fields from the line in the first file
  • the fourth field from each of the files

There will be one output line for each of the input lines.

My guess is that the first 3 fields identify when the sample was taken (date?) and the 4th (last?) field is the sample.

Does (or should) the program do anything to make sure that the collection date is the same in all of the files?

Would it be possible that one (or more) file(s) might not have data for a given date?

Having a better idea of what you're trying to accomplish will help us help you.
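
To make the date question concrete, here is a rough sketch (Python 3) of a check that every file covers the same dates as the first one. It assumes what I guessed above: whitespace-separated lines whose first three fields are the date and whose fourth field is the sample. The function names read_records and find_date_mismatches are made up for this example.

```python
def read_records(path):
    """Return a list of ((year, month, day), value) tuples from one station file."""
    records = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines and any non-numeric header line.
            if len(fields) < 4 or not fields[0].isdigit():
                continue
            records.append((tuple(fields[:3]), fields[3]))
    return records

def find_date_mismatches(paths):
    """Compare the date fields of every file against the first file.

    Returns a list of (path, reason) pairs; an empty list means all agree.
    """
    reference = [date for date, _ in read_records(paths[0])]
    problems = []
    for path in paths[1:]:
        dates = [date for date, _ in read_records(path)]
        if len(dates) != len(reference):
            reason = "has %d records, expected %d" % (len(dates), len(reference))
            problems.append((path, reason))
        elif dates != reference:
            # Report the first record (1-based) whose date differs.
            first_bad = next(i for i, (a, b) in enumerate(zip(dates, reference)) if a != b)
            problems.append((path, "date differs at record %d" % (first_bad + 1)))
    return problems
```

If the returned list is empty, taking the date fields from the first file only is safe.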

Hi Murtan,

You have understood correctly what I want the program to do. I have N files, each with 25 years of daily rainfall data in the following format:

Year		Month	Date		value
1976		1		1		35.0
1976		1		2		123.0
.				.
1976		1		31		52.5
1976		2		1		143.8
1976		2		2		13.0
.		.		.		.
.		.		.		.
1990		1		1		52.5
.		.		.		.
.		.		.		.
.		.		.		.
2000		12		31		234.2

I want all the N files in one file with the following format:

Year	1976	1976	…	1990..2000	
Month	1	1		1	12
Date		1		2			31		31
Value1	35.0		123.0		54.3		43.4
Value2	99.5		45.7			87.5		23.7
.
.
.
N		56.8		234.0 …..		65.4	…..	243.6

All the input files have a value for every day, so every file has the same format: 4 fields and 9125 lines (i.e. 25 years x 365 days). The program only needs to copy the value field from each of the N files and put them in one output file. The copying and writing should follow the order of the input file names (pcp_1.txt, pcp_2.txt, …).

Hope I am clear now!

chebude

The output format in my previous post looks a mess. Here is how
the output format should look:

I want all the N files in one file with the following format:

Year     1976    1976    …    1990    …    2000
Month    1       1       …    1       …    12
Date     1       2       …    31      …    31
Value1   35.0    123.0   …    54.3    …    43.4
Value2   99.5    45.7    …    87.5    …    23.7
.
.
.
N        56.8    234.0   …    65.4    …    243.6

So there will be 9126 columns of output (one extra for the headings)
and all of the values from one input file will be on one line in the output file?

Is the output file intended to be human readable or computer readable?

Human readable means you do things like try to line up decimal points and try to make the output 'pretty' when someone looks at it.

Computer readable means making it easier for the computer to extract the data.
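
As a small illustration of the difference (a Python 3 sketch with made-up function names, using the sample values from the post above), the same row can be written either way:

```python
def human_readable_row(label, values, width=8):
    """Pad every cell to a fixed width so the columns line up in a text editor."""
    cells = [label] + list(values)
    return "".join(cell.ljust(width) for cell in cells).rstrip()

def computer_readable_row(label, values):
    """Separate cells with a single tab so another program can split them reliably."""
    return "\t".join([label] + list(values))

print(human_readable_row("Value1", ["35.0", "123.0"]))
print(computer_readable_row("Value1", ["35.0", "123.0"]))
```

A program that splits each line on whitespace can read either form, so a fixed-width layout can serve both purposes as long as no field is ever empty.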

So back to the algorithm

For the first file, we want to extract all of the fields, but can not print out the fields as we go unless we want to process through the file four times (not necessarily a bad idea, but could be inefficient).

The first line in the output file is the header followed by all of the first field values. The second output line is the header and the second field values. The third output line is the header and the third field values. The fourth line is the header and all of the fourth field values.

After the first file, the remaining files could be processed file-by-file and line-by-line:

For each remaining file
    Print the header
    for each line in the file
        print the fourth field value
    terminate the output line
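
The outline above could be sketched in Python 3 roughly as follows. This is only one way to do it: instead of re-reading the first file four times, this variant reads it once and keeps its four columns in memory. The names read_columns and merge_stations, the tab separator, and the "Value1", "Value2", ... row labels are my assumptions, not something fixed by the posts.

```python
def read_columns(path):
    """Return the four columns of one station file as four lists of strings."""
    columns = ([], [], [], [])
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines and a possible 'Year Month Date value' header.
            if len(fields) < 4 or not fields[0].isdigit():
                continue
            for column, field in zip(columns, fields):
                column.append(field)
    return columns

def merge_stations(paths, out_path):
    """Write three date rows from the first file, then one value row per file."""
    years, months, days, first_values = read_columns(paths[0])
    with open(out_path, "w") as out:
        for header, row in (("Year", years), ("Month", months), ("Date", days)):
            out.write("\t".join([header] + row) + "\n")
        out.write("\t".join(["Value1"] + first_values) + "\n")
        # Remaining files: only the fourth field is needed.
        for number, path in enumerate(paths[1:], start=2):
            values = read_columns(path)[3]
            out.write("\t".join(["Value%d" % number] + values) + "\n")
```

Only one input file is open at a time, so the number of stations is not limited by how many files the system allows to be open at once.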

Yes, we will have one header from the first file and the rest will be the value fields. Each input file will be one line in the output file.

It is meant to be human readable, but I will also use it in my other model.

Well, I did a bit of fiddling and came up with this. It's not the most efficient code ever by a long shot, but I think it should help:

import os


fileNames = []

for f in os.listdir(os.getcwd()):
    if f.startswith('pcp'):
        fileNames.append(f)
        
ofh = open("pre5.txt", "w")

for fileName in fileNames:
    f = open(fileName)
    #do what you need to.

    for line in f.readlines():
        if not line.startswith('Year'):
            lines.append(line)


#Doing the years (one entry per data line, taken from the first field)
s = 'Year        '
for f in lines:
    s += "      "+f.split()[0]

ofh.write(s+'\n')

#the month column
s = 'Month        '
for f in lines:
    s += '      '+f.split()[1]

ofh.write(s+'\n')

#date
s = "Date       "
for f in lines:
    s += '      '+f.split()[2]

ofh.write(s+'\n')

#value
s = "Value        "
for f in lines:
    s += '        '+f.split()[3]

ofh.write(s+'\n')





        
ofh.close()
f.close()

Dear Paulthom,

I get this error message

line 18, in ?
lines.append(line)
AttributeError: 'str' object has no attribute 'append'

I have no idea how to correct this. Sorry for my ignorance!

Sorry, I didn't test it as I do not have the files. Try this:

import os


fileNames = []

for f in os.listdir(os.getcwd()):
    if f.startswith('pcp'):
        fileNames.append(f)
        
ofh = open("pre5.txt", "w")
lines = []
for fileName in fileNames:
    f = open(fileName)
    #do what you need to.

    for line in f.readlines():
        if not line.startswith('Year'):
            lines.append(line)


#Doing the years (one entry per data line, taken from the first field)
s = 'Year        '
for f in lines:
    s += "      "+f.split()[0]

ofh.write(s+'\n')

#the month column
s = 'Month        '
for f in lines:
    s += '      '+f.split()[1]

ofh.write(s+'\n')

#date
s = "Date       "
for f in lines:
    s += '      '+f.split()[2]

ofh.write(s+'\n')

#value
s = "Value        "
for f in lines:
    s += '        '+f.split()[3]

ofh.write(s+'\n')





        
ofh.close()
f.close()

My prototype (really simple -- uses 3 copies of the sample data you posted) seems to work ok. I do parse the first input file 4 times (once for each of the fields I want). Then for each remaining file, I parse it for the 4th field and output the line.

The second parse of the first file looks like this:

ifh.seek(0,0)
ostr = ""
for line in ifh:
    ostr += "%-7s" % line.strip().split()[1]
ofh.write(ostr)
ofh.write("\n")

The value 7 was determined arbitrarily and is just used to make the data look pretty. Because you have 9126 columns and each column will take up 7 spaces, each line will be 63882 characters long (seems a bit much to me, but I think it will work).
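
The arithmetic is easy to check with a throwaway sketch (Python 3; pad_row is a made-up helper that does the same padding as the "%-7s" above):

```python
COLUMN_WIDTH = 7        # the arbitrary width used above
DAYS = 9125             # data columns, one per day

def pad_row(cells, width=COLUMN_WIDTH):
    """Left-justify every cell in a field of `width` characters."""
    return "".join("%-*s" % (width, cell) for cell in cells)

# One heading cell plus one cell per day gives 9126 columns.
row = pad_row(["Month"] + ["1"] * DAYS)
print(len(row))   # (9125 + 1) * 7
```

Note that this kind of padding never truncates, so a cell of 7 or more characters would run into (or push along) the next column.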

Thank you, guys, it is working. I used Paulthom's program and it works fine.

I also tried Murtan's, but it writes only the month repeatedly. Perhaps I have missed something? Thanks a lot!
