Hi,
I would like to read data from an .asc file with a six-line header. I only want the data, not the header: I'd like to start at the 7th line and read to the end, collecting the data into a single list. Each data file I'm looping through has a different number of rows. I started with six readline() calls to discard the header, followed by a 'while' loop that reads and appends lines until the end of the data set. However, this produces a list of length 2045. I was hoping its length would be the number of data values in the data set (the number of rows times the number of columns, since it is a data matrix). It also seems that float() doesn't work on what readline() returns. Might anyone have any ideas about a better way to go about it? I also attached an example of a data file.

Thank you.

Here's my current code:

import sys, string, os, arcgisscripting, copy, glob
from quantile import quantile

gq=[]
x=open(r'C:\PYTHON_SCRIPTS\GETSTATS\asc\darmein1.txt','r')
x.readline()
x.readline()
x.readline()
x.readline()
x.readline()
x.readline()
z= x.readline()
gq.append(z)
while z != '':
    z=x.readline()
    gq.append(z)
len(gq)  # len(gq) is 2045, but the original data set has 2044 rows and 1422 columns. I was hoping for 2044*1422 strings (data values) to turn into floats using...
float(gq)
# ...but float() raises an error:
#   File "<interactive input>", line 1, in <module>
#   TypeError: float() argument must be a string or a number

x.close()

This is just a hint

>>> s = "-9999 -9999 -9999 -9999 0.004537001 0.004537001 0.004537001   "
>>> s
'-9999 -9999 -9999 -9999 0.004537001 0.004537001 0.004537001   '
>>> s.strip()  # remove white space at both ends of s
'-9999 -9999 -9999 -9999 0.004537001 0.004537001 0.004537001'
>>> s.strip().split()   # split s on remaining white space
['-9999', '-9999', '-9999', '-9999', '0.004537001', '0.004537001', '0.004537001']
>>> [float(v) for v in s.strip().split()]
[-9999.0, -9999.0, -9999.0, -9999.0, 0.0045370009999999997, 0.0045370009999999997, 0.0045370009999999997]

Also, if you find too many lines, the number 1 debugging tool is to print your results and compare them to whatever you are expecting. If there are too many values, you can also print to a file.
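To put the hint above together with the original question, here is a minimal sketch. It is written for Python 3; the function name read_grid_values and the contents of the sample header are made up for illustration. It skips six header lines and flattens every remaining row into one list of floats. Note that split() with no arguments already ignores leading and trailing whitespace, so the strip() call is optional. It also suggests why the original loop likely produced 2045 items: it appended one whole line per row (2044 of them), plus the empty string that readline() returns at end of file.

```python
import os
import tempfile


def read_grid_values(path, header_lines=6):
    """Return every data value after the header as one flat list of floats."""
    values = []
    with open(path, "r") as handle:
        for line_number, line in enumerate(handle):
            if line_number < header_lines:
                continue  # discard the header lines
            # split() breaks the row on any whitespace; extend() adds each
            # number separately instead of adding the row as a single item
            values.extend(float(token) for token in line.split())
    return values


# Tiny stand-in for a real data file: six header lines, then 2 rows x 3 columns.
# The header text here is an assumption, not taken from the attached file.
sample = (
    "ncols 3\nnrows 2\nxllcorner 0\nyllcorner 0\ncellsize 1\n"
    "NODATA_value -9999\n"
    "-9999 0.5 1.5\n"
    "2.5 -9999 3.5   \n"
)
handle = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
handle.write(sample)
handle.close()

values = read_grid_values(handle.name)
os.remove(handle.name)
print(len(values))  # 6, i.e. rows times columns
```

Comparing len(values) with rows times columns is exactly the kind of print-and-compare check suggested above.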

If you are interested in reading particular lines from a data file, look at module linecache.
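As a rough illustration of the linecache suggestion, the sketch below fetches a single line by its number. The temporary file and its contents are invented for the example. linecache.getline counts lines from 1 and returns an empty string when the line does not exist. Since it caches the whole file in memory, it is probably better suited to picking out a few lines than to reading everything after the header.

```python
import linecache
import os
import tempfile

# Made-up file: two header lines followed by one row of data.
handle = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
handle.write("header a\nheader b\n1.0 2.0 3.0\n")
handle.close()

third_line = linecache.getline(handle.name, 3)  # line numbers start at 1
row = [float(token) for token in third_line.split()]

missing = linecache.getline(handle.name, 99)  # no such line: empty string

linecache.clearcache()  # drop the cached copy of the file
os.remove(handle.name)
```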

Thank you for this. I am used to higher level languages like Matlab and R where this white space stripping is not required.


Okay, thank you vegaseat!

Okay, this is what I ended up with. However, I have two new challenges. First, the q1.append(a) call and the other appends do not show the program looping through each file in the runlist. I printed the runlist and it looks fine, so I can't figure out what the loop problem is. Second, I'm curious how to write a data file that combines the three 1-D arrays (q1, q2, and q3) once the appends work properly. (Hints or leads are all that's necessary.) I'm working with Python 2.5, so I don't have access to scipy or numpy. Thank you so much Daniweb!

#READ DATA FROM EACH ASC FILE AND CALCULATE QUANTILES FROM EACH FILE

q1=[]
q2=[]
q3=[]
for file in runlist:
    gq=[]
    x=open(file[0],'r')
    x.readline()
    x.readline()
    x.readline()
    x.readline()
    x.readline()
    x.readline()
    z= x.readline()
    while z != '':
        z=z.strip()
        z=z.strip().split()
        for num in z:
            num=float(num)
            if num > -1:
                gq.append(num)
        z= x.readline()    
    a=quantile(gq, .25,  qtype = 7, issorted = False)
    print a
    b=quantile(gq, .5,  qtype = 7, issorted = False)
    c=quantile(gq, .75,  qtype = 7, issorted = False)   
    q1.append(a)
    q2.append(b)
    q3.append(c)
print q1

Why do you write x=open(file[0],'r') instead of x=open(file,'r') ?
What is the content of your runlist ?

Two remarks about style: avoid using 'file' as a variable name, because it is the name of a built-in type; and the six x.readline() calls should be written as a loop. Whenever you find yourself repeating similar statements, there is a better way to do it.

As for writing a data file with the lists q1, q2, q3, there are many ways to do it. You can simply write

outfile = open("outfile.txt", "w")
for i in xrange(len(q1)):
    outfile.write("%12.3e%12.3e%12.3e\n" % (q1[i], q2[i], q3[i]))
outfile.close()

You could also output to an excel spreadsheet.
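One low-effort route to a spreadsheet is a CSV file, which Excel can open directly. The sketch below uses the standard csv module and zip to pair up the three lists row by row. The quantile values, column names, and output file name are placeholders. It is written for Python 3; on Python 2 the file would be opened with mode "wb" and without the newline argument.

```python
import csv
import os
import tempfile

# Placeholder results standing in for the real q1, q2, q3 lists.
q1 = [0.1, 0.2]
q2 = [0.5, 0.6]
q3 = [0.9, 1.0]

out_path = os.path.join(tempfile.gettempdir(), "quantiles_example.csv")

with open(out_path, "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["q25", "q50", "q75"])  # header row (names are arbitrary)
    for row in zip(q1, q2, q3):             # one row per input file
        writer.writerow(row)

# Read the file back to check what was written, then clean up.
with open(out_path, "r", newline="") as infile:
    rows = list(csv.reader(infile))
os.remove(out_path)
```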

Here a snippet of the runlist: 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\dtshein.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\dtshein.txt', 'C:\\PYTHON_SCRIPTS\\GETSTAS\\asc\\dtshein8.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\dtshein9.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\dtshenn1.txt',

I have wondered this very thing about file[0] myself. When I take file[0] out and replace with just file and run it, I get this error:
Traceback (most recent call last):
  File "C:/PYTHON_SCRIPTS/readASCfile3.py", line 42, in <module>
    x=open(file,'r')
TypeError: coercing to Unicode: need string or buffer, list found

I also changed the word file to ascfile and get the same error (above) if it doesn't have that zero starting index.

I generally index the iterating object (like file or ascfile) in other loops with "object[0]", and the whole loop runs (for more than just the 0th iteration). Hence, I thought this is how you initialize indexing of the object in the loop.(?) I picked it up from someone else's python code posted on the internet.

For example, this loop works successfully to retrieve 9 different basenames in a loop:
for tiffile in runlist:
    basename0=os.path.basename(tiffile[0])

Should the indexing be different?


It's really strange. Could you post the output of the following code (between code tags)? I just added a few print statements.

#READ DATA FROM EACH ASC FILE AND CALCULATE QUANTILES FROM EACH FILE

q1=[]
q2=[]
q3=[]
print repr(runlist)
print len(runlist)
for file in runlist:
    print repr(file)
    gq=[]
    x=open(file[0],'r')
    for i in xrange(6):
        x.readline()
    z= x.readline()
    while z != '':
        z=z.strip().split()
        for num in z:
            num=float(num)
            if num > -1:
                gq.append(num)
        z= x.readline()    
    a=quantile(gq, .25,  qtype = 7, issorted = False)
    #print a
    b=quantile(gq, .5,  qtype = 7, issorted = False)
    c=quantile(gq, .75,  qtype = 7, issorted = False)   
    q1.append(a)
    q2.append(b)
    q3.append(c)
print len(q1), len(q2), len(q3)

Wow, for some reason the runlist has length 1. It seems I was wrong about the runlist being okay. Here's the code I use to generate the list; I think glob.glob may not be the best choice of function for this.

runlist = []
if glob.glob(ascDIR+"\\*.txt") <> []: #Find all asc files
        runlist.append(glob.glob(ascDIR+"\\*.txt")) # Add them to the run list
print runlist

OUTPUT

>>> 
[['C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein1.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein2.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein3.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein4.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein5.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein6.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein7.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein8.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein9.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn1.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn2.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn3.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn4.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn5.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn6.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn7.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn8.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn9.txt']]
[['C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein1.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein2.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein3.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein4.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein5.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein6.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein7.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein8.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein9.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn1.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn2.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn3.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn4.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn5.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn6.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn7.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn8.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn9.txt']]
1
['C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein1.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein2.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein3.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein4.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein5.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein6.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein7.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein8.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein9.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn1.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn2.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn3.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn4.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn5.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn6.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn7.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn8.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn9.txt']
1 1 1

Your runlist is a list containing another list! That's because glob.glob() returns a list, and you appended that whole list to the runlist as a single item. You can replace 'append' with 'extend':

runlist = []
if glob.glob(ascDIR+"\\*.txt") <> []: #Find all asc files
        runlist.extend(glob.glob(ascDIR+"\\*.txt")) # Add them to the run list
print runlist

Also, write open(file, 'r') and not file[0]. It should work now.
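The difference between the two methods can be seen on a small made-up list, with no files involved. The names below are hypothetical stand-ins for what glob.glob() returns. The sketch also shows why indexing with [0] appeared to work on the nested list, and what it would do on the flat one.

```python
found = ["a.txt", "b.txt", "c.txt"]  # stand-in for the list glob.glob() returns

nested = []
nested.append(found)  # adds the whole list as ONE item

flat = []
flat.extend(found)    # adds each file name as its own item

print(len(nested))  # 1, so a for loop over it runs only once
print(len(flat))    # 3, one iteration per file

# In the nested case each loop item is a list, so item[0] is the first path.
first_of_nested = [item[0] for item in nested]  # ['a.txt']

# In the flat case each loop item is a string, so item[0] is its first character.
first_of_flat = [item[0] for item in flat]      # ['a', 'b', 'c']
```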

Okay,
I now retrieved the runlist using a function from the module os.

os.chdir(ascDIR)
runlist=os.listdir(ascDIR)

I also changed the file[0] to just file for the loop and the code is in running order! Thank you.
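One caveat with this approach: os.listdir returns bare file names rather than full paths (which is why the os.chdir call is needed here), and it returns every entry in the directory, not only the .txt files. A possible variant, sketched below under the assumption that only .txt files should be processed, filters the names and joins them back onto the directory. The directory and file names are created just for the example.

```python
import os
import shutil
import tempfile

# Throwaway directory with two data files and one unrelated file.
asc_dir = tempfile.mkdtemp()
for name in ("darmein1.txt", "darmein2.txt", "notes.doc"):
    open(os.path.join(asc_dir, name), "w").close()

# Keep only the .txt entries and turn them into full paths,
# so the result does not depend on the current working directory.
runlist = sorted(
    os.path.join(asc_dir, name)
    for name in os.listdir(asc_dir)
    if name.lower().endswith(".txt")
)

basenames = [os.path.basename(path) for path in runlist]
shutil.rmtree(asc_dir)  # clean up the example directory
```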

Oops, I didn't see your extend comment until now! I can do that too, thank you!!!
You win the gold medal for aerial programming help!!!!!!!
