Reading data from .asc files starting at a specific line

Question

AnnetteM 0 Light Poster

14 Years Ago

Hi,
I would like to read data from an .asc file that has a six-line header. I only want the data, not the header: I'd like to start at the 7th line and read to the end, collecting the data into a single list. Each data file I'm looping through has a different number of rows. I started with six readline() calls to discard the header and then a while loop to read and append lines until the end of the file. However, this produces a list of length 2045. I was hoping its length would be the number of data values in the dataset (the number of rows times the number of columns, since it is a data matrix). It also seems that float() doesn't work on what readline() returns. Does anyone have any ideas about a better way to go about it? I have also attached an example data file.

Thank you.

Here's my current code:

import sys, string, os, arcgisscripting, copy, glob
from quantile import quantile

gq=[]
x=open(r'C:\PYTHON_SCRIPTS\GETSTATS\asc\darmein1.txt','r')
x.readline()
x.readline()
x.readline()
x.readline()
x.readline()
x.readline()
z= x.readline()
gq.append(z)
while z != '':
    z=x.readline()
    gq.append(z)
len(gq)  # but len(gq) == 2045, while the original data set has 2044 rows and 1422 columns.
         # I was hoping for 2044*1422 strings (data values) to turn into floats using...
float(gq)
# ...but float() raises an error:
#   File "<interactive input>", line 1, in <module>
#   TypeError: float() argument must be a string or a number

x.close()
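A note on where 2045 comes from: readline() returns one whole row as a single string, so the loop stores one string per data row, and its last pass appends the empty string '' that readline() returns at end of file. That gives 2044 rows + 1 = 2045 items. The sketch below replays the same loop on a small invented sample (io.StringIO stands in for the real file; the header and values are made up, and the code is Python 3 rather than the thread's Python 2.5). It also shows why float(gq) fails: float() accepts a single string or number, not a list.

```python
import io

# Invented sample: 6 header lines followed by 3 data rows of 4 values each.
SAMPLE = (
    "ncols 4\nnrows 3\nxllcorner 0\nyllcorner 0\ncellsize 1\nNODATA_value -9999\n"
    "-9999 0.5 0.25 1.0\n"
    "2.0 -9999 3.5 4.0\n"
    "0.1 0.2 0.3 0.4\n"
)

def original_loop(x):
    """Replicates the question's loop: skip 6 lines, then append raw readline() results."""
    gq = []
    for _ in range(6):
        x.readline()
    z = x.readline()
    gq.append(z)
    while z != '':
        z = x.readline()
        gq.append(z)  # the final pass appends '' (end of file)
    return gq

gq = original_loop(io.StringIO(SAMPLE))
print(len(gq))       # 3 rows + 1 trailing '' -> 4
print(repr(gq[-1]))  # ''

try:
    float(gq)  # a list, not a string or number
except TypeError as exc:
    print("TypeError:", exc)
```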


exampledataset.zip (138.48 KB)

Gribouillis 1,391 Programming Explorer

14 Years Ago

This is just a hint:

>>> s = "-9999 -9999 -9999 -9999 0.004537001 0.004537001 0.004537001   "
>>> s
'-9999 -9999 -9999 -9999 0.004537001 0.004537001 0.004537001   '
>>> s.strip()  # remove white space at both ends of s
'-9999 -9999 -9999 -9999 0.004537001 0.004537001 0.004537001'
>>> s.strip().split()   # split s on remaining white space
['-9999', '-9999', '-9999', '-9999', '0.004537001', '0.004537001', '0.004537001']
>>> [float(v) for v in s.strip().split()]
[-9999.0, -9999.0, -9999.0, -9999.0, 0.0045370009999999997, 0.0045370009999999997, 0.0045370009999999997]

Also, if you find too many lines, the number 1 debugging tool is to print your results and compare them to whatever you are expecting. If there are too many values, you can also print to a file.
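Applied to every line after the header, the hint gives one flat list of nrows × ncols floats. A minimal Python 3 sketch, assuming exactly six header lines followed by whitespace-separated numbers; read_values and the sample text are illustrative, not from the thread:

```python
import io
from itertools import islice

HEADER_LINES = 6  # assumption: fixed six-line header, as described in the question

def read_values(lines, skip=HEADER_LINES):
    """Return every number after the header as one flat list of floats."""
    values = []
    for line in islice(lines, skip, None):  # skip the header lines
        values.extend(float(v) for v in line.split())  # split() also drops the '\n'
    return values

sample = io.StringIO(
    "h1\nh2\nh3\nh4\nh5\nh6\n"
    "-9999 -9999 0.004537001\n"
    "0.5 1.5 2.5\n"
)
vals = read_values(sample)
print(len(vals))  # 2 rows x 3 columns -> 6
```

For a real file this would be used as `with open(path) as f: values = read_values(f)`, where path is whichever file is being processed. A blank last line is harmless because its split() is empty.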

AnnetteM commented: This post is very helpful +1

vegaseat 1,735 DaniWeb's Hypocrite

14 Years Ago

If you are interested in reading particular lines from a data file, look at module linecache.
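A small sketch of linecache.getline(filename, lineno), which returns one line by its 1-based number (or '' if there is no such line). The temporary file and its contents are invented so the example runs on its own. Note that linecache keeps the file's lines in an in-memory cache, so for reading every data row of a large grid a plain loop over the file is probably simpler; linecache is handiest when you need a few specific lines.

```python
import linecache
import os
import tempfile

# Write a small throwaway file so the example is self-contained (invented content).
fd, path = tempfile.mkstemp(suffix=".asc")
with os.fdopen(fd, "w") as f:
    f.write("".join("header %d\n" % i for i in range(1, 7)))
    f.write("1.0 2.0 3.0\n4.0 5.0 6.0\n")

# linecache line numbers start at 1, so line 7 is the first data row here.
seventh = linecache.getline(path, 7)
print(repr(seventh))  # '1.0 2.0 3.0\n'

linecache.clearcache()  # drop the cached lines
os.remove(path)
```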

AnnetteM commented: Thank you for the helpful post +1

Gribouillis 1,391 Programming Explorer

14 Years Ago

Why do you write x=open(file[0],'r') instead of x=open(file,'r')?
What is the content of your runlist?

Two remarks about style: avoid using 'file' as a variable name, because it is the name of a built-in type; and the six x.readline() calls should be written as a loop. Each time you find yourself repeating similar statements, there is a better way to do it.
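Following that style remark, the six readline() calls can become a loop, and the loop can keep the header values instead of throwing them away. This sketch assumes the header is six "key value" lines as in the ESRI ASCII grid format (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value); the thread never names the format, so treat that as an assumption. ncols and nrows reuse the 1422 × 2044 size from the question; the other header values are invented. The names f and asc avoid shadowing file.

```python
import io

def read_header(f, n_lines=6):
    """Read the header with a loop instead of six x.readline() calls.

    Assumes each header line is a 'key value' pair, as in an ESRI ASCII grid
    (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value).
    """
    header = {}
    for _ in range(n_lines):
        key, value = f.readline().split()
        header[key.lower()] = float(value)
    return header

asc = io.StringIO(
    "ncols 1422\nnrows 2044\nxllcorner 0.0\nyllcorner 0.0\n"
    "cellsize 30\nNODATA_value -9999\n"
)
hdr = read_header(asc)
print(int(hdr["nrows"]) * int(hdr["ncols"]))  # 2906568 values expected
```

With nrows and ncols known, comparing the number of values read against nrows * ncols is a quick sanity check, and the NODATA value could replace a hard-coded threshold such as num > -1.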

There are many ways to write a data file from the lists q1, q2, q3; you can simply write:

outfile = open("outfile.txt", "w")
for i in xrange(len(q1)):
    outfile.write("%12.3e%12.3e%12.3e\n" % (q1[i], q2[i], q3[i]))
outfile.close()

You could also output to an excel spreadsheet.
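On newer Python versions, the same three-column file can be written with zip() and a with block. Since a spreadsheet was mentioned, this sketch uses the standard csv module, whose output Excel can open; the q1/q2/q3 values and the output path are placeholders, not results from the thread.

```python
import csv
import os
import tempfile

# Hypothetical quantile results, one entry per input file.
q1 = [0.0012, 0.0031]
q2 = [0.0045, 0.0060]
q3 = [0.0090, 0.0120]

out_path = os.path.join(tempfile.mkdtemp(), "quantiles.csv")
# newline="" is what the csv docs recommend when opening a file for csv.writer
with open(out_path, "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["q1", "q2", "q3"])
    for row in zip(q1, q2, q3):  # pairs up q1[i], q2[i], q3[i]
        writer.writerow(row)

# Read it back to check the layout.
with open(out_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows[1])  # ['0.0012', '0.0045', '0.009']
```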

Gribouillis 1,391 Programming Explorer

14 Years Ago

It's really strange. Could you post the output of the following code (between code tags)? I just added a few print statements.

#READ DATA FROM EACH ASC FILE AND CALCULATE QUANTILES FROM EACH FILE

q1=[]
q2=[]
q3=[]
print repr(runlist)
print len(runlist)
for file in runlist:
    print repr(file)
    gq=[]
    x=open(file[0],'r')
    for i in xrange(6):
        x.readline()
    z= x.readline()
    while z != '':
        z=z.strip().split()
        for num in z:
            num=float(num)
            if num > -1:
                gq.append(num)
        z= x.readline()
    x.close()
    a=quantile(gq, .25,  qtype = 7, issorted = False)
    #print a
    b=quantile(gq, .5,  qtype = 7, issorted = False)
    c=quantile(gq, .75,  qtype = 7, issorted = False)   
    q1.append(a)
    q2.append(b)
    q3.append(c)
print len(q1), len(q2), len(q3)

AnnetteM commented: Gribouillis is an incredible asset to Daniweb +1

AnnetteM 0 Light Poster · Answer 1 · 2010-02-22T22:54:53+00:00

Thank you for this. I am used to higher level languages like Matlab and R where this white space stripping is not required.
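As an aside, the strip() in the hint is harmless but optional: str.split() with no argument splits on runs of whitespace and ignores leading and trailing whitespace, including the newline. Only an explicit separator such as split(" ") produces empty strings. A quick check in Python 3 syntax (the sample string is adapted from the hint):

```python
s = "-9999 -9999 0.004537001   \n"

# With no argument, split() splits on runs of whitespace and ignores
# leading/trailing whitespace, so the extra strip() is not needed.
assert s.split() == s.strip().split() == ['-9999', '-9999', '0.004537001']

# An explicit separator behaves differently: empty strings and '\n' survive.
print(s.split(" "))  # ['-9999', '-9999', '0.004537001', '', '', '\n']
```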

AnnetteM 0 Light Poster · Answer 2 · 2010-02-22T22:55:29+00:00

Okay, thank you vegaseat!

AnnetteM 0 Light Poster · Answer 3 · 2010-02-25T01:28:12+00:00

Okay, this is what I ended up with. However, I have two new challenges. For some reason, q1.append(a) and the other q appends don't show the program looping through each file in the runlist. I printed the runlist and it looked fine, so I can't figure out what the problem with the loop is. I am also curious how to write a data file that combines the three 1-D arrays (q1, q2, and q3) once the q lists append properly. (Hints or leads are all that's necessary.) I'm working with Python 2.5, so I don't have access to SciPy or NumPy. Thank you so much, DaniWeb!

#READ DATA FROM EACH ASC FILE AND CALCULATE QUANTILES FROM EACH FILE

q1=[]
q2=[]
q3=[]
for file in runlist:
    gq=[]
    x=open(file[0],'r')
    x.readline()
    x.readline()
    x.readline()
    x.readline()
    x.readline()
    x.readline()
    z= x.readline()
    while z != '':
        z=z.strip()
        z=z.strip().split()
        for num in z:
            num=float(num)
            if num > -1:
                gq.append(num)
        z= x.readline()    
    a=quantile(gq, .25,  qtype = 7, issorted = False)
    print a
    b=quantile(gq, .5,  qtype = 7, issorted = False)
    c=quantile(gq, .75,  qtype = 7, issorted = False)   
    q1.append(a)
    q2.append(b)
    q3.append(c)
print q1
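The quantile module imported in the thread (from quantile import quantile) is not shown, so its exact behaviour is unknown here; qtype = 7 most likely refers to R's type 7 definition, the linear-interpolation rule R uses by default. Since NumPy isn't an option in this setup, a pure-Python sketch of that rule can serve as a cross-check. quantile_type7 is a hypothetical helper, not the module's code.

```python
import math

def quantile_type7(data, p):
    """Sample quantile by linear interpolation (R's default, type 7).

    h = (n - 1) * p; interpolate between the sorted values at floor(h) and floor(h) + 1.
    """
    if not data:
        raise ValueError("quantile of empty data")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    xs = sorted(data)
    h = (len(xs) - 1) * p
    lo = int(math.floor(h))
    hi = min(lo + 1, len(xs) - 1)  # p == 1 lands exactly on the last value
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

gq = [4.0, 1.0, 3.0, 2.0]
print([quantile_type7(gq, p) for p in (0.25, 0.5, 0.75)])  # [1.75, 2.5, 3.25]
```

On Python 3.8+, statistics.quantiles(data, n=4, method='inclusive') should return the same three cut points under this rule.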

AnnetteM 0 Light Poster · Answer 4 · 2010-02-25T04:34:53+00:00

Here is a snippet of the runlist: 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\dtshein.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\dtshein.txt', 'C:\\PYTHON_SCRIPTS\\GETSTAS\\asc\\dtshein8.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\dtshein9.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\dtshenn1.txt',

I have wondered this very thing about file[0] myself. When I take file[0] out and replace with just file and run it, I get this error:
Traceback (most recent call last):
  File "C:/PYTHON_SCRIPTS/readASCfile3.py", line 42, in <module>
    x=open(file,'r')
TypeError: coercing to Unicode: need string or buffer, list found

I also changed the word file to ascfile and get the same error (above) if it doesn't have that zero starting index.

I generally index the iterating object (like file or ascfile) in other loops with "object[0]", and the whole loop runs (for more than just the 0th iteration). Hence, I thought this is how you initialize indexing of the object in the loop.(?) I picked it up from someone else's python code posted on the internet.

For example, this loop works successfully to retrieve 9 different basenames in a loop:
for tiffile in runlist:
    basename0=os.path.basename(tiffile[0])

Should the indexing be different?

AnnetteM 0 Light Poster · Answer 5 · 2010-02-25T08:18:23+00:00

Wow, for some reason the runlist only has length 1. It seems I was wrong about the runlist being okay. Here's the code I use to generate the list; I think glob.glob is not the best choice of function to generate it.

runlist = []
if glob.glob(ascDIR+"\\*.txt") <> []: #Find all asc files
        runlist.append(glob.glob(ascDIR+"\\*.txt")) # Add them to the run list
print runlist

OUTPUT

>>> 
[['C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein1.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein2.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein3.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein4.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein5.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein6.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein7.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein8.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein9.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn1.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn2.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn3.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn4.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn5.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn6.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn7.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn8.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn9.txt']]
[['C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein1.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein2.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein3.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein4.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein5.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein6.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein7.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein8.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein9.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn1.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn2.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn3.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn4.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn5.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn6.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn7.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn8.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn9.txt']]
1
['C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein1.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein2.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein3.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein4.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein5.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein6.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein7.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein8.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmein9.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn1.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn2.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn3.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn4.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn5.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn6.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn7.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn8.txt', 'C:\\PYTHON_SCRIPTS\\GETSTATS\\asc\\darmenn9.txt']
1 1 1

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 6 · 2010-02-25T08:24:12+00:00

Your runlist is a list containing another list! That's because glob.glob() returns a list and you appended this list to the runlist. You can replace 'append' with 'extend':

runlist = []
if glob.glob(ascDIR+"\\*.txt") <> []: #Find all asc files
        runlist.extend(glob.glob(ascDIR+"\\*.txt")) # Add them to the run list
print runlist

Also, write open(file, 'r') and not file[0]. It should work now.
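The difference between append and extend fits in a few lines; two placeholder names stand in for the glob.glob() output. It also explains why file[0] seemed to work: the loop ran once, over the inner list, and [0] picked its first path. Since glob.glob() already returns a list, runlist = glob.glob(os.path.join(ascDIR, "*.txt")) would also do, and no match simply gives an empty list (note that the <> operator no longer exists in Python 3).

```python
paths = ["darmein1.txt", "darmein2.txt"]  # stand-in for what glob.glob() returns

nested = []
nested.append(paths)   # adds ONE element: the whole list
flat = []
flat.extend(paths)     # adds each path as its own element

print(len(nested), len(flat))  # 1 2
print(nested[0][0])  # darmein1.txt: what file[0] picked on the single pass
```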

AnnetteM 0 Light Poster · Answer 7 · 2010-02-26T22:48:41+00:00

Okay,
I have now retrieved the runlist using a function from the module os:

os.chdir(ascDIR)
runlist=os.listdir(ascDIR)

I also changed the file[0] to just file for the loop and the code is in running order! Thank you.
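One caveat with os.listdir(): it returns bare names (no directory part) for every entry in the directory, in arbitrary order, so non-.txt files are included too, and the names only open correctly while the current directory is ascDIR, which is why the os.chdir() call matters. A sketch that filters by extension and keeps full paths; the temporary directory and file names stand in for the real ascDIR:

```python
import os
import tempfile

asc_dir = tempfile.mkdtemp()  # stand-in for the thread's ascDIR
for name in ("darmein1.txt", "darmein2.txt", "notes.doc"):
    open(os.path.join(asc_dir, name), "w").close()  # create empty files

names = sorted(os.listdir(asc_dir))  # bare names, no directory part, every entry
runlist = [os.path.join(asc_dir, n) for n in names if n.endswith(".txt")]

print(names)         # ['darmein1.txt', 'darmein2.txt', 'notes.doc']
print(len(runlist))  # 2
```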

AnnetteM 0 Light Poster · Answer 8 · 2010-02-26T22:51:53+00:00

Oops, I didn't see your extend comment until now! I can do that too, thank you!!!
You win the gold medal for aerial programming help!!!!!!!

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 9 · 2010-02-27T00:43:35+00:00

thanks :)
