Creating arrays while processing big file

Question

eikonal 0 Newbie Poster

13 Years Ago

Hi everyone!
This is my first post, even though I've been reading you for a while.
I'm a Python beginner and I need your help.
I'm processing a very big file (more than 2 million lines), but I'll show you a much smaller example (blocks of 24 lines rather than 74513). So let's say I've got 24 lines, each one with a single float number, followed by one line with 3 numbers, then another 24 lines, another line with 3 numbers, and so on for 29 times.

56.71739
56.67950
56.65762
56.63320
56.61648
56.60323
56.63215
56.74365
56.98378
57.34681
57.78903
58.27959
58.81514
59.38853
59.98271
60.58515
-1.00000
56.09566
56.05496
56.02777
56.00158
55.98341
55.96830
55.99615
1 1 1
56.34692
56.70977
57.15187
57.64234
58.17782
58.75118
59.34534
59.94779
-1.00000
55.47366
55.42963
55.39739
55.36958
55.35020
55.33404
55.36098
55.47148
55.71110
56.07384
56.51588
57.00632
57.54180
58.11517
58.70937
2 1 1

It's quite easy to create an array with the first 24 lines:

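import numpy as np

def ttarray_tms(traveltimes):
    '''It defines the 3-D array, organized as I want.'''
    with open(traveltimes, 'r') as file_in:
        newarray = file_in.readlines()
    # Convert each text line to a float (float() ignores the trailing newline).
    ttarray = np.array([float(value) for value in newarray])
    ttarray.shape = (2, 3, 4)  # only works when the file holds exactly 24 values
    ttarray = np.swapaxes(ttarray, 1, 2)
    ttarray = np.swapaxes(ttarray, 0, 2)
    return ttarray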
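
As a side note on the two swapaxes calls above: applied in that order to a (2, 3, 4) array, they appear to amount to a single transpose with axis order (1, 2, 0). A minimal sketch to check this, using the stand-in values 0 to 23 (an assumption, not the real travel times):

```python
import numpy as np

# 24 stand-in values (0..23) in place of the real travel times
values = np.arange(24, dtype=float)

a = values.reshape(2, 3, 4)
via_swaps = np.swapaxes(np.swapaxes(a, 1, 2), 0, 2)
via_transpose = a.transpose(1, 2, 0)

print(via_swaps.shape)                           # (3, 4, 2)
print(np.array_equal(via_swaps, via_transpose))  # True
```

Either form gives an array of shape (3, 4, 2); the single transpose may be easier to read.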

What I want is basically to get 29 arrays, so I should loop over the first 24 lines and get an array, then loop over the next 24 lines (skipping the line with 3 numbers, which I don't really need) and get another array, and so on. I think my main problem is how to skip the line with the 3 numbers and start a new loop for a new array.

Do you have any good ideas?

Thanks very much!


All 12 Replies

JoshuaBurleson 23 Posting Whiz

13 Years Ago

Well, think about how you would skip them... What is the easiest way to distinguish that particular piece of data from the others? To me it looks like maybe it's the fact that data[0] is in "not sure which it is", where data is the line.


woooee 814 Nearly a Posting Maven

13 Years Ago

strip() the line and then split() it and test the length. The ones you want should have a length of 1 (a single number), if I understand correctly; the lines with 3 numbers have a length of 3.
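
A minimal sketch of this strip/split/length test. In the sample data a data line splits into a single token and a line with 3 numbers into three; the helper name is_separator and the sample lines are made up for illustration:

```python
def is_separator(line):
    """Return True when the line holds more than one whitespace-separated token."""
    return len(line.strip().split()) > 1

# Hypothetical sample lines: two data lines, a tab-separated and a
# space-separated separator line, and a blank line.
sample = ["56.71739\n", "-1.00000\n", "1\t1\t1\n", "  2   1   1\n", "\n"]
flags = [is_separator(line) for line in sample]
print(flags)  # [False, False, True, True, False]
```

Because split() with no argument splits on any run of whitespace, the test does not depend on whether the numbers are separated by tabs or spaces.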

woooee 814 Nearly a Posting Maven

13 Years Ago

Post your code. In the code you posted earlier, you redefine newarray. It's too late here for me to test this, so you'll have to fix typos yourself.

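# Your code:
for line in newarray:  # defined here
    newarray = line.strip().split()  # redefined here
    if len(line) > 2:
        break
    newarray = line.split()  # and a third time
#
#
# Suggested grouping (newarray is the list returned by readlines()):
grouped = []
this_group = []
for line in newarray:
    split_line = line.strip().split()
    if len(split_line) > 2:
        # line with 3 numbers: close the current group
        if len(this_group):
            grouped.append(this_group)
            print("appended", this_group)
        this_group = []
    elif split_line:  # skip blank lines
        this_group.append(line.strip())
# the last group has no line with 3 numbers after it, so store it here
if this_group:
    grouped.append(this_group)

print(grouped)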
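
The groups collected this way are lists of strings, so one more step is needed to get the arrays from the question. A sketch of that step, assuming every group holds exactly 24 numeric strings; groups_to_arrays and the stand-in data are hypothetical:

```python
import numpy as np

def groups_to_arrays(grouped, shape=(2, 3, 4)):
    """Convert each group of numeric strings into a reshaped float array."""
    arrays = []
    for group in grouped:
        block = np.array([float(value) for value in group])
        block = block.reshape(shape)
        # Same axis swaps as in the original question.
        block = np.swapaxes(block, 1, 2)
        block = np.swapaxes(block, 0, 2)
        arrays.append(block)
    return arrays

# Stand-in data: two groups of 24 numeric strings.
grouped = [[str(i) for i in range(24)], [str(i) for i in range(24, 48)]]
arrays = groups_to_arrays(grouped)
print(len(arrays), arrays[0].shape)  # 2 (3, 4, 2)
```

reshape raises a ValueError when a group does not contain exactly 24 values, which doubles as a check that the grouping worked.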



eikonal 0 Newbie Poster · Answer 1 · 2011-10-17T06:59:37+00:00

JoshuaBurleson wrote: "Well, think about how you would skip them... What is the easiest way to distinguish that particular piece of data from the others? To me it looks like maybe it's the fact that data[0] is in 'not sure which it is', where data is the line."

For some reason my post doesn't show how the line with 3 numbers is organized.
First of all, there's no blank line; moreover, there's a tab (or several whitespace characters) between the numbers. A good way would be to use startswith(" "), but it's not working.
See the code below:

def ttarray_tms (traveltimes):
    with open (traveltimes, 'r') as file_in:
        newarray = file_in.readlines()
        ttarray = []
        for line in newarray:
            ttarray = []
            if line.startswith (" "):
                break
            newarray = line.split()
            ttarray = np.array(newarray)
            ttarray.shape = (2,3,4)
            ttarray = np.swapaxes(ttarray,1,2)     
            ttarray = np.swapaxes(ttarray,0,2)
            return (ttarray)

What do you think?

JoshuaBurleson 23 Posting Whiz · Answer 2 · 2011-10-17T07:39:31+00:00

Well, I don't see a space or tab in any of the other kind of line, so what if we checked the contents of each line for spaces or tabs?
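
A sketch of such a check. The exact layout of the line with 3 numbers is not shown in the thread, so the tab-separated sample below is an assumption, as is the helper name has_inner_whitespace:

```python
def has_inner_whitespace(line):
    """True when whitespace remains after stripping both ends of the line."""
    return any(ch.isspace() for ch in line.strip())

separator = "1\t1\t1\n"   # hypothetical tab-separated line
data_line = "56.71739\n"

print(separator.startswith(" "))        # False: the line starts with a digit
print(has_inner_whitespace(separator))  # True
print(has_inner_whitespace(data_line))  # False
```

This would also explain why startswith(" ") finds nothing when the whitespace sits between the numbers rather than in front of them.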

eikonal 0 Newbie Poster · Answer 3 · 2011-10-17T07:39:55+00:00

woooee wrote: "strip() the line and then split() it and test the length. The ones you want should have a length of 1 if I understand correctly."

But then how can you create the arrays, keeping each group of 24 lines separate?

Do you mean to do something like:

if len(line) > 2:
    break

This is what I'm doing now, but it gives me an empty array:

def ttarray_tms (traveltimes): 
    with open (traveltimes, 'r') as file_in:
        newarray = file_in.readlines()
        ttarray = []
        for line in newarray:
            newarray = line.strip().split()
            if len(line) > 2:
                break
            newarray = line.split()
            ttarray = np.array(newarray)
            ttarray.shape = (2,3,4)
            ttarray = np.swapaxes(ttarray,1,2)     
            ttarray = np.swapaxes(ttarray,0,2)
        print (ttarray)
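
One possible reason for the empty result: len(line) counts the characters of the string, not the number of values on the line, so the test is already true for the first data line and the loop breaks before anything is stored. A small sketch with one line from the sample:

```python
line = "56.71739\n"

n_chars = len(line)           # characters, including the newline
n_tokens = len(line.split())  # whitespace-separated values

print(n_chars)   # 9
print(n_tokens)  # 1
print(n_chars > 2, n_tokens > 2)  # True False
```

Testing the length of the split result instead of the raw line lets the data lines through and stops only at the lines with 3 numbers.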

eikonal 0 Newbie Poster · Answer 4 · 2011-10-17T08:30:26+00:00

JoshuaBurleson wrote: "well I don't see a space or tab in any of the other kind. so what if we checked the contents of each line and checked for spaces or tabs?"

I'm 100% sure there are no spaces or tabs in the 24 lines, but there are in the line with the 3 numbers which separates the lines I need to build my arrays.
So I should tell Python: when you get a whitespace, stop, build the array with those lines and go ahead to build the next array with the next 24 lines. It doesn't do what I say :(
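
The behaviour described here (collect values until a line containing whitespace turns up, build an array, start again) can be sketched as a generator. The name iter_blocks, the default shape and the in-memory stand-in file are assumptions, not something from the thread:

```python
import io
import numpy as np

def iter_blocks(lines, shape=(2, 3, 4)):
    """Yield one float array per run of single-value lines."""
    values = []
    for line in lines:
        tokens = line.split()
        if len(tokens) == 1:
            values.append(float(tokens[0]))
        elif len(tokens) > 1:
            # Line with several numbers: emit the block collected so far.
            if values:
                yield np.array(values).reshape(shape)
            values = []
    # The last block is not followed by a line with 3 numbers.
    if values:
        yield np.array(values).reshape(shape)

# Stand-in for the real file: two blocks of 24 values and one separator.
text = "".join(f"{i}.0\n" for i in range(24))
text += "1 1 1\n"
text += "".join(f"{i}.0\n" for i in range(24, 48))

blocks = list(iter_blocks(io.StringIO(text)))
print(len(blocks), blocks[0].shape)  # 2 (2, 3, 4)
```

With a real file, passing the open file object instead of the result of readlines() keeps only one block of values in memory at a time, which may matter for a file with millions of lines. The axis swaps from the question can then be applied to each block.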

JoshuaBurleson 23 Posting Whiz · Answer 5 · 2011-10-17T09:28:35+00:00

eikonal wrote: "when you get a whitespace stop, build the array with those lines and go ahead to build the next array with the next 24 lines."

So you want separate arrays instead of one large one?

eikonal 0 Newbie Poster · Answer 6 · 2011-10-17T09:34:24+00:00

JoshuaBurleson wrote: "so you want separate arrays instead of one large one?"

That's right! I have to get 29 arrays because I have 28 lines separating the different series of 24 lines (I posted just a part of it before). And as you can see from my previous posts, I'm building, as a matter of fact, 2x3x4 arrays (24 values).
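
Since every series has the same length, another option is to rely on the fixed layout (24 data lines, then one separating line) instead of testing each line. A sketch under that assumption, which stacks the series into a single 4-D array whose first index selects the series; stack_blocks and its parameters are hypothetical names:

```python
import numpy as np

def stack_blocks(lines, block_len=24, shape=(2, 3, 4)):
    """Stack fixed-length blocks into one array of shape (n_blocks,) + shape."""
    stride = block_len + 1  # 24 data lines plus one separating line
    blocks = []
    for start in range(0, len(lines), stride):
        chunk = lines[start:start + block_len]
        blocks.append([float(value) for value in chunk])
    return np.array(blocks).reshape((-1,) + shape)

# Stand-in data: 3 blocks of 24 values with 2 separating lines between them.
lines = []
for b in range(3):
    lines += [f"{b * 24 + i}.0\n" for i in range(24)]
    if b < 2:
        lines.append(f"{b + 1} 1 1\n")

stacked = stack_blocks(lines)
print(stacked.shape)  # (3, 2, 3, 4)
```

Here stacked[k] is the k-th 2x3x4 array, so the result can be used either as one large array or as separate ones. The sketch assumes the file is well formed; a short or misplaced block would make the float conversion or the reshape fail.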

JoshuaBurleson 23 Posting Whiz · Answer 7 · 2011-10-17T10:01:09+00:00

Or you could make an array of arrays, if that suits your needs, with something like:

with open('help.txt') as f:
    linez=f.readlines()

aRray=[]
def make_list(lines,array):
    to_append=[]
    for line in lines:
        if ' ' in line:  # I used this because of how the file is formatted for me; if this doesn't work for you, maybe '\t' will
            lines.remove(line)
            array.append(to_append)
            make_list(lines,array)
        else:
            num=(line.strip('\n'))
            try:
                to_append.append(float(num))
            except ValueError:
                pass
            lines.remove(line)
make_list(linez,aRray)

print(aRray)

Then the arrays in that large array can be accessed just like the items of any other iterable, or you could make a dictionary of them with a similar process if you wished. This function could easily be adapted; I just didn't care what happened to linez after it was iterated through. Maybe you do, I don't know.

eikonal 0 Newbie Poster · Answer 8 · 2011-10-17T10:52:13+00:00


It's giving me an empty list....

JoshuaBurleson 23 Posting Whiz · Answer 9 · 2011-10-17T19:27:25+00:00

eikonal wrote: "It's giving me an empty list...."

That's probably because, as I said, you need to reformat it to work for your file. I'll attach my file and show you the output:

output:

[[56.71739, 56.65762, 56.61648, 56.63215, 56.98378, 57.78903, 58.81514, 59.98271, -1.0, 56.05496, 56.00158, 55.9683, 56.70977, 57.64234, 58.75118, 59.94779, 55.47366, 55.39739, 55.3502, 55.36098, 55.7111, 56.51588, 57.5418, 58.70937], [56.6795, 56.60323, 57.34681, 59.38853, 56.09566, 55.98341, 56.07384]]
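
The output above holds every other value from the file, which looks like the usual effect of calling remove() on a list while a for loop is still iterating over it: each removal shifts the remaining items left, so the loop steps over the next one. A small illustration with made-up items:

```python
values = ["a", "b", "c", "d", "e", "f"]
seen = []

for item in values:
    seen.append(item)
    values.remove(item)  # shifts the rest of the list one place to the left

print(seen)    # ['a', 'c', 'e']
print(values)  # ['b', 'd', 'f']
```

Building a new list, or looping over a copy such as values[:], avoids the skipping.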
