open many files

Question

sofia85 0 Junior Poster in Training

13 Years Ago

Hi,
I have a large file (11 GB) that I want to extract information from. I decided that the file was too big to work with, so I ended up splitting it into 20 smaller files. Now I don't know what the smartest thing to do with these files is. Are they still too big? I want to open and read them, but I suspect that doing so that many times would be too time-consuming. I'm pretty new to Python and I'm not sure how to proceed.

python


All 5 Replies

TrustyTony 888 ex-Moderator

13 Years Ago

It depends on what you want from the file. Generally, the way to work with a large file is to split and merge, or you may not need to split at all if you work with generator expressions, which do not load all the data into memory at once. Could you specify what processing you are doing on the data?
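To illustrate the generator-expression idea, here is a minimal sketch, not a definitive implementation. The function name `count_matching_lines` and its parameters are made up for this example. It relies on the fact that a Python file object yields one line at a time, so only the current line is held in memory regardless of the file size.

```python
def count_matching_lines(path, needle):
    """Count the lines containing `needle` without loading the whole file.

    `path` and `needle` are hypothetical parameters for this sketch.
    """
    with open(path) as in_file:
        # Iterating over the file yields one line at a time; the generator
        # expression handles each line before the next one is read.
        return sum(1 for line in in_file if needle in line)
```

Called as `count_matching_lines('tab.txt', 'prob=')`, this would walk the file once, line by line, which is why splitting the file is not strictly needed for this kind of processing.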

~s.o.s~ 2,560 Failure as a human

13 Years Ago

I have a large file (11 GB), that I want to extract information from.

At the risk of posting something off-topic: is using Python an absolute requirement? If not, and assuming you are on *nix, you can easily extract the probability using a one-liner:

sed '1d' tab.txt | awk 'BEGIN{FS="\t"} {split($7,arr1,";"); split(arr1[3], arr2, "="); print arr2[2] }'
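For comparison, the per-row logic of the one-liner above could be sketched in Python roughly as follows. The function name `prob_from_row` is hypothetical. Like the awk command, it assumes Info is the seventh tab-separated column, but it looks the value up by the key `prob` instead of assuming it is the third entry.

```python
def prob_from_row(row, info_column=6):
    """Return the prob value from one tab-delimited row, or None.

    info_column=6 is an assumption: Info is taken to be the seventh
    column (index 6), matching $7 in the awk command.
    """
    fields = row.rstrip('\n').split('\t')
    if len(fields) <= info_column:
        return None  # short or malformed row
    for item in fields[info_column].split(';'):
        # Entries look like "key=value"; flags such as "DB" have no "=".
        key, sep, value = item.partition('=')
        if key == 'prob' and sep:
            return value
    return None
```

Skipping the header row, which `sed '1d'` does in the shell version, is left to the caller here.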


sofia85 0 Junior Poster in Training · Answer 1 · 2011-12-10T04:28:00+00:00

This is what the file looks like; it's a tab-delimited file.

Bgr  Pro    ID         ff    Aa    FIL   Info
2    14370  AT6054257  3.54  5.67  PASS  NS=3;DP=14;prob=0.5;DB;H2

This is only one row from the file, but I have many lines and want to extract prob from column Info.

I know how to extract the data from the file, I just don't know how to do it when I have such a large file...

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 2 · 2011-12-10T06:32:07+00:00

If you are unsure about how much memory you need, you could extract the part you want from the file into another file:

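def get_prob(line):
    # Split around the first 'prob='; sep is empty when the key is absent.
    before, sep, after = line.partition('prob=')
    if sep:
        # Keep the text up to the next ';' and drop any trailing newline.
        return after.partition(';')[0].strip()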

with open('result.txt', 'w') as out_file, open('data_sofia.txt') as in_file:
    for line in in_file:
        prob = get_prob(line)
        if prob:
            out_file.write(prob+'\n')

The main part of the program can also be expressed more concisely with a generator expression and writelines:

with open('result.txt', 'w') as out_file, open('data_sofia.txt') as in_file:
    out_file.writelines(prob+'\n' for prob in (get_prob(line) for line in in_file) if prob)
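If the data has already been split into several files, the same loop can run over each piece in turn. The following is only a sketch under assumptions: the function name `extract_probs` and the file name pattern in the usage note are hypothetical, and the output file must not match the input pattern.

```python
import glob


def extract_probs(pattern, out_path):
    """Write every prob value found in the files matching `pattern`.

    Returns the number of values written. Both parameters are
    hypothetical names for this sketch.
    """
    count = 0
    with open(out_path, 'w') as out_file:
        # sorted() gives a predictable, lexicographic order; zero-pad
        # the piece numbers in the file names if the order matters.
        for path in sorted(glob.glob(pattern)):
            with open(path) as in_file:
                for line in in_file:
                    # Same partition approach as get_prob above.
                    _, found, after = line.partition('prob=')
                    if found:
                        value = after.partition(';')[0].strip()
                        out_file.write(value + '\n')
                        count += 1
    return count
```

A call such as `extract_probs('data_sofia_part_*.txt', 'result.txt')` would open one piece at a time and read it line by line, so memory use does not depend on how many pieces there are or how large they are.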

sofia85 0 Junior Poster in Training · Answer 3 · 2011-12-12T05:48:28+00:00

thanks!
