I wrote a Python program that reads its input from a file. It works fine for small input files; after opening the file I did this to get the data:

data = fPtr.readlines()

Since readlines reads the whole file and packs it into a list, it clearly won't work for large input files. The problem is that I need to extract all of the data from the input file before I begin any operations.
I will be doing a lot of looping in the program, and I don't know whether opening and closing the file inside a loop would be efficient.
Please advise on the best option.

Happy times


So the question is:
How large is your data file?

A Python list will hold something like 2 trillion items, but it is going to be pretty slow with a very large number of records in it. If your list is going to be 100 million records or more, then consider an SQLite database instead. If it's a paltry one million records (we now live in a gigabyte world), then there should not be a problem, but you might want to consider using a dictionary or set, as they are both indexed via a hash and would be much faster for lookups.
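
To make the lookup point concrete, here is a minimal sketch comparing membership tests on a list and on a set built from the same records (the file name records.txt and the lookup key are just placeholders):

with open("records.txt") as fPtr:
    records_list = [line.strip() for line in fPtr]

records_set = set(records_list)   # hashed container, O(1) average lookups

key = "some_record"
print(key in records_list)        # linear scan through the whole list
print(key in records_set)         # hash lookup, fast no matter how big the set is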

I realized that a Python list can hold as much data as the computer's memory allows. In the Python interpreter I ran these lines just to verify this, and in another console I ran the top command to monitor the memory consumption of the Python interpreter:

li = []
while True:
    li.append("king")

There was no error; the list kept growing, and so did the memory consumption of the Python interpreter.
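
If you want to repeat that experiment without having to kill the interpreter, a bounded sketch like this (the one-million cap is arbitrary) shows the growth as it happens:

import sys

li = []
for i in range(1000000):          # stop after a fixed number of appends
    li.append("king")
    if i % 100000 == 0:
        # sys.getsizeof reports the size of the list object itself
        # (its internal pointer array), not the strings it refers to.
        print(i, sys.getsizeof(li), "bytes")
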
The reason I posted this question is that I thought there was a bug in a program I had submitted to an online judge, which normally tests programs with large input files.
Many thanks to all contributors!

As far as performance is concerned, the read from disk will be the slowest part by far!

I am trying to open a big file (> 1 GB), but I am getting a MemoryError.
The code is:
for line in open('data.txt', 'r').readlines():

This line worked for me when the file size was around 750 MB, but it gives an error when the file size is greater than 1 GB.

Is there any remedy for this?
I don't want to read the file string- or character-wise... that would alter my whole code.

Thanks,
Mahesh

Does the code run without readlines, and how fast is it for 1 GB (compared to 750 MB before)?

i.e. for line in open('data.txt', 'r'): Could you post the main code? Maybe we could optimize it together.

Usually it is best to use a generator for huge data files.
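
For example, a small generator like this reads one line at a time, so only the current line is ever held in memory (just a sketch; data.txt and the per-line processing are stand-ins for whatever the real program does):

def read_records(path):
    # Yield one stripped line at a time instead of building a list.
    with open(path, "r") as fPtr:
        for line in fPtr:
            yield line.strip()

for record in read_records("data.txt"):
    pass   # replace with the real per-line processing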

I changed the code to:
for line in open('data.txt', 'r'):
and it works now.

For me, speed is not a concern.
Thanks for the help.
