I wrote a Python program that reads its input from a file. It works fine for small input files; after opening the file I did this to get the data:

data = fPtr.readlines()

Since readlines reads the whole file and packs it into a list, it clearly won't work for large input files. The problem is that I need to extract all of the data from the input file before I begin any operations.
I will be doing a lot of looping in the program, and I don't know whether opening and closing the file inside a loop would be efficient.
Please advise on the best option.

Happy times


So the question is:
How large is your data file?

A Python list will hold something like 2 trillion items, but it is going to be pretty slow with a very large number of records in it. If your list is going to be 100 million records or more, then consider an SQLite database instead. If it's a paltry one million records (we now live in a gigabyte world), then there should not be a problem, but you might want to consider using a dictionary or set, as they are both indexed via a hash and would be much faster for lookups.
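
To make the lookup point concrete, here is a minimal sketch comparing membership tests on a list and on a set built from the same records (the file name records.txt and the lookup key are just placeholders):

with open("records.txt") as fPtr:
    records_list = [line.strip() for line in fPtr]

records_set = set(records_list)   # hashed container, O(1) average lookups

key = "some_record"
print(key in records_list)        # linear scan through the whole list
print(key in records_set)         # hash lookup, fast no matter how big the set is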

I realized that a Python list can hold as much data as the computer's memory allows. In the Python interpreter I ran these lines just to verify this, and in another console I ran the top command to monitor the memory consumption of the Python interpreter:

li = []
while True:
    li.append("king")

There was no error; the list kept growing, and so did the memory consumption of the Python interpreter.
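
If you want to repeat that experiment without having to kill the interpreter, a bounded sketch like this (the one-million cap is arbitrary) shows the growth as it happens:

import sys

li = []
for i in range(1000000):          # stop after a fixed number of appends
    li.append("king")
    if i % 100000 == 0:
        # sys.getsizeof reports the size of the list object itself
        # (its internal pointer array), not the strings it refers to.
        print(i, sys.getsizeof(li), "bytes")
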
The reason I posted this question is that I thought there was a bug in a program I had submitted to an online judge, which normally tests programs with large input files.
Many thanks to all contributors!

As far as performance is concerned, the read from disk will be the slowest part by far!

I am trying to open a big file (> 1 GB), but I am getting a MemoryError.
The code is:
for line in open('data.txt', 'r').readlines():

This line worked for me when the file size was around 750 MB, but it gives an error when the file size is greater than 1 GB.

Is there any remedy for this?
I don't want to read the file string- or character-wise... that would alter my whole code.

Thanks,
Mahesh

Does the code run without readlines, and how fast is it for 1 GB (compared to 750 MB before)?

i.e. for line in open('data.txt', 'r'): Could you post the main code? Maybe we could optimize it together.

Usually it is best to use a generator for huge data files.
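
For example, a small generator like this reads one line at a time, so only the current line is ever held in memory (just a sketch; data.txt and the per-line processing are stand-ins for whatever the real program does):

def read_records(path):
    # Yield one stripped line at a time instead of building a list.
    with open(path, "r") as fPtr:
        for line in fPtr:
            yield line.strip()

for record in read_records("data.txt"):
    pass   # replace with the real per-line processing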

I changed the code to:
for line in open('data.txt', 'r'):
and it works now.

For me, speed is not a concern.
Thanks for the help.
