Oct 10th, 2009

Read large input file to memory

I wrote a Python program that reads its input from a file. It works fine for small input files, because after opening the file I did this to get the data:
    data = fPtr.readlines()
Since readlines() reads the entire file into a list in memory, I assumed it would not work for large input files. The problem is that I need to extract all the data in the input file before I begin any operation.
I will be doing lots of looping in the program, and I don't know whether opening and closing the file inside a loop would be efficient.
Please advise on the best option.

Happy times
Posted by sureronald
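For a file that fits in memory, the usual pattern is to read it once and then loop over the resulting list as often as needed, rather than reopening the file on every pass. A minimal sketch of that pattern; the helper names and the sample data are hypothetical, and a temporary file stands in for the real input file.

```python
import os
import tempfile


def load_lines(path):
    """Read the whole file into a list once; fine while it fits in memory."""
    with open(path, "r") as f:
        return f.readlines()  # one string per line, newlines kept


def count_containing(lines, word):
    """One pass over the in-memory data; no file I/O happens here."""
    return sum(1 for line in lines if word in line)


# Small hypothetical input file for the demonstration
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("apple pie\nbanana split\napple tart\n")
    path = tmp.name

lines = load_lines(path)  # the only disk read
os.remove(path)           # later passes do not need the file any more

# Loop over the same list as many times as needed
apple_count = count_containing(lines, "apple")
banana_count = count_containing(lines, "banana")
print(apple_count, banana_count)  # prints: 2 1
```

Reading once trades memory for speed: every later pass works on data already in RAM, which only stops being an option when the file is larger than the memory available.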
Oct 10th, 2009
Re: Read large input file to memory
So the question is:
How large is your data file?
Posted by vegaseat
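The size of the file can be checked before deciding whether to read it all at once. A small sketch using the standard `os.path.getsize`, which reports the size without reading the contents; `human_size` is a hypothetical helper written for this example, not a library function.

```python
import os
import tempfile


def human_size(num_bytes):
    """Format a byte count using binary (1024-based) units."""
    value = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if value < 1024 or unit == "GB":
            return f"{value:.1f} {unit}"
        value /= 1024


# Hypothetical 1.5 KB file standing in for the real data file
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"x" * 1536)
    path = tmp.name

size = os.path.getsize(path)  # size on disk in bytes, without reading the file
os.remove(path)
print(size, human_size(size))  # prints: 1536 1.5 KB
```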
Oct 10th, 2009
Re: Read large input file to memory
A Python list can hold a huge number of items (the hard limit depends on the platform; see sys.maxsize), so in practice available memory is usually the limit. It will be pretty slow to search with a very large number of records in it, though, because checking whether a value is in a list means scanning it element by element. If your list is going to be 100 million records or more, consider an SQLite database instead. If it's a paltry one million records (we now live in a gigabyte world), there should not be a problem, but you might want to consider using a dictionary or set, as both are indexed via a hash and are much faster on lookups.
Posted by woooee
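To illustrate the difference between list and set lookups, and what the SQLite route might look like, here is a small sketch using only the standard library (`sqlite3`). The record format and the table name are assumptions made for the example; a real program would insert records parsed from its own input file and connect to a file path (for instance a hypothetical `records.db`) rather than `:memory:`.

```python
import sqlite3

# Hypothetical records; in practice these would come from the input file
records = [f"id{i}" for i in range(100_000)]

# A list membership test scans element by element (linear time) ...
record_list = records
# ... while a set hashes the key, so lookups take roughly constant time
record_set = set(records)

print("id99999" in record_list, "id99999" in record_set)  # prints: True True

# For data too large for memory, an on-disk SQLite database can be queried
# instead; ":memory:" is used here only so the example leaves no file behind
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (key TEXT PRIMARY KEY)")  # indexed column
conn.executemany("INSERT INTO records (key) VALUES (?)", ((r,) for r in records))
conn.commit()

row = conn.execute(
    "SELECT COUNT(*) FROM records WHERE key = ?", ("id42",)
).fetchone()
print(row[0])  # prints: 1
conn.close()
```

The set keeps everything in RAM, so it suits the "paltry one million records" case; the database keeps the data on disk and uses its primary-key index for lookups.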
Oct 12th, 2009
Re: Read large input file to memory
I found that a Python list can hold as much data as the computer's memory allows. To verify this, I entered these lines in the Python interpreter and ran the top command in another console to monitor the interpreter's memory consumption:
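    li = []
    while True:
        # Runs until memory is exhausted (or Ctrl+C); each append stores
        # another reference to the same "king" string object
        li.append("king")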
There was no error: the list kept growing, and so did the memory consumption of the Python interpreter.
The reason I posted this question is that I suspected a bug in a program I had submitted to an online judge, which normally tests programs with large input files.
Many thanks to all contributors!
Posted by sureronald
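A bounded variant of the experiment above (a sketch; the one-million count is arbitrary) measures the growth with the standard `tracemalloc` module instead of top, and shows why the list grows fairly slowly there: appending the same literal stores another reference to one shared string object, not a new copy. Lines read from a file are distinct string objects, so a list of real input lines costs considerably more memory per element.

```python
import sys
import tracemalloc

tracemalloc.start()

li = []
for _ in range(1_000_000):  # bounded, unlike the endless while-loop
    li.append("king")       # every element refers to the same string object

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# All entries share one object, so memory grows only by the list's internal
# array of references (about 8 bytes each on a 64-bit build)
print(len(li), len({id(x) for x in li}))  # prints: 1000000 1
print(f"list object: {sys.getsizeof(li) / 1e6:.1f} MB, traced: {current / 1e6:.1f} MB")
```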
Oct 12th, 2009
Re: Read large input file to memory
As far as performance is concerned, reading the file from disk will be by far the slowest part!
Posted by bumsfeld
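Whether disk I/O really dominates depends on the program and on the operating system's file cache; a rough way to check is to time the two phases separately. A minimal sketch with the standard `time.perf_counter`; the per-line work (summing line lengths) is only a placeholder, and because the test file was just written it is probably served from the OS cache, so a cold read from disk would likely take longer than measured here.

```python
import os
import tempfile
import time


def time_read_and_process(path):
    """Return (read_seconds, process_seconds, result) for one pass over the file."""
    start = time.perf_counter()
    with open(path, "r") as f:
        lines = f.readlines()  # disk I/O and text decoding happen here
    read_seconds = time.perf_counter() - start

    start = time.perf_counter()
    result = sum(len(line) for line in lines)  # placeholder for the real work
    process_seconds = time.perf_counter() - start
    return read_seconds, process_seconds, result


# Hypothetical test file: 10,000 lines of "line\n" (5 characters each)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("line\n" * 10_000)
    path = tmp.name

read_s, process_s, total_chars = time_read_and_process(path)
os.remove(path)
print(f"read: {read_s:.4f} s, process: {process_s:.4f} s, chars: {total_chars}")
```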
Jun 22nd, 2010
Re: Read large input file to memory
I am trying to open a big file (> 1 GB), but I am getting a MemoryError.
The code is:
    for line in open('data.txt', 'r').readlines():
        ...  # loop body not shown

This line worked for me when the file was around 750 MB, but it raises the error when the file is larger than 1 GB.

Is there any remedy for this?
I don't want to read the file string-wise or character-wise... that would mean changing my whole code.

Thanks,
Mahesh
Posted by mahesham
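If the existing code really does need lists of lines, one option that leaves it largely unchanged is to feed it the file in fixed-size chunks, so only one chunk is in memory at a time. This is a sketch, not the fix adopted later in the thread: it uses the standard `itertools.islice`, `process_lines` is a hypothetical stand-in for the existing list-based code, and it only works if each chunk can be processed independently of the others.

```python
import itertools
import os
import tempfile


def read_in_chunks(path, chunk_size=100_000):
    """Yield lists of at most chunk_size lines; only one list is held at a time."""
    with open(path, "r") as f:
        while True:
            # islice pulls the next chunk_size lines from the file iterator
            chunk = list(itertools.islice(f, chunk_size))
            if not chunk:  # end of file reached
                break
            yield chunk


def process_lines(lines):
    """Hypothetical stand-in for existing code that expects a list of lines."""
    return sum(1 for line in lines if line.strip())


# Small hypothetical file with ten lines, read in chunks of four
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join(f"row {i}\n" for i in range(10)))
    path = tmp.name

chunk_sizes = []
total = 0
for chunk in read_in_chunks(path, chunk_size=4):
    chunk_sizes.append(len(chunk))
    total += process_lines(chunk)
os.remove(path)

print(chunk_sizes, total)  # prints: [4, 4, 2] 10
```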
Jun 22nd, 2010
Re: Read large input file to memory
Does the code run without the readlines() call, and how fast is it for 1 GB (compared to 750 MB before)?

i.e.
    for line in open('data.txt', 'r'):
Could you post your main code? Maybe we could optimize it together.

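Usually it is best to use a generator for huge data files, so that only one line (or record) is in memory at a time; iterating directly over the file object, as above, already works this way.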
Last edited by pyTony; Jun 22nd, 2010 at 6:05 am.
Posted by pyTony
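The generator suggestion can be sketched like this: a generator function wraps the line-by-line file loop and yields parsed records one at a time, and the consuming code pulls from it without ever building a full list. The "name,value" line format, the comment convention, and the function name are assumptions made for the example.

```python
import os
import tempfile


def parse_records(path):
    """Generator: yield one (name, value) record at a time instead of a list."""
    with open(path, "r") as f:
        for line in f:  # iterating the file object reads lazily, line by line
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comment lines
            # Assumes a hypothetical "name,value" format with an integer value
            name, value = line.split(",")
            yield name, int(value)


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("# name,value\na,1\n\nb,2\nc,3\n")
    path = tmp.name

# Consumers pull records one at a time; no full list is built here
total = sum(value for _, value in parse_records(path))
largest = max(parse_records(path), key=lambda record: record[1])
os.remove(path)

print(total, largest)  # prints: 6 ('c', 3)
```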
Jun 22nd, 2010
Re: Read large input file to memory
I changed the code to:
    for line in open('data.txt', 'r'):
and now it works.

For me, speed is not a concern.
Thanks for the help.
Posted by mahesham
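The fixed loop leaves closing the file to the garbage collector. A slightly more defensive sketch of the same approach wraps it in a `with` block so the file is closed as soon as the loop finishes. A temporary file replaces `data.txt` so the example runs anywhere, and the per-line work (counting lines, tracking the longest one) is only a placeholder.

```python
import os
import tempfile

# Hypothetical stand-in for data.txt so the sketch runs anywhere
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

line_count = 0
longest = ""
# The with-block closes the file when the loop ends, even if an exception occurs
with open(path, "r") as f:
    for line in f:  # only the current line is held in memory
        line_count += 1
        stripped = line.rstrip("\n")
        if len(stripped) > len(longest):
            longest = stripped

os.remove(path)
print(line_count, longest, f.closed)  # prints: 3 second True
```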

This thread is solved
© 2011 DaniWeb® LLC