Large lists in python

Question

fekioh 0 Newbie Poster

14 Years Ago

Hello,

I need to store data in large lists (~1e7 elements), and I often get a memory error in code that looks like this:

list1, list2 = [], []
with open('data.txt', 'r') as f:
    for line in f:
        fields = line.split(',')  # split each line only once
        list1.append(fields[1])
        list2.append(fields[2])
        # etc.

I get the error while reading in the data, but I don't really need all elements to be stored in RAM all the time. I work with chunks of that data.

So, more specifically, I have to read in ~10,000,000 entries (strings and numbers) from 15 different columns in a text file, store them in list-like objects, do some element-wise calculations, and get summary statistics (means, stdevs, etc.) for blocks of, say, 500,000. Fast access to these blocks is needed!

I need to read everything in at once (so no f.seek() etc. to read the data a block at a time). So I'm looking for any alternative list implementation (or other list-like data structure) with which I could read all the data, store it on disk, and load in RAM a chunk/"page" of it at a time.

Any advice on how to achieve this? Platform: Windows XP

Cheers!

data-structure python


joehms22 18 Junior Poster

14 Years Ago

Try pickling. You can store large amounts of data in (almost) native python formats to the disk, then re-load them quickly later.

The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream is converted back into an object hierarchy.

http://docs.python.org/library/pickle.html
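A minimal sketch of how pickling could be applied to the block-wise workflow described in the question: values are written to disk in fixed-size chunks, and only one chunk is loaded back at a time. The helper names (save_chunks, load_chunk), the file naming scheme, and the use of a temporary directory are assumptions made for illustration, not something from the original post.

```python
import os
import pickle
import tempfile

CHUNK_SIZE = 500_000  # block size mentioned in the question


def _chunk_path(directory, index):
    # Hypothetical naming scheme: one numbered pickle file per chunk.
    return os.path.join(directory, "chunk_%05d.pkl" % index)


def _dump(chunk, directory, index):
    with open(_chunk_path(directory, index), "wb") as fh:
        pickle.dump(chunk, fh, protocol=pickle.HIGHEST_PROTOCOL)


def save_chunks(values, directory, chunk_size=CHUNK_SIZE):
    """Write values to numbered pickle files; return how many were written."""
    chunk, count = [], 0
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            _dump(chunk, directory, count)
            chunk, count = [], count + 1
    if chunk:  # leftover partial chunk
        _dump(chunk, directory, count)
        count += 1
    return count


def load_chunk(directory, index):
    """Load a single chunk back into memory."""
    with open(_chunk_path(directory, index), "rb") as fh:
        return pickle.load(fh)


# Tiny demonstration with 10 values and a chunk size of 4.
workdir = tempfile.mkdtemp()
n_chunks = save_chunks(range(10), workdir, chunk_size=4)
block = load_chunk(workdir, 1)
mean = sum(block) / len(block)
```

Only the chunk currently being built or analysed is held in RAM, so peak memory depends on the chunk size rather than on the total number of rows.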

Hope it helps:
-Joe


vegaseat 1,735 DaniWeb's Hypocrite Team Colleague · Answer 1 · 2010-08-14T23:54:10+00:00

You may want to look into a relatively new (Python 2.6+) container called namedtuple in the collections module. It uses about as much memory as a regular tuple.
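As a rough illustration of the namedtuple suggestion, each row of the text file could be parsed into one record with named fields. The field names (station, value, weight) and the three-column layout are hypothetical; the file in the question has 15 columns whose names are not given.

```python
from collections import namedtuple

# Hypothetical column names, chosen only for this example.
Record = namedtuple("Record", ["station", "value", "weight"])


def parse_line(line):
    """Turn one comma-separated line into a Record with numeric fields."""
    station, value, weight = line.rstrip("\n").split(",")
    return Record(station, float(value), float(weight))


rows = [parse_line(line) for line in ["a,1.5,2\n", "b,2.5,4\n"]]

# Fields are accessible by name, while each row is still a plain tuple.
total = sum(r.value * r.weight for r in rows)
```

Note that this makes the rows easier to work with but does not by itself reduce the number of objects held in memory.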

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 2 · 2010-08-14T23:56:13+00:00

Additionally you can use blist, a faster implementation of lists for big data sets:

The blist is a drop-in replacement for the Python list that provides better performance when modifying large lists. The blist package also provides sortedlist, sortedset, weaksortedlist, weaksortedset, sorteddict, and btuple types.

ultimatebuster 14 Posting Whiz in Training · Answer 3 · 2010-08-15T00:31:34+00:00

I'm not sure how fast/efficient this would be, but you could write a custom object that uses __setattr__, __getattr__ and __delattr__. It would read your file, build lists, convert them into tuples every 10/20/30/40... elements, and store each tuple under a numeric key.

Again, I'm not sure about the speed/efficiency of this algorithm. It's just an idea.
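One possible reading of this idea, as a sketch only: a list-like class that collects appended values in a buffer, converts every full buffer into a tuple, and stores it on disk under a numeric chunk number. The class name ChunkedList is hypothetical, it uses __getitem__ and __len__ (the sequence hooks) rather than the attribute hooks named in the post, and pickle files are assumed as the storage format.

```python
import os
import pickle
import tempfile


class ChunkedList:
    """Append-only sequence that keeps at most one stored chunk in RAM."""

    def __init__(self, directory, chunk_size=500_000):
        self.directory = directory
        self.chunk_size = chunk_size
        self._buffer = []         # values not yet written to disk
        self._n_saved = 0         # number of full chunks on disk
        self._cache_index = None  # chunk number currently loaded
        self._cache = ()

    def _path(self, index):
        return os.path.join(self.directory, "%d.pkl" % index)

    def append(self, value):
        self._buffer.append(value)
        if len(self._buffer) == self.chunk_size:
            # Freeze the full buffer as a tuple and store it by number.
            with open(self._path(self._n_saved), "wb") as fh:
                pickle.dump(tuple(self._buffer), fh, pickle.HIGHEST_PROTOCOL)
            self._n_saved += 1
            self._buffer = []

    def __len__(self):
        return self._n_saved * self.chunk_size + len(self._buffer)

    def __getitem__(self, i):
        # Only non-negative integer indices are supported in this sketch.
        if not 0 <= i < len(self):
            raise IndexError(i)
        index, offset = divmod(i, self.chunk_size)
        if index == self._n_saved:  # still in the in-memory buffer
            return self._buffer[offset]
        if index != self._cache_index:  # load the chunk on demand
            with open(self._path(index), "rb") as fh:
                self._cache = pickle.load(fh)
            self._cache_index = index
        return self._cache[offset]


data = ChunkedList(tempfile.mkdtemp(), chunk_size=3)
for x in range(8):
    data.append(x * x)
```

Sequential access within one chunk is cheap because the chunk stays cached; jumping between chunks triggers a disk read each time, so the speed concern raised in the post still applies.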
