So the question is:
How large is your data file?
vegaseat
A Python list can in theory hold an enormous number of items (the ceiling is set by sys.maxsize and, in practice, by available memory), but it is going to be pretty slow to search with a very large number of records in it. If your list is going to be 100 million records or more, then consider an SQLite database instead. If it's a paltry one million records (we now live in a gigabyte world), then there should not be a problem, but you might want to consider using a dictionary or set, as both are indexed via a hash and would be much faster on lookups.
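To make the lookup point concrete, here is a rough sketch comparing a membership test on a list with the same test on a set. The record keys, the helper names (build_containers, time_lookup) and the size of 100,000 are all made up for illustration; actual timings will vary by machine.

```python
import timeit

def build_containers(n):
    """Return the same n made-up record keys stored as a list and as a set."""
    keys = [f"record-{i}" for i in range(n)]
    return keys, set(keys)

def time_lookup(container, key, repeats=100):
    """Time `repeats` membership tests of `key` in `container`, in seconds."""
    return timeit.timeit(lambda: key in container, number=repeats)

records_list, records_set = build_containers(100_000)
missing = "record-not-there"  # worst case for the list: it scans every item

list_seconds = time_lookup(records_list, missing)
set_seconds = time_lookup(records_set, missing)
print(f"list: {list_seconds:.6f} s, set: {set_seconds:.6f} s")
```

The list has to compare against every element before it can say the key is absent, while the set hashes the key once and checks a single bucket, so the gap widens as the data grows.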
woooee
As far as performance is concerned, reading the file from disk will be the slowest part by far!
bumsfeld
Does the code run without the readlines, and how fast is it for 1 GB (compared to 750 MB before)?
i.e. for line in open('data.txt', 'r'):
Could you post the main code? Maybe we could optimize it together.
Usually it is best to use a generator for huge data files.
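A minimal sketch of the generator approach, assuming the data file is plain text with one record per line. The function names (read_records, count_matching) and the tiny temporary demo file are invented here, since the original code was not posted.

```python
import os
import tempfile

def read_records(path):
    """Yield one line at a time instead of loading the whole file with readlines()."""
    with open(path, "r") as handle:
        for line in handle:  # a file object is itself a lazy line iterator
            yield line.rstrip("\n")

def count_matching(path, needle):
    """Count records containing `needle` while holding only one line in memory."""
    return sum(1 for record in read_records(path) if needle in record)

# Small stand-in for the real data file, created only for this demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\nalphabet\n")
    demo_path = tmp.name

matches = count_matching(demo_path, "alpha")
os.remove(demo_path)
print(matches)
```

Memory use stays roughly constant however large the file is, because each line is discarded once it has been processed.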
pyTony