943,692 Members | Top Members by Rank

Ad:
  • Python Discussion Thread
  • Unsolved
  • Views: 1001
  • Python RSS
May 27th, 2009
0

Maximum array size and performance question

Expand Post »
Hey all,

I ran a code today which digested an input file which was 304 MB, consisting of about 10 million lines with six columns a piece. I ran the code and got an indexing error. In troubleshooting, I copied only the first 1,000,000 lines into a new data file, ran it and the code ran fine. Google searches say that python arrays are only limited by the ram of the machine.

My question is how can I get a sense of the limitations of my machine? The machine I'm running the code on has several 64bit processors and 8 gb's of ram. Is there an exact way (say a built in command?) that will allow me to test if a data file will be too large without requiring that I actually run the file and wait for it to error. Secondly, what would you recommend I do to obviate such a problem in the future? Lastly, is there a smart way to append the code so that if it fails, it will let me know exactly at which line it failed so I get a sense of how far it got before crashing?

Thanks
Similar Threads
Reputation Points: 16
Solved Threads: 4
Junior Poster
shoemoodoshaloo is offline Offline
133 posts
since May 2009
May 27th, 2009
0

Re: Maximum array size and performance question

I am running file processing with larger (>1G) files on a fare more weaker machine. I hardly believe your case ran into a limitation. In my experience the never ending program is more possible, than the one running out of memory.

Try catching the exception that occurs, and print out the line number. The other way is to write the out a "rejected records" file.

Based on my experience the problem will be a malformed record having less then six columns, maybe the last one or the header.

If possible try to write the program line oriented, ie do not read in the whole file.
Python Syntax (Toggle Plain Text)
  1. for line in open(fname):
  2. process line, write other files, do aggregation whatever

If everything else fails, and you firmly believe you reached some hard barrier, import it to an sqlite3 database. That can reach terrabytes...
Reputation Points: 56
Solved Threads: 65
Posting Whiz in Training
slate is offline Offline
242 posts
since Jun 2008
May 27th, 2009
0

Re: Maximum array size and performance question

What I don't understand is if the code was inherently flawed, why did it run a 1,000,000 line file perfectly fine.
Reputation Points: 16
Solved Threads: 4
Junior Poster
shoemoodoshaloo is offline Offline
133 posts
since May 2009
May 27th, 2009
0

Re: Maximum array size and performance question

Quote ...
s there a smart way to append the code so that if it fails, it will let me know exactly at which line it failed so I get a sense of how far it got before crashing?
Yes there is a smart way. Try-catch.

Python Syntax (Toggle Plain Text)
  1.  
  2. for count,line in enumerate(fileobject):
  3. try:
  4. do your stuff
  5. catch:
  6. print("Some error occured on line %s" % count)
  7. print("The bad line was:")
  8. print(line)
  9. raise

That will print out the bad line, the line number, and the stack. You will know exactly on which line the exception occured, what exception was that, on which line number this exception occured.
Reputation Points: 56
Solved Threads: 65
Posting Whiz in Training
slate is offline Offline
242 posts
since Jun 2008
May 27th, 2009
1

Re: Maximum array size and performance question

Quote ...
What I don't understand is if the code was inherently flawed, why did it run a 1,000,000 line file perfectly fine.
Well. If you have 10**6 lines with the structure of:
number;number;characters;number

Then, if the 10**6+1 th line contains a data like:
1;2;asd;jkle;3

Then your program will most likely crash.
Reputation Points: 56
Solved Threads: 65
Posting Whiz in Training
slate is offline Offline
242 posts
since Jun 2008
May 27th, 2009
0

Re: Maximum array size and performance question

Thanks for the help slate.
Reputation Points: 16
Solved Threads: 4
Junior Poster
shoemoodoshaloo is offline Offline
133 posts
since May 2009

This thread is more than three months old

No one has posted to this discussion for at least three months. Please let old threads die and do not reply to them unless you feel you have something new and valuable to contribute that absolutely must be added to make the discussion complete. Otherwise, please start a new thread in this forum instead.
Message:
Previous Thread in Python Forum Timeline: Python record functions.
Next Thread in Python Forum Timeline: wxPython help





About Us | Contact Us | Advertise | Acceptable Use Policy
Forum Index | Build Custom RSS Feed


Follow us on Twitter


© 2011 DaniWeb® LLC