Maximum array size and performance question


shoemoodoshaloo  #1
May 27th, 2009
Hey all,

Today I ran some code on a 304 MB input file of about 10 million lines with six columns apiece, and it failed with an indexing error. While troubleshooting, I copied only the first 1,000,000 lines into a new data file, and the code ran fine on that. Google searches say that Python arrays are limited only by the RAM of the machine.

My question is: how can I get a sense of the limitations of my machine? The machine I'm running the code on has several 64-bit processors and 8 GB of RAM. Is there an exact way (say, a built-in command) to test whether a data file will be too large, without actually running the file and waiting for it to error? Secondly, what would you recommend I do to avoid such a problem in the future? Lastly, is there a smart way to amend the code so that if it fails, it tells me exactly at which line it failed, so I get a sense of how far it got before crashing?

Thanks

slate  #2
May 27th, 2009
I process larger (>1 GB) files on a far weaker machine, so I doubt you ran into a hard limit. In my experience, a program that never finishes is more likely than one that runs out of memory.

Try catching the exception that occurs and printing the line number. The other way is to write out a "rejected records" file.

In my experience, the problem will be a malformed record with fewer than six columns, maybe the last line or the header.
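A minimal sketch of the "rejected records" idea, under some assumptions the thread does not confirm: the columns are separated by whitespace (pass a different `delimiter` otherwise), and the function name `split_good_and_rejected` and the tab-separated reject format are made up for illustration.

```python
def split_good_and_rejected(in_path, rejects_path, expected_cols=6, delimiter=None):
    """Scan the file once and copy malformed lines to rejects_path.

    delimiter=None splits on any whitespace (an assumption about the data).
    Returns a (good_count, rejected_count) tuple.
    """
    good = rejected = 0
    with open(in_path) as src, open(rejects_path, "w") as rej:
        for lineno, line in enumerate(src, start=1):
            fields = line.split(delimiter)
            if len(fields) == expected_cols:
                good += 1
            else:
                rejected += 1
                # Record the 1-based line number next to the offending line.
                rej.write("%d\t%s\n" % (lineno, line.rstrip("\n")))
    return good, rejected
```

Running something like this over the full 10-million-line file before the real job should show whether any record has the wrong number of columns, and on which line.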

If possible, write the program in a line-oriented way, i.e. do not read in the whole file:
    with open(fname) as f:
        for line in f:
            ...  # process the line: write other files, do aggregation, whatever
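As one possible instance of line-oriented processing, here is a sketch that sums each column while keeping only the current line in memory. It assumes all six columns are numeric and whitespace-separated, which the thread does not say; `column_sums` is a hypothetical name.

```python
def column_sums(path, n_cols=6, delimiter=None):
    """Sum each column of the file, reading one line at a time.

    Assumes every column holds a number; delimiter=None splits on whitespace.
    """
    totals = [0.0] * n_cols
    with open(path) as f:
        for line in f:  # the file object yields lines lazily
            fields = line.split(delimiter)
            for i in range(n_cols):
                totals[i] += float(fields[i])
    return totals
```

Memory use here depends on the length of the longest line, not on the size of the file.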

If everything else fails and you firmly believe you have hit some hard barrier, import the data into an sqlite3 database. That can reach terabytes...
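A rough sketch of such an import using the standard-library `sqlite3` module. The table name `records`, the column names `c1` to `c6`, the whitespace delimiter, and the batch size are all assumptions for illustration; malformed lines are simply skipped here.

```python
import sqlite3

def load_into_sqlite(data_path, db_path, delimiter=None, batch_size=10000):
    """Stream a six-column text file into an SQLite table in batches.

    Returns the number of rows in the table after loading.
    """
    insert_sql = "INSERT INTO records VALUES (?, ?, ?, ?, ?, ?)"
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS records (c1, c2, c3, c4, c5, c6)"
        )
        batch = []
        with open(data_path) as f:
            for line in f:
                fields = line.split(delimiter)
                if len(fields) != 6:
                    continue  # skip malformed records in this sketch
                batch.append(fields)
                if len(batch) >= batch_size:
                    conn.executemany(insert_sql, batch)
                    batch = []
        if batch:  # flush whatever is left over
            conn.executemany(insert_sql, batch)
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    finally:
        conn.close()
```

Inserting in batches keeps memory use bounded while avoiding one statement per row; the values are stored as text unless you convert them first.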

shoemoodoshaloo  #3
May 27th, 2009
What I don't understand is, if the code was inherently flawed, why did it run a 1,000,000-line file perfectly fine?

slate  #4
May 27th, 2009
"Is there a smart way to amend the code so that if it fails, it tells me exactly at which line it failed, so I get a sense of how far it got before crashing?"
Yes, there is a smart way: try/except.

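  # start=1 so the reported number matches the line number in the file
  for count, line in enumerate(fileobject, start=1):
      try:
          ...  # do your stuff with the line here
      except Exception:
          print("Some error occurred on line %s" % count)
          print("The bad line was:")
          print(line)
          raise  # re-raise so the original traceback is still shown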

That will print out the bad line, its line number, and the stack trace. You will know exactly which line of the file caused the exception and which exception it was.

slate  #5
May 27th, 2009
"What I don't understand is, if the code was inherently flawed, why did it run a 1,000,000-line file perfectly fine?"
Well, suppose the first 10**6 lines all have this structure:
number;number;characters;number

If line 10**6 + 1 then contains data like this:
1;2;asd;jkle;3

Then your program will most likely crash.
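To make that concrete, here is a small sketch of a parser that expects the number;number;characters;number layout. The function `fourth_field_as_int` is hypothetical, not the original poster's code. The line with the extra field raises a ValueError, and a line with too few fields raises an IndexError, which is the kind of indexing error described in the first post.

```python
def fourth_field_as_int(line, delimiter=";"):
    """Return the fourth field as an int, as code expecting
    number;number;characters;number might do."""
    return int(line.split(delimiter)[3])

good = "1;2;asd;3"
extra_field = "1;2;asd;jkle;3"  # fourth field is now text, not a number
too_short = "1;2"               # fewer fields than expected

print(fourth_field_as_int(good))

for bad in (extra_field, too_short):
    try:
        fourth_field_as_int(bad)
    except (ValueError, IndexError) as exc:
        print("malformed line %r: %s: %s" % (bad, type(exc).__name__, exc))
```

The code works on every well-formed line, so a file of any length runs cleanly until the first malformed record appears.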

shoemoodoshaloo  #6
May 27th, 2009
Thanks for the help, slate.