| | |
Maximum array size and performance question
Please support our Python advertiser: Programming Forums - DaniWeb Sister Site
![]() |
•
•
Join Date: May 2009
Posts: 70
Reputation:
Solved Threads: 2
Hey all,
I ran a code today which digested an input file which was 304 MB, consisting of about 10 million lines with six columns a piece. I ran the code and got an indexing error. In troubleshooting, I copied only the first 1,000,000 lines into a new data file, ran it and the code ran fine. Google searches say that python arrays are only limited by the ram of the machine.
My question is how can I get a sense of the limitations of my machine? The machine I'm running the code on has several 64bit processors and 8 gb's of ram. Is there an exact way (say a built in command?) that will allow me to test if a data file will be too large without requiring that I actually run the file and wait for it to error. Secondly, what would you recommend I do to obviate such a problem in the future? Lastly, is there a smart way to append the code so that if it fails, it will let me know exactly at which line it failed so I get a sense of how far it got before crashing?
Thanks
I ran a code today which digested an input file which was 304 MB, consisting of about 10 million lines with six columns a piece. I ran the code and got an indexing error. In troubleshooting, I copied only the first 1,000,000 lines into a new data file, ran it and the code ran fine. Google searches say that python arrays are only limited by the ram of the machine.
My question is how can I get a sense of the limitations of my machine? The machine I'm running the code on has several 64bit processors and 8 gb's of ram. Is there an exact way (say a built in command?) that will allow me to test if a data file will be too large without requiring that I actually run the file and wait for it to error. Secondly, what would you recommend I do to obviate such a problem in the future? Lastly, is there a smart way to append the code so that if it fails, it will let me know exactly at which line it failed so I get a sense of how far it got before crashing?
Thanks
•
•
Join Date: Jun 2008
Posts: 127
Reputation:
Solved Threads: 31
I am running file processing with larger (>1G) files on a fare more weaker machine. I hardly believe your case ran into a limitation. In my experience the never ending program is more possible, than the one running out of memory.
Try catching the exception that occurs, and print out the line number. The other way is to write the out a "rejected records" file.
Based on my experience the problem will be a malformed record having less then six columns, maybe the last one or the header.
If possible try to write the program line oriented, ie do not read in the whole file.
If everything else fails, and you firmly believe you reached some hard barrier, import it to an sqlite3 database. That can reach terrabytes...
Try catching the exception that occurs, and print out the line number. The other way is to write the out a "rejected records" file.
Based on my experience the problem will be a malformed record having less then six columns, maybe the last one or the header.
If possible try to write the program line oriented, ie do not read in the whole file.
Python Syntax (Toggle Plain Text)
for line in open(fname): process line, write other files, do aggregation whatever
If everything else fails, and you firmly believe you reached some hard barrier, import it to an sqlite3 database. That can reach terrabytes...
•
•
Join Date: Jun 2008
Posts: 127
Reputation:
Solved Threads: 31
•
•
•
•
s there a smart way to append the code so that if it fails, it will let me know exactly at which line it failed so I get a sense of how far it got before crashing?
Python Syntax (Toggle Plain Text)
for count,line in enumerate(fileobject): try: do your stuff catch: print("Some error occured on line %s" % count) print("The bad line was:") print(line) raise
That will print out the bad line, the line number, and the stack. You will know exactly on which line the exception occured, what exception was that, on which line number this exception occured.
•
•
Join Date: Jun 2008
Posts: 127
Reputation:
Solved Threads: 31
•
•
•
•
What I don't understand is if the code was inherently flawed, why did it run a 1,000,000 line file perfectly fine.
number;number;characters;number
Then, if the 10**6+1 th line contains a data like:
1;2;asd;jkle;3
Then your program will most likely crash.
![]() |
Similar Threads
- maximum of an array using threads ?? (a shout for help) (C)
- Pascal code, displaying the maximum value in an array! (Pascal and Delphi)
- user given array size (C++)
- string array size (C++)
- works for static need help to make it dynamic (C++)
- Urgent Help Needed!!!! Plzzz (Java)
- Java practice questions (Java)
Other Threads in the Python Forum
- Previous Thread: Python record functions.
- Next Thread: wxPython help
| Thread Tools | Search this Thread |
alarm assignment avogadro beginner bluetooth character cmd code customdialog cx-freeze data decimals dictionary directory dynamic error examples exe file float format function generator gnu graphics gui halp homework http ideas import input itunes java leftmouse line linux list lists logging loop maintain maze module mouse mysqldb number numbers output parsing path port prime programming projects push py2exe pygame pyglet pyqt python queue random recursion schedule screensaverloopinactive script scrolledtext slicenotation sqlite ssh stdout string strings sudokusolver table terminal text thread threading time tkinter tlapse tuple tutorial ubuntu unicode urllib urllib2 variable variables ventrilo verify vigenere web webservice wikipedia windows wxpython xlib





