| | |
Maximum array size and performance question
Please support our Python advertiser: Programming Forums - DaniWeb Sister Site
![]() |
•
•
Join Date: May 2009
Posts: 70
Reputation:
Solved Threads: 2
Hey all,
I ran a code today which digested an input file which was 304 MB, consisting of about 10 million lines with six columns a piece. I ran the code and got an indexing error. In troubleshooting, I copied only the first 1,000,000 lines into a new data file, ran it and the code ran fine. Google searches say that python arrays are only limited by the ram of the machine.
My question is how can I get a sense of the limitations of my machine? The machine I'm running the code on has several 64bit processors and 8 gb's of ram. Is there an exact way (say a built in command?) that will allow me to test if a data file will be too large without requiring that I actually run the file and wait for it to error. Secondly, what would you recommend I do to obviate such a problem in the future? Lastly, is there a smart way to append the code so that if it fails, it will let me know exactly at which line it failed so I get a sense of how far it got before crashing?
Thanks
I ran a code today which digested an input file which was 304 MB, consisting of about 10 million lines with six columns a piece. I ran the code and got an indexing error. In troubleshooting, I copied only the first 1,000,000 lines into a new data file, ran it and the code ran fine. Google searches say that python arrays are only limited by the ram of the machine.
My question is how can I get a sense of the limitations of my machine? The machine I'm running the code on has several 64bit processors and 8 gb's of ram. Is there an exact way (say a built in command?) that will allow me to test if a data file will be too large without requiring that I actually run the file and wait for it to error. Secondly, what would you recommend I do to obviate such a problem in the future? Lastly, is there a smart way to append the code so that if it fails, it will let me know exactly at which line it failed so I get a sense of how far it got before crashing?
Thanks
•
•
Join Date: Jun 2008
Posts: 128
Reputation:
Solved Threads: 31
I am running file processing with larger (>1G) files on a fare more weaker machine. I hardly believe your case ran into a limitation. In my experience the never ending program is more possible, than the one running out of memory.
Try catching the exception that occurs, and print out the line number. The other way is to write the out a "rejected records" file.
Based on my experience the problem will be a malformed record having less then six columns, maybe the last one or the header.
If possible try to write the program line oriented, ie do not read in the whole file.
If everything else fails, and you firmly believe you reached some hard barrier, import it to an sqlite3 database. That can reach terrabytes...
Try catching the exception that occurs, and print out the line number. The other way is to write the out a "rejected records" file.
Based on my experience the problem will be a malformed record having less then six columns, maybe the last one or the header.
If possible try to write the program line oriented, ie do not read in the whole file.
Python Syntax (Toggle Plain Text)
for line in open(fname): process line, write other files, do aggregation whatever
If everything else fails, and you firmly believe you reached some hard barrier, import it to an sqlite3 database. That can reach terrabytes...
•
•
Join Date: Jun 2008
Posts: 128
Reputation:
Solved Threads: 31
•
•
•
•
s there a smart way to append the code so that if it fails, it will let me know exactly at which line it failed so I get a sense of how far it got before crashing?
Python Syntax (Toggle Plain Text)
for count,line in enumerate(fileobject): try: do your stuff catch: print("Some error occured on line %s" % count) print("The bad line was:") print(line) raise
That will print out the bad line, the line number, and the stack. You will know exactly on which line the exception occured, what exception was that, on which line number this exception occured.
•
•
Join Date: Jun 2008
Posts: 128
Reputation:
Solved Threads: 31
•
•
•
•
What I don't understand is if the code was inherently flawed, why did it run a 1,000,000 line file perfectly fine.
number;number;characters;number
Then, if the 10**6+1 th line contains a data like:
1;2;asd;jkle;3
Then your program will most likely crash.
![]() |
Similar Threads
- maximum of an array using threads ?? (a shout for help) (C)
- Pascal code, displaying the maximum value in an array! (Pascal and Delphi)
- user given array size (C++)
- string array size (C++)
- works for static need help to make it dynamic (C++)
- Urgent Help Needed!!!! Plzzz (Java)
- Java practice questions (Java)
Other Threads in the Python Forum
- Previous Thread: Python record functions.
- Next Thread: wxPython help
Views: 358 | Replies: 5
| Thread Tools | Search this Thread |
Tag cloud for Python
advanced anydbm app bash beginner bits calling chmod cmd code copy data dictionary directory dynamic edit examples excel external feet file float format ftp function gui homework http images import info input ip itunes java keycontrol line linux list lists loan loop maintain millimeter mouse newb number numbers output panel parsing path port prime print program programming projects push py-mailer py2exe pygame pyqt python queue random recursion recursive scrolledtext smtp split ssh string strings sudokusolver table terminal text thread threading time tkinter tlapse tricks tuple tutorial ubuntu unicode update urllib urllib2 variable variables ventrilo web webservice whileloop windows wxpython xlib





