Opening 100+ files at once in python
Thread Solved
Join Date: May 2009
Posts: 70
Reputation:
Solved Threads: 2
Hey guys,
A while ago, with your help, I was able to write code that scans a data file for names and, when a new name is found, appends it to a list. For each entry in the list, the code then opens a new file. Using the same code, I am trying to parse huge data files, with over 100 names each. Python doesn't like opening such a large number of files and is complaining:
IOError: [Errno 24] Too many open files: '/home/hugadams/Desktop/SHANE_STUFF/Modules/Rep_Split_FULL/MER65B-int'
I was wondering if any of you know a way around this, whether it be an import or otherwise. I'd prefer not to completely restructure my code if there is a simple import that allows Python to open more files.
Can you post some of your code? I'm confused as to why you need to hold so many files open at one time... could you just cycle them one by one and not have them all open?
Last edited by shadwickman; Jun 24th, 2009 at 4:41 pm.
"Two good old boys in a fire-apple red convertible. Stoned. Ripped. Twisted. Good people."
- Hunter S. Thompson
The maximum number of open files a process is allowed to have comes from the Operating System.
It's usually possible to change this (details vary), but doing so almost certainly requires "admin/root" privileges.
It's also something which shouldn't be done lightly to save a bit of inconvenience on your part.
Just consider it in your future designs that you don't have an unlimited number of open files to play with.
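For what it's worth, on Unix-like systems Python can report the per-process limit itself through the standard `resource` module. A minimal sketch (the helper name `open_file_limits` is made up for illustration, and the module is not available on Windows):

```python
import resource

def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

soft, hard = open_file_limits()
print("soft limit:", soft)
print("hard limit:", hard)
```

The soft limit is the one that triggers Errno 24. A process may raise its own soft limit up to the hard limit with `resource.setrlimit`; going beyond the hard limit is the part that needs root.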
My question: do we really need this? Why?
If you need to open 100+ files at once, you should perhaps refactor your code and design.
BTW, if you are on *nix (Unix, Linux), you can run the following command to find the system-wide maximum number of open file descriptors:
Note that $ is my shell prompt.
$ cat /proc/sys/fs/file-max
95141
Hence, there can be at most 95141 file descriptors open at once across the whole system.
To change this (as root), use the following, where 104854 is the maximum you want:
$ echo "104854" > /proc/sys/fs/file-max
To check how many file descriptors are in use:
$ cat /proc/sys/fs/file-nr
5024 0 95141
The first number is the number of allocated file descriptors, the second is the number allocated but unused, and the third is the system-wide maximum.
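If you'd rather read those counters from Python, here is a small sketch that parses the three fields of `/proc/sys/fs/file-nr` (allocated, allocated but unused, system-wide maximum). The names `FileNr` and `parse_file_nr` are hypothetical, and the sample string is just the output shown in this post:

```python
from collections import namedtuple

# Field names are illustrative; the kernel file itself only holds three numbers.
FileNr = namedtuple("FileNr", ["allocated", "unused", "maximum"])

def parse_file_nr(text):
    """Parse the three whitespace-separated counters from /proc/sys/fs/file-nr."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return FileNr(allocated, unused, maximum)

# Sample values taken from the command output in this post.
stats = parse_file_nr("5024 0 95141")
print(stats.allocated, "of", stats.maximum, "file handles allocated")
```

On a Linux box you would pass in the contents of the real file, e.g. `parse_file_nr(open("/proc/sys/fs/file-nr").read())`.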
Siddhant Sanyam
(Not posting much)
Migrate to Standard C++ :When to tell your C++ Code is Non-Standard.
Please Read before posting: How To Ask Questions The Smart Way
I agree with everyone else here. There should never ever be a reason to open this many files at once. Ever.
You should explain what you're actually trying to do so that we can help you come up with an intelligent solution.
Join Date: May 2009
Posts: 70
Reputation:
Solved Threads: 2
Ok, here is what I am doing:
First my code scans a large data file. In column 10, the genetic data is listed by "name", and there are about 110 different names in total. Whenever it gets to a new name, it stores it in a list, so I have something like:
Name_List = ['AluSp', 'AluGp', 'AluSx' ... 'ZZcta' ]
For each name in the name list, I want to store a file of the same name, e.g.:
AluSp.bed, AluGp.bed, AluSx.bed...ZZcta.bed
The program will rescan the data file, and each line gets written into the file for its name.
Because the raw data file is so large, I can't just open one file for a name in the list, scan the whole data file, append entries for only that name (say AluSp.bed), and then close it. Doing so would require 110 scans of a 50-million-line file. What I am doing is scanning the raw data file once while all 110 name files remain open, and each line in the data file gets written to the appropriate name file.
Is that clear?
The code is in place, so if you'd prefer to look at it, I can post it.
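Since the actual code was never posted, here is only a rough sketch of the single-pass approach described above, not the original poster's implementation. The function name `split_by_name`, the whitespace-separated column layout, and the sample lines are all assumptions. The one deliberate design choice is `contextlib.ExitStack`, which guarantees every output file is closed when the scan ends; handles left open from one data file to the next are one possible way to run into Errno 24.

```python
import os
import tempfile
from contextlib import ExitStack

def split_by_name(data_path, out_dir, name_column=9):
    """Write each line of data_path to <out_dir>/<name>.bed in a single pass.

    name_column is zero-based, so 9 means "column 10". Whitespace-separated
    columns are an assumption about the data format.
    """
    handles = {}
    with ExitStack() as stack, open(data_path) as data:
        for line in data:
            fields = line.split()
            if len(fields) <= name_column:
                continue  # skip blank or short lines
            name = fields[name_column]
            if name not in handles:
                path = os.path.join(out_dir, name + ".bed")
                handles[name] = stack.enter_context(open(path, "w"))
            handles[name].write(line)
    # Leaving the with-block closes every output file, even on error.
    return sorted(handles)

# Tiny made-up input, just to show the call.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w") as f:
        f.write("c1 c2 c3 c4 c5 c6 c7 c8 c9 AluSp\n")
        f.write("c1 c2 c3 c4 c5 c6 c7 c8 c9 AluGp\n")
    print(split_by_name(sample, tmp))
```

This still holds one handle per name, so it only works while the number of names stays under the per-process limit. Names containing a path separator would also need sanitising before being used as file names.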
The maximum number of open files a process is allowed to have comes from the Operating System.
...
for line in open(filename):
    # do something with line
    pass
No one died when Clinton lied.
Join Date: May 2009
Posts: 25
Reputation:
Solved Threads: 8
Join Date: May 2009
Posts: 70
Reputation:
Solved Threads: 2
Won't the process of opening and closing the file cause it to get overwritten, instead of the line being appended?
Join Date: May 2009
Posts: 25
Reputation:
Solved Threads: 8
Not if you open it in append mode. Append mode means that anything previously in the file stays there, which might not be desirable on the first write (since presumably you want the file to contain only data from the current run). To fix this, just keep track of whether you've written to the file before: if you have, open it in append mode; otherwise, open it in write mode.
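To make that concrete, here is a minimal sketch of the bookkeeping, assuming a set of names already written during the current run. The helper name `append_line` and the sample file contents are made up for illustration:

```python
import os
import tempfile

def append_line(out_dir, name, line, written):
    """Write line to <out_dir>/<name>.bed, holding the file open only briefly.

    written is a set of names already written during this run: the first
    write truncates stale output ("w"), later writes append ("a").
    """
    mode = "a" if name in written else "w"
    with open(os.path.join(out_dir, name + ".bed"), mode) as out:
        out.write(line)
    written.add(name)

with tempfile.TemporaryDirectory() as tmp:
    stale = os.path.join(tmp, "AluSp.bed")
    with open(stale, "w") as f:
        f.write("left over from a previous run\n")
    written = set()
    append_line(tmp, "AluSp", "first\n", written)
    append_line(tmp, "AluSp", "second\n", written)
    with open(stale) as f:
        print(f.read(), end="")  # the stale line is gone, both new lines remain
```

Only one output file is open at any moment, so the limit never comes into play. Opening and closing a file for every one of 50 million lines will be slow, though; collecting lines per name in memory and flushing them in batches would cut that overhead considerably.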