Opening 100+ files at once in python

Thread Solved

shoemoodoshaloo
  #1
Jun 24th, 2009
Hey guys,

A while ago, with your help, I was able to write code that scans a data file for names and, when a new name is found, appends it to a list. For each entry in the list, the code then opens a new file. Using the same code, I am trying to parse huge data files, with over 100 names each. Python doesn't like to open such a large number of files and is complaining:

  IOError: [Errno 24] Too many open files: '/home/hugadams/Desktop/SHANE_STUFF/Modules/Rep_Split_FULL/MER65B-int'

I was wondering if any of you knew a way to bypass this, whether it be an import or otherwise? I'd prefer to not completely restructure my code if there is a simple import that can allow python to open more files.

shadwickman
  #2
Jun 24th, 2009
Can you post some of your code? I'm confused as to why you need to hold so many files open at one time... could you just cycle them one by one and not have them all open?

Salem
  #3
Jun 24th, 2009
The maximum number of open files a process is allowed to have comes from the Operating System.

It's usually possible to change this (but details vary), but doing so almost certainly requires "admin/root" privileges.

It's also something which shouldn't be done lightly to save a bit of inconvenience on your part.

Just consider it in your future designs that you don't have an unlimited number of open files to play with.
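
For reference, a process can inspect its own limit from inside Python. The sketch below uses the standard-library `resource` module, which is only available on Unix-like systems. The helper names `open_file_limits` and `raise_soft_limit` are made up for illustration. On such systems a process can normally raise its soft limit up to the hard limit without extra privileges; raising the hard limit is the part that needs root. Some platforms may still reject a requested value with `ValueError` or `OSError`.

```python
import resource  # standard library, Unix-only


def open_file_limits():
    """Return the (soft, hard) per-process limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(wanted):
    """Raise the soft limit towards `wanted`, never above the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    if soft == resource.RLIM_INFINITY or wanted <= soft:
        return soft  # already high enough, nothing to change
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft, hard =", open_file_limits())
```

Checking the soft limit first shows whether 100+ open files is actually over the limit on a given machine, or whether handles are leaking somewhere else in the program.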

siddhant3s
  #4
Jun 25th, 2009
My question: do we really need this? Why?
If you need to open 100+ files at once, you should perhaps refactor your code and design.

BTW, if you are on *nix (Unix, Linux), you can run the following command to find the maximum number of open file handles. On Linux this is the system-wide limit; the per-process limit, which is the one behind "Too many open files", is shown by ulimit -n.
Note that $ is my shell prompt.
  $ cat /proc/sys/fs/file-max
  95141
Hence, at most 95141 file handles can be open at once across the whole system.
To change this (as root), use the following, where 104854 is the maximum you want:
  $ echo "104854" > /proc/sys/fs/file-max

To check how many file handles are in use:
  $ cat /proc/sys/fs/file-nr
  5024 0 95141
The first number is the number of file handles currently allocated, the second is the number allocated but unused, and the third is the system-wide maximum.
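
The same three numbers can be read from Python. This is a small sketch, assuming a Linux system where `/proc/sys/fs/file-nr` exists; `FileNr`, `parse_file_nr` and `read_file_nr` are hypothetical names chosen for this example, and the field names follow the kernel's description (allocated, allocated but unused, maximum).

```python
from collections import namedtuple

# Field names are labels chosen for this example.
FileNr = namedtuple("FileNr", "allocated unused maximum")


def parse_file_nr(text):
    """Parse the three whitespace-separated integers found in file-nr."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return FileNr(allocated, unused, maximum)


def read_file_nr(path="/proc/sys/fs/file-nr"):
    """Read and parse file-nr; only works where this Linux path exists."""
    with open(path) as handle:
        return parse_file_nr(handle.read())


if __name__ == "__main__":
    # Uses the sample output from the post rather than the live system.
    print(parse_file_nr("5024 0 95141"))
```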

jlm699
  #5
Jun 25th, 2009
I agree with everyone else here. There should never ever be a reason to open this many files at once. Ever.

You should explain what you're actually trying to do so that we can help you come up with an intelligent solution.

shoemoodoshaloo
  #6
Jun 25th, 2009
Ok, here is what I am doing:

First my code scans a large data file. In column 10, the genetic data is listed by "name", and there are about 110 different names total. Whenever it gets to a new name, it stores it in a list, so I have something like:

  Name_List = ['AluSp', 'AluGp', 'AluSx' ... 'ZZcta' ]

For each name in the name list, I want to store a file of the same name, e.g.:

  AluSp.bed, AluGp.bed, AluSx.bed...ZZcta.bed

The program will rescan the data file, and for each name, that line gets written into the appropriate file.

Because the raw data file is so large, I can't just open one file for a name in the list, scan the whole data file, append entries for only that name (e.g. AluSp.bed) and then close it. Doing so would require 110 scans of a 50-million-line file. What I am doing is scanning the raw data file once while all 110 name files remain open, and each line in the data file gets written to the appropriate name file.

Is that clear?

The code is in place, so if you'd prefer to look at it, I can post it.
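
The single-pass approach described in this post could look roughly like the sketch below. It is not the poster's actual code, which was never posted. It assumes Python 3, whitespace-separated columns, a name in the tenth column (index 9), and names that are safe to use as file names; `split_by_name` and its parameters are hypothetical.

```python
import os
from contextlib import ExitStack


def split_by_name(data_path, out_dir, name_column=9, suffix=".bed"):
    """Write each line of data_path to out_dir/<name><suffix> in one pass.

    name_column is the zero-based index of the column holding the name
    (9 for "column 10"). Returns the sorted list of names seen.
    """
    handles = {}  # name -> open output file
    with ExitStack() as stack, open(data_path) as data:
        for line in data:
            fields = line.split()
            if len(fields) <= name_column:
                continue  # skip blank or short lines
            name = fields[name_column]
            if name not in handles:
                path = os.path.join(out_dir, name + suffix)
                # ExitStack closes this file when the with block ends.
                handles[name] = stack.enter_context(open(path, "w"))
            handles[name].write(line)
    # Every output file has been closed at this point.
    return sorted(handles)
```

This keeps one handle open per name for the whole scan, so it only works if the per-process limit is higher than the number of distinct names. If the limit is hit, `open` raises the same Errno 24 error, and the `ExitStack` still closes the files opened so far.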

sneekula
  #7
Jun 25th, 2009
Originally Posted by Salem:
The maximum number of open files a process is allowed to have comes from the Operating System.
...
That is not all of it. In Python you have constructs like:
  for line in open(filename):
      pass  # do something with line
Here the closing of the file is left to Python's garbage collector. If files are opened faster than they are released this way, open handles can pile up until the limit is reached.
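
One way to avoid relying on the garbage collector is to close each file explicitly with a `with` block. A minimal sketch, assuming Python 3; `count_lines` is a made-up example function, not something from the thread.

```python
def count_lines(filename):
    """Count the lines in a file, closing it as soon as the block ends."""
    with open(filename) as handle:
        total = sum(1 for _ in handle)
    # handle.closed is True here, even if the loop had raised an exception.
    return total
```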

The_Kernel
  #8
Jun 25th, 2009
There's no need to keep all the files open at the same time though. Instead of keeping a list of 100 file handles, keep a list of the filenames; then, each time you want to write to one of them, just open the file, write to it, and close it.
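
A rough sketch of this open, write, close pattern, assuming Python 3 and the same layout as described earlier in the thread (name in the tenth whitespace-separated column, one `.bed` file per name). `append_line` and `route_line` are hypothetical helper names.

```python
import os


def append_line(path, line):
    """Open path in append mode, write one line, and close it again."""
    with open(path, "a") as handle:
        handle.write(line)


def route_line(line, out_dir, name_column=9, suffix=".bed"):
    """Append line to the file named after its name column; return the path."""
    name = line.split()[name_column]
    path = os.path.join(out_dir, name + suffix)
    append_line(path, line)
    return path
```

Only one output file is open at any moment, so the open-file limit is never an issue. The trade-off is one open and close per input line, which is likely to be noticeably slower on a 50-million-line file.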

shoemoodoshaloo
  #9
Jun 25th, 2009
Originally Posted by The_Kernel:
There's no need to keep all the files open at the same time though. Instead of having a list of 100 file handles replace it with a list of the filenames, then each time you want to write to one of them just open the file, write to it, then close it.
Won't the process of opening and closing the file cause it to get overwritten, instead of the line being appended?

The_Kernel
  #10
Jun 25th, 2009
Originally Posted by shoemoodoshaloo:
Won't the process of opening and closing the file cause it to get overwritten, instead of the line being appended?
Not if you open it in append mode. That means that anything previously in the file will stay there, which might not be desirable on the first write (since you presumably want the file to contain only data from the current run). To fix this, just keep track of whether you've written to the file before: if you have, open it in append mode; otherwise, open it in write mode.
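
The bookkeeping described here can be sketched as follows, assuming Python 3. `FreshAppender` is a hypothetical class name; it remembers which names have already been written during the current run and picks the mode accordingly.

```python
import os


class FreshAppender:
    """Write lines to per-name files, truncating each file on first use."""

    def __init__(self, out_dir, suffix=".bed"):
        self.out_dir = out_dir
        self.suffix = suffix
        self.seen = set()  # names already written during this run

    def write(self, name, line):
        # First write of this run uses "w" to discard old data,
        # later writes use "a" to append.
        mode = "a" if name in self.seen else "w"
        path = os.path.join(self.out_dir, name + self.suffix)
        with open(path, mode) as handle:
            handle.write(line)
        self.seen.add(name)
```

For example, `writer = FreshAppender("out")` followed by `writer.write("AluSp", line)` for each line would leave `out/AluSp.bed` containing only lines from the current run (the directory `out` is assumed to exist already).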