Hey everyone,

I am writing a simple script and want to do something that I'm not sure is possible.

First, I have a name list:

Name_List = ["Jake", "Steve", "Adam"]

The first thing I'd like to do is open a separate file for each name in the name list. For example, I'd like to have three files open: Jake.bed, Steve.bed, and Adam.bed. I'm not sure how to do this, because if I write something like:

for name in Name_List:
	file=open(name, 'a')

Won't that just open one file, then overwrite it immediately, then overwrite it immediately once more? Essentially, how do I open all three files at once, with separate names?

Secondly, I need to parse a large data file and search the 10th column for names found on the name list. Instead of searching with a simple loop, which would search the entire file for the name "Jake", then the entire file for "Steve", and so on, I want to be able to search for "Jake" or "Steve" or "Adam" in one sweep. This way, if my name list has 50 names, I only have to sweep the file once instead of fifty times.

Is this possible as well? I have no idea how to tell the computer "for any name on the name list, match..."

All 12 Replies

I believe I have at least solved the second part of my problem, that is, searching the file only once:

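for line in data_file:
    sline = line.split()  # split the line into its columns
    for name in Name_List:
        # index 9 is the 10th column, because Python counts from 0
        if sline[9] == name:
            print('match found')
        else:
            print('no match found')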

Unless someone sees a smarter way.
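
One possibly smarter way, offered only as a sketch: put the names in a set, so each line needs a single membership test instead of an inner loop over every name. The function name find_matches, the default column index of 9, and the whitespace-separated sample rows are assumptions made for illustration, not details from the original data file. Matching is case sensitive, so "jake" would not match "Jake".

```python
def find_matches(lines, names, column=9):
    """Yield (name, line) for each line whose given column is one of names."""
    wanted = set(names)  # set membership is a constant-time check on average
    for line in lines:
        fields = line.split()
        # Skip short lines that do not have the requested column.
        if len(fields) > column and fields[column] in wanted:
            yield fields[column], line


# Made-up sample rows with the name in the 10th column (index 9).
sample = [
    "a b c d e f g h i Jake k\n",
    "a b c d e f g h i Bob k\n",
    "a b c d e f g h i Adam k\n",
]
matches = list(find_matches(sample, ["Jake", "Steve", "Adam"]))
print([name for name, _ in matches])  # prints ['Jake', 'Adam']
```

With 50 names the file is still read once, and the cost per line stays roughly the same as with three names.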

You can open and close one file for each name in the list sequentially, but not simultaneously. There is a limit on how many file handles you can have open at once!

It is unclear to me what the function is really meant to do. Are you merely validating that a name in the list exists in the file? You can do that by extracting a word from the file and comparing it to the 'active' list of names in your string table. Or load the entire file into memory and simply scan it for each name!

I do not want to just validate that the name exists; rather, when an instance of the name occurs, I want to pick out that line specifically and write it into a new data file. So if I started with a raw file with a billion lines, and 300 of those lines had the word "Jake" in column ten, I want the program to create a file named "Jake.bed" with 300 entries, each one being a line from the original data file.
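
For what it's worth, the task described above could be sketched roughly as follows: one sweep over the data file, with every matching line copied into the file for its name. The function name split_by_name, the out_dir parameter, the column index of 9, and the throwaway sample data are all assumptions for illustration. The sketch also assumes whitespace-separated columns and a name list small enough to stay under the operating system's limit on open files.

```python
import os
import tempfile
from contextlib import ExitStack


def split_by_name(data_path, names, out_dir, column=9):
    """Copy each line whose given column matches a name into out_dir/<name>.bed."""
    counts = dict.fromkeys(names, 0)
    with ExitStack() as stack:
        # One output handle per name, all open at the same time.
        handles = {
            name: stack.enter_context(open(os.path.join(out_dir, name + ".bed"), "w"))
            for name in names
        }
        data_file = stack.enter_context(open(data_path))
        for line in data_file:  # a single sweep over the data file
            fields = line.split()
            if len(fields) > column and fields[column] in handles:
                handles[fields[column]].write(line)
                counts[fields[column]] += 1
    return counts  # every file is closed once the with block ends


# Tiny demonstration on a throwaway file with made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "data.txt")
    with open(data_path, "w") as f:
        f.write("1 2 3 4 5 6 7 8 9 Jake 11\n")
        f.write("1 2 3 4 5 6 7 8 9 Bob 11\n")
        f.write("1 2 3 4 5 6 7 8 9 Jake 11\n")
    counts = split_by_name(data_path, ["Jake", "Steve", "Adam"], tmp)
    with open(os.path.join(tmp, "Jake.bed")) as f:
        jake_lines = f.readlines()
print(counts)  # prints {'Jake': 2, 'Steve': 0, 'Adam': 0}
```

Because the file is read line by line, memory use stays small regardless of how many lines the data file has. Names that never occur still get an (empty) .bed file in this sketch.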

The good news is that computers have lots of memory, and I've never seen a file that contains a billion lines. A couple of million in an ASCII-based data file, yes, but never more than that! And that easily fits in memory on a well-equipped 32-bit system. Memory is cheap!

Alternatively, how about giving each name in your list a dynamic array of file offsets? Parse the file once, collect the file position of every matching line, and store it in that name's offset list. When parsing is done, use each list to seek back to the matching lines in the file.

Use a fixed size like 100 or some other number to 'grow' the list when it becomes full, and maintain an insertion count. When the list becomes full, grow the list, copy the list, and continue on!
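
In Python the offset idea might look something like the sketch below. Python lists grow automatically on append, so the manual grow-and-copy step is not needed there. The function names index_offsets and read_lines_at, the column index of 9, and the sample rows are assumptions for illustration. The file is opened in binary mode so that tell() and seek() work with plain byte positions.

```python
import os
import tempfile
from collections import defaultdict


def index_offsets(path, names, column=9):
    """Return {name: [byte offset of each line whose column matches name]}."""
    wanted = set(names)
    offsets = defaultdict(list)
    with open(path, "rb") as f:
        while True:
            pos = f.tell()  # position of the line about to be read
            line = f.readline()
            if not line:  # empty bytes means end of file
                break
            fields = line.decode().split()
            if len(fields) > column and fields[column] in wanted:
                offsets[fields[column]].append(pos)
    return offsets


def read_lines_at(path, positions):
    """Seek to each stored offset and return the line that starts there."""
    result = []
    with open(path, "rb") as f:
        for pos in positions:
            f.seek(pos)
            result.append(f.readline().decode())
    return result


# Made-up rows; the first column is only a label to tell the rows apart.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "w") as f:
        f.write("r1 2 3 4 5 6 7 8 9 Jake 11\n")
        f.write("r2 2 3 4 5 6 7 8 9 Adam 11\n")
        f.write("r3 2 3 4 5 6 7 8 9 Jake 11\n")
    offsets = index_offsets(path, ["Jake", "Steve", "Adam"])
    jake_rows = read_lines_at(path, offsets["Jake"])
print([row.split()[0] for row in jake_rows])  # prints ['r1', 'r3']
```

This stores only one integer per matching line, which may matter if the matching lines themselves are too numerous to hold in memory.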

That sounds really cool, but is way beyond my current knowledge of python. Thanks for the help.

Actually, what I may be able to do is bypass this whole thing with a bash script. That should work, right? Yes, I'd prefer to do this at the code level, but it can't hurt.

Technically you are writing a tool. In real life a tool is used repeatedly in the process of development. You do not want someone waiting for a tool, so although there are a multitude of methods for writing an application, you should pick the one that runs the quickest, even if it isn't elegant! You can always put in a comment as to why you chose one approach over another! But above all, make the code clean and thus easy to read!

The first thing I'd like to do is open a separate file for each name in the name list. For example, I'd like to have three files open, Jake.bed, Steve.bed, Adam.bed

You could maintain a list of open file handles like this:

open_files = []
for each_name in names_list:
    open_files.append(open(each_name + '.bed', 'w'))

Then that list would be full of open file handles.

Nice Idea. I'll try this tomorrow and see how it works.

A nice feature of this approach is that the index of the file handle in open_files is the same as the index of the name in the names_list.
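
A small sketch of how that index correspondence could be used, assuming the files are written to a temporary directory so the example cleans up after itself (the temporary directory and the sample line written to the file are assumptions, not part of the original suggestion):

```python
import os
import tempfile

names_list = ["Jake", "Steve", "Adam"]

with tempfile.TemporaryDirectory() as tmp:
    open_files = []
    for each_name in names_list:
        open_files.append(open(os.path.join(tmp, each_name + ".bed"), "w"))

    # The handle for a name sits at the same index as the name itself.
    steve_handle = open_files[names_list.index("Steve")]
    steve_handle.write("a line for Steve\n")
    steve_name = os.path.basename(steve_handle.name)

    # Close every handle so buffered writes are flushed to disk.
    for handle in open_files:
        handle.close()

    with open(os.path.join(tmp, "Steve.bed")) as f:
        steve_text = f.read()

print(steve_name)  # prints Steve.bed
```

Note that names_list.index() searches the list from the start on every call, so for a long name list a dictionary mapping each name to its handle may be the faster choice.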

Note to shoemoodoshaloo: please do not use tabs for indentation!


What should I use?
