Hi all. My new job involves writing scripts for people in other departments. I'm pretty much on my own with this, and I'm still a beginner with Python (I think my brain is still in PHP mode, and I'm still struggling with the object-oriented approach).

Here is what I have:

Input file, call it bob.txt for argument's sake:

<tag>
....stuff1
....more stuff1
</tag>
<tag>
....stuff2
....more stuff2
</tag>
<tag>
....stuff3
....more stuff3
</tag>

My code:

#!/usr/local/bin/python2.6   
import sys

# sys.argv[0] is the script name, so four required options means at least 5 entries
if len(sys.argv) < 5:
    print "Usage: splitfilebytag option1 option2 option3 option4 [option5]"
    print "Run this application from the input file directory"
    print "Option 1: input filename"
    print "Option 2: output filename"
    print "Option 3: output file extension"
    print "Option 4: tag that indicates split. eg: \"</tag>\". Use inverted commas"
    print "Option 5 (optional): start file number. eg: 172"
    print "Example usage: splitfilebytag test.xml out txt \"</tag>\" 12"
    sys.exit(1)

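readfile_ = sys.argv[1]
outputfilename_ = sys.argv[2]
extension_ = sys.argv[3]
tag_ = sys.argv[4]

# Option 5 is optional: default to 0 when it is not given.
# A non-numeric value now raises an error instead of being silently ignored.
num_ = 0
if len(sys.argv) > 5:
    num_ = int(sys.argv[5])
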
def split_(readfile_, outputfilename_, extension_, tag_, num_):
    thelist_ = []

    with open(readfile_, 'r') as thefile_:
        for line_ in thefile_:
            # Every line belongs to the current chunk, including the tag line
            thelist_.append(line_)
            if tag_ in line_:
                # The tag line closes the chunk: write it to the next numbered file
                outfilename_ = '%s%03d.%s' % (outputfilename_, num_, extension_)
                num_ += 1
                with open(outfilename_, 'w') as outfile_:
                    outfile_.writelines(thelist_)
                thelist_ = []
    # Note: any lines after the last tag are never written out

if __name__ == "__main__":
    split_(readfile_, outputfilename_, extension_, tag_, num_)
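
One further possible improvement, offered as a sketch rather than a drop-in replacement: keep the splitting separate from the file writing, so the chunking logic can be tested without touching the disk. The names `iter_chunks` and `split_file` are hypothetical, and unlike the script above this version also writes out any non-blank lines left after the last tag instead of silently dropping them. It is written for Python 3, though it only relies on `open`, generators and `enumerate(..., start)`, so it should also work on 2.6.

```python
def iter_chunks(lines, tag):
    """Yield lists of lines; each chunk ends with a line containing `tag`."""
    chunk = []
    for line in lines:
        chunk.append(line)
        if tag in line:
            yield chunk
            chunk = []
    # Leftover lines after the last tag: keep them only if they are not all blank
    if any(l.strip() for l in chunk):
        yield chunk


def split_file(readfile, prefix, extension, tag, start=0):
    """Write each chunk of `readfile` to prefixNNN.extension and return the names."""
    written = []
    with open(readfile, 'r') as infile:
        # enumerate's second argument sets the first file number
        for num, chunk in enumerate(iter_chunks(infile, tag), start):
            outname = '%s%03d.%s' % (prefix, num, extension)
            with open(outname, 'w') as outfile:
                outfile.writelines(chunk)
            written.append(outname)
    return written
```

With the example input, `split_file('bob.txt', 'out', 'txt', '</tag>', 12)` should write out012.txt to out014.txt and return their names.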

So in this case, if my users run this:

./splitfilebytag.py bob.txt out txt "</tag>" 12

they will end up with 3 separate files that look like this:

out012.txt

<tag>
....stuff1
....more stuff1
</tag>

out013.txt

<tag>
....stuff2
....more stuff2
</tag>

out014.txt

<tag>
....stuff3
....more stuff3
</tag>

If anyone could suggest a better approach or improvements to this, I would appreciate it. I think this looks OK (to me at least), and the only thing I might change is to make the number of leading zeros another option.
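
For the leading-zeros option, the `%` operator accepts `*` in place of a field width and takes the width from the argument tuple, so the padding can become a parameter instead of the hard-coded `%03d`. `make_outname` and its `width` parameter are hypothetical names for illustration.

```python
def make_outname(prefix, num, extension, width=3):
    """Build names like out012.txt; `width` is the minimum number of digits."""
    # '%0*d' % (3, 12) -> '012': the '*' is filled from the tuple, '0' pads with zeros
    return '%s%0*d.%s' % (prefix, width, num, extension)
```

The width is a minimum, not a limit: a number with more digits than `width` is written in full rather than truncated.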

Anybody?

An improvement would be to use the argparse module to handle the command line. It would be especially useful to learn if you need to write a lot of scripts.
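
A sketch of how the command line above might look with argparse, assuming Python 2.7 or 3.2+, where the module is part of the standard library. The positional arguments mirror options 1 to 5 of the usage message; `nargs='?'` makes the start number optional with a default of 0. The `-w/--width` flag is a hypothetical addition for the leading-zeros idea, and `build_parser` is an illustrative name.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description='Split a file into numbered pieces at each line containing TAG.')
    parser.add_argument('readfile', help='input filename')
    parser.add_argument('outputfilename', help='output filename prefix, e.g. out')
    parser.add_argument('extension', help='output file extension, e.g. txt')
    parser.add_argument('tag', help='text that ends each piece, e.g. "</tag>"')
    # Optional positional: falls back to 0 when omitted
    parser.add_argument('start', nargs='?', type=int, default=0,
                        help='number of the first output file (default: 0)')
    # Hypothetical flag for the configurable leading zeros
    parser.add_argument('-w', '--width', type=int, default=3,
                        help='minimum digits in the file number (default: 3)')
    return parser
```

argparse also generates `-h/--help` output and error messages automatically, which would replace the hand-written usage block.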

Ah, argparse. That is exactly the kind of thing I need to look at, thanks! I'll update my script and post it back here.
Thanks!

I can't use argparse because I'm not on Python 2.7, where it was introduced. I can use optparse, though:

import optparse
parser = optparse.OptionParser()

parser.add_option('-i', help='Input filename', dest='readfile_', action='store', type='string')
(opts, args) = parser.parse_args()
print opts.readfile_

I'll fiddle some more tomorrow, but this looks like a good, "proper" way to pass and parse arguments. Thanks again!
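
Following on from the `-i` example, a sketch of how the remaining inputs might map onto optparse, which is available on 2.6. The flags `-o`, `-e`, `-t` and `-s` and the helper names `build_parser` and `parse` are illustrative choices, not part of the script. optparse treats every option as optional, so the sketch checks the required ones by hand and calls `parser.error`, which prints the usage message and exits.

```python
import optparse


def build_parser():
    parser = optparse.OptionParser(usage='%prog -i INPUT -o PREFIX -e EXT -t TAG [-s START]')
    parser.add_option('-i', dest='readfile_', help='input filename')
    parser.add_option('-o', dest='outputfilename_', help='output filename prefix, e.g. out')
    parser.add_option('-e', dest='extension_', help='output file extension, e.g. txt')
    parser.add_option('-t', dest='tag_', help='tag that ends each piece, e.g. "</tag>"')
    # optparse expands %default in help text to the option's default value
    parser.add_option('-s', dest='num_', type='int', default=0,
                      help='start file number (default: %default)')
    return parser


def parse(argv=None):
    """Parse argv (sys.argv[1:] when None) and enforce the required options."""
    parser = build_parser()
    opts, args = parser.parse_args(argv)
    required = (('-i', 'readfile_'), ('-o', 'outputfilename_'),
                ('-e', 'extension_'), ('-t', 'tag_'))
    for flag, dest in required:
        if getattr(opts, dest) is None:
            parser.error('option %s is required' % flag)  # exits with status 2
    return opts
```

The parsed values could then be passed straight through, e.g. `split_(opts.readfile_, opts.outputfilename_, opts.extension_, opts.tag_, opts.num_)`.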
