Hi all. My new job involves writing scripts for people in other departments. I'm pretty much on my own with this and I'm still a beginner with Python (I think my brain is still in PHP mode and I'm still struggling with the object-oriented approach).

Here is what I have:

Input file, call it bob.txt for argument's sake:

<tag>
....stuff1
....more stuff1
</tag>
<tag>
....stuff2
....more stuff2
</tag>
<tag>
....stuff3
....more stuff3
</tag>

My code:

#!/usr/local/bin/python2.6
import sys

# Four arguments are required (argv[1] to argv[4]); the fifth is optional.
if len(sys.argv) < 5:
    print "Usage: splitfilebytag option1 option2 option3 option4 [option5]"
    print "Run this application from the input file directory"
    print "Option 1: input filename"
    print "Option 2: output filename"
    print "Option 3: output file extension"
    print "Option 4: tag that indicates split. eg: \"</tag>\". Use inverted commas"
    print "Option 5(optional): Start file number. eg: 172"
    print "Example usage: splitfilebytag test.xml out txt \"</tag>\" 12"
    sys.exit(1)
    
readfile_ = sys.argv[1]
outputfilename_ = sys.argv[2]
extension_ = sys.argv[3]
tag_ = sys.argv[4]

# Optional start number; fall back to 0 when it is missing or not an integer.
try:
    num_ = int(sys.argv[5])
except (IndexError, ValueError):
    num_ = 0

def split_(readfile_, outputfilename_, extension_, tag_, num_):
    # Lines collected since the last split tag.
    thelist_ = []

    with open(readfile_, 'r') as thefile_:
        for line_ in thefile_:
            thelist_.append(line_)
            if tag_ in line_:
                # The tag closes a chunk: write it to the next numbered file.
                outfilename_ = '%s%03d.%s' % (outputfilename_, num_, extension_)
                num_ += 1
                with open(outfilename_, 'w') as outfile_:
                    outfile_.writelines(thelist_)
                thelist_ = []
    # Note: any lines after the final tag are never written out.

if __name__ == "__main__":
    split_(readfile_, outputfilename_, extension_, tag_, num_)

So in this case, if my "users" run this: ./splitfilebytag.py bob.txt out txt "</tag>" 12

They will end up with 3 separate files that look like this:

out012.txt

<tag>
....stuff1
....more stuff1
</tag>

out013.txt

<tag>
....stuff2
....more stuff2
</tag>

out014.txt

<tag>
....stuff3
....more stuff3
</tag>

If anyone could suggest a better approach or improvements to this, I would appreciate it. I think this looks OK (to me at least), and the only thing I might change is to make the number of leading zeros another option.
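
For the leading-zeros idea, here is a minimal sketch of how it could work. The helper make_name and its width_ parameter are hypothetical names, not part of the script above. The `*` in the format string takes the field width from the argument tuple, so the padding no longer has to be hard-coded as 03:

```python
def make_name(outputfilename_, num_, extension_, width_=3):
    # '%0*d' zero-pads num_ to width_ digits; width_ is read from the tuple.
    return '%s%0*d.%s' % (outputfilename_, width_, num_, extension_)


print(make_name('out', 12, 'txt'))     # out012.txt
print(make_name('out', 12, 'txt', 5))  # out00012.txt
```

With the default width of 3 this produces the same names as the current script.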

Anybody?

All 4 Replies

An improvement would be to use the argparse module to handle the command line. It would be especially useful to learn if you need to write a lot of scripts.
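
As a rough sketch of what that could look like for this script, assuming Python 2.7 or later is available: the function name build_parser and the argument names below are illustrative choices, not something taken from the original script. The four required values become positional arguments and the start number becomes an optional one with a default of 0:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog='splitfilebytag',
        description='Split a file into numbered files at each line '
                    'containing a tag.')
    parser.add_argument('readfile', help='input filename')
    parser.add_argument('outputfilename', help='output filename')
    parser.add_argument('extension', help='output file extension')
    parser.add_argument('tag', help='tag that indicates a split')
    # nargs='?' makes this positional optional; the default is used if omitted.
    parser.add_argument('num', nargs='?', type=int, default=0,
                        help='start file number (default: 0)')
    return parser


# Parse an explicit list here; a real script would call parse_args()
# with no arguments so that sys.argv is used.
args = build_parser().parse_args(['bob.txt', 'out', 'txt', '</tag>', '12'])
print('%s %d' % (args.tag, args.num))
```

argparse generates the usage and help text itself, so the block of print statements in the original script would no longer be needed.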

Ah, argparse. That is exactly the kind of thing I have to look at, thanks! I'll update my script and post it back here.
Thanks!

I can't use argparse as I'm not using Python 2.7 where this was introduced. I can use optparse though:

import optparse
parser = optparse.OptionParser()

parser.add_option('-i', help='Input filename', dest='readfile_', action='store', type='string')
(opts, args) = parser.parse_args()
print opts.readfile_

I'll fiddle some more tomorrow, but this looks like a good, "proper" way to pass and parse arguments. Thanks again.
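
A fuller sketch along the same lines, covering all five values from the original script. The option letters -o, -e, -t and -n and the helper names build_parser and parse_ are assumptions made for illustration. optparse does not enforce required options by itself, so the check is done by hand with parser.error:

```python
import optparse


def build_parser():
    parser = optparse.OptionParser(
        usage='%prog -i FILE -o NAME -e EXT -t TAG [-n NUM]')
    parser.add_option('-i', dest='readfile_', help='Input filename')
    parser.add_option('-o', dest='outputfilename_', help='Output filename')
    parser.add_option('-e', dest='extension_', help='Output file extension')
    parser.add_option('-t', dest='tag_', help='Tag that indicates split')
    parser.add_option('-n', dest='num_', type='int', default=0,
                      help='Start file number')
    return parser


def parse_(argv_):
    parser = build_parser()
    opts, args = parser.parse_args(argv_)
    # Options without a default are None when not given on the command line.
    for name_ in ('readfile_', 'outputfilename_', 'extension_', 'tag_'):
        if getattr(opts, name_) is None:
            parser.error('missing required option for %s' % name_)
    return opts


opts = parse_(['-i', 'bob.txt', '-o', 'out', '-e', 'txt', '-t', '</tag>'])
print('%s %d' % (opts.readfile_, opts.num_))
```

In the real script, parse_(sys.argv[1:]) would replace the manual sys.argv handling.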
