Here is a function to split the file:
from itertools import islice, count

def split_file(filename, subfile_prefix, max_lines):
    assert max_lines > 0
    try:
        with open(filename, "rb") as src_file:
            for i in count(1):  # 1, 2, 3, ... indefinitely
                line = next(src_file)  # read one line to raise StopIteration at the end of src_file
                dst_filename = "{0}_{1:d}.txt".format(subfile_prefix, i)
                with open(dst_filename, "wb") as dst_file:
                    dst_file.write(line)
                    dst_file.writelines(islice(src_file, 0, max_lines-1))
    except StopIteration:
        pass
You only need to add command-line argument parsing to turn this into a script :)
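For what it's worth, here is one way the command-line part might look, as a rough sketch using the standard argparse module. The split_file() function is the one from this post; the main() and positive_int() names, the positional arguments, the -l/--lines option and its default of 1000 are my own assumptions, not something the thread specifies.

```python
import argparse
from itertools import count, islice


def split_file(filename, subfile_prefix, max_lines):
    # Same logic as the function posted above.
    assert max_lines > 0
    try:
        with open(filename, "rb") as src_file:
            for i in count(1):
                line = next(src_file)  # StopIteration here ends the loop at end of file
                dst_filename = "{0}_{1:d}.txt".format(subfile_prefix, i)
                with open(dst_filename, "wb") as dst_file:
                    dst_file.write(line)
                    dst_file.writelines(islice(src_file, 0, max_lines - 1))
    except StopIteration:
        pass


def positive_int(text):
    # Hypothetical helper: reject zero and negative line counts at parse time.
    value = int(text)
    if value <= 0:
        raise argparse.ArgumentTypeError("must be a positive integer")
    return value


def main(argv=None):
    # Assumed interface: two positional arguments and an optional line count.
    parser = argparse.ArgumentParser(
        description="Split a file into pieces of at most N lines.")
    parser.add_argument("filename", help="file to split")
    parser.add_argument("prefix", help="prefix for the output file names")
    parser.add_argument("-l", "--lines", type=positive_int, default=1000,
                        help="maximum number of lines per output file")
    args = parser.parse_args(argv)  # argv=None means "use sys.argv[1:]"
    split_file(args.filename, args.prefix, args.lines)
```

To run it as a script, call main() at the bottom of the file under the usual if __name__ == "__main__": guard. Passing a list such as main(["big.log", "part", "-l", "500"]) makes it easy to try from the interpreter (the file name here is only an illustration).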
Gribouillis
Posting Maven
2,786 posts since Jul 2008
Reputation Points: 1,044
Solved Threads: 691
Here is a better version, which uses a generic helper function to split any iterable into evenly sized groups:
from itertools import islice, chain

def grouper(n, iterable):
    """split an iterable into a sequence of iterables with at most n items"""
    assert n > 0
    it = iter(iterable)
    return (chain((item,), islice(it, 0, n-1)) for item in it)

def split_file(filename, subfile_prefix, max_lines):
    with open(filename, "rb") as src_file:
        for i, group in enumerate(grouper(max_lines, src_file), 1):
            dst_filename = "{0}_{1:d}.txt".format(subfile_prefix, i)
            with open(dst_filename, "wb") as dst_file:
                dst_file.writelines(group)
Note that the itertools documentation for Python 3 contains a similar grouper() recipe (it is only a recipe in the docs, not a function exported by the module). This one is different; I don't like the implementation described in the docs. Also note that each group's items must be fully consumed before the next group is requested, otherwise the algorithm does not work.
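A small demonstration of that last point, using the same grouper() as above on range(7). The variable names in_order and deferred are just for illustration.

```python
from itertools import chain, islice


def grouper(n, iterable):
    """Same helper as above: lazy groups of at most n items."""
    assert n > 0
    it = iter(iterable)
    return (chain((item,), islice(it, 0, n - 1)) for item in it)


# Consuming each group before asking for the next one gives even groups.
in_order = [list(group) for group in grouper(3, range(7))]

# Collecting all the groups first and reading them afterwards does not:
# the shared iterator is already exhausted, so each group only holds
# the single item it was created with.
deferred = [list(group) for group in list(grouper(3, range(7)))]

print(in_order)   # [[0, 1, 2], [3, 4, 5], [6]]
print(deferred)   # [[0], [1], [2], [3], [4], [5], [6]]
```

In split_file() this is not a problem, because writelines(group) drains each group before enumerate() asks for the next one.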
Gribouillis
lordspace
Junior Poster in Training
90 posts since May 2006
Reputation Points: 18
Solved Threads: 6
http://usage.cc/split
Nice, I didn't think to look at the Linux commands. Splitting a file is a basic task. My Python code is cross-platform, however.
Gribouillis