I have a list of part files (that is, many parts of a single file).
I want to append all the parts into a single file to rebuild the full file.

I used this code:

import os
l = ["home/baskar/1.part","home/baskar/2.part","home/baskar/3.part"]

source_file ="/home/baskar/source.file"

re_size = 0
buffer = 10*1024   #10 KiB's
for x in range(len(l)):
    target_size = os.path.getsize(l[x])
    part = open(l[x],"r")
    while 1:
        output = part.read(buffer)
        re_size = re_size + len(output)
        F = open(source_file,"ab")
        F.write(output)
        F.close()
        if re_size == target_size:
            break
        else:print "Appending file",l[x]

This code works perfectly, but it takes a very long time to append the files; even a small file of about 5 MB takes 3 minutes to complete.

Is there a faster way to append files, or is there something wrong with my code?

If anyone knows, please tell me.

Thanks in advance

Edited 7 Years Ago by baskar007: n/a

I'm just guessing here, but I wonder if your code is less efficient because you open and close the output file repeatedly throughout the process. You perform an open and close operation every 10K, so for a 5MB file you would be opening and closing the file 512 times. Maybe you should open the file before you start your while loop and close it when the while loop is done. I can't believe that it would give you a huge performance bump, but maybe there is more to opening and closing a file than I realize.

Not sure if this runs any faster than yours, but it does run quickly for files a little over 50 MB, at least on my Debian box.

import os

l = ["/home/tyrant/Desktop/1.part", "/home/tyrant/Desktop/2.part", "/home/tyrant/Desktop/3.part"]
source_file ="/home/tyrant/Desktop/source.file"

buffer = 10*1024   #10 KiB's

F = open(source_file,"ab")

for x in range(len(l)):
    re_size = 0   # reset the byte counter for every part
    target_size = os.path.getsize(l[x])
    part = open(l[x],"rb")   # binary mode, to match the "ab" output file

    while True:
        output = part.read(buffer)
        re_size += len(output)

        F.write(output)

        if re_size == target_size:
            break
        else:
            print "Appending file",l[x]

    part.close()

F.close()

OK, I just ran a timing test using the datetime module. I ran each script 5 times and took the average run time. For your unmodified script I get 0.982 seconds, and for the modified script I get 0.721 seconds.
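For anyone who wants to repeat that kind of comparison, here is a minimal sketch of an averaging timer. It is not the script used for the numbers above: it swaps the datetime module for time.perf_counter, it is written for Python 3, and the names average_runtime and sample_workload are made up for illustration.

```python
import time

def average_runtime(func, runs=5):
    """Call func() `runs` times and return the mean wall-clock time in seconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        func()
        total += time.perf_counter() - start
    return total / runs

def sample_workload():
    # Hypothetical stand-in for the append script being measured.
    sum(range(100000))

print("average: %.3f s" % average_runtime(sample_workload))
```

When timing the append scripts themselves, remember that the output file is opened in "ab" mode, so it should be deleted between runs or it will keep growing.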

Now that I've had a chance to try out your code, I get a better idea of what is going wrong. I created three files full of random garbage each 5MB in size and then ran your code against them.

What I saw was that the first two files were appended properly, and then there was a very long pause; it seemed that the code was stuck in an endless loop. So I added a line at line 16 to print the values of re_size and target_size. What I saw was that your code found a place where re_size and target_size were the same, but then just kept on going.

Stepping through your code, the script gets the file size of the first file. Then it keeps reading and appending data until the number of bytes read matches the size of the input file. Then it moves on to the next file. However, you never reset re_size to zero, so when it starts on the 2nd file, re_size still includes the bytes from the first file. With parts of equal size, re_size and target_size will never be equal again, and the loop never ends.
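Another way to avoid this class of bug is to drop the size bookkeeping and stop when read() returns an empty result, which is how Python signals end of file. The following is only a sketch under a few assumptions: it is written for Python 3, the function name append_parts is hypothetical, and it opens the output file once instead of once per chunk.

```python
def append_parts(part_paths, dest_path, chunk_size=10 * 1024):
    """Append each part to dest_path, reading chunk_size bytes at a time."""
    with open(dest_path, "ab") as dest:
        for path in part_paths:
            with open(path, "rb") as part:
                while True:
                    chunk = part.read(chunk_size)
                    if not chunk:  # read() returns b"" at end of file
                        break
                    dest.write(chunk)
```

It would be called as append_parts(["1.part", "2.part", "3.part"], "finished.file"). Because there is no counter to reset, the loop cannot run past the end of a part.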

So I took out the extra line that I put in at line 16, and I added this at line 14:
re_size=0. I also put in some time stuff so I could see how long the code was taking. It was much improved. Here is the final code

import os
import time

print time.strftime('%H:%M:%S', time.localtime())
l = ["1.part","2.part","3.part"]

source_file ="finished.file"

re_size = 0
buffer = 10*1024   #10 KiB's
for x in range(len(l)):
    target_size = os.path.getsize(l[x])
    part = open(l[x],"r")
    re_size = 0
    while 1:
        output = part.read(buffer)
        re_size = re_size + len(output)
        F = open(source_file,"ab")
        F.write(output)
        F.close()
        if re_size == target_size:
            break

print time.strftime('%H:%M:%S', time.localtime())

Here is the output

kevins-macbook:append_test kevin$ python append1.py 
14:33:35
14:33:36

By the way, there is a slight performance boost by opening the file once and closing it once as I suggested in my first reply. I ran the code again and replaced the strftime calls with a simple print time.time(). What I found was that the first version of the code (which corrects the bug but still opens and closes the file 500 times) runs in 1.09 seconds on my machine.

Then I modified the program so that the file is opened only once and closed only once. The resulting program runs in .7 seconds. So by following my first advice you could have saved yourself about .39 seconds! HUGE!!!!

Thanks, mn_kthompson

I have one more question:
is there any way to move the files into one single file, instead of copying them?

Hmmm, not really. I guess you could open the first file in append mode and then append the 2nd and 3rd file. Then you could delete files 2 and 3.
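A rough sketch of that idea, assuming Python 3 and using a made-up function name (merge_into_first). The bytes of the later parts are still copied; only the first part is reused in place, and each later part is deleted once its contents have been written out.

```python
import os
import shutil

def merge_into_first(part_paths):
    """Append every later part onto the first one, deleting each part once copied."""
    first = part_paths[0]
    with open(first, "ab") as dest:
        for path in part_paths[1:]:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, dest)
            dest.flush()     # push buffered bytes to the OS before deleting the source
            os.remove(path)
    return first
```

Afterwards the first part holds the whole file, and os.rename can give it its final name without another copy as long as the new name is on the same filesystem. Note that this is destructive: if something fails halfway, the original parts are no longer intact.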
