Good evening all,

Currently I am working on a program that will automate the backup/restore jobs at the place I work. The rough work has been done, but I am somewhat concerned about the possibility of massive IO operations due to the way my program works.
Here is the scenario:
My colleagues and I sometimes have to do PC transfers: copy all relevant user data from the old PC to a network location, set up the new PC, then copy from the network to the new PC.

Our servers/NAS/network are all up to par, so they won't be a problem. But some users still gather quite a bit of data (think along the lines of 5-10GB/15.000-30.000 files).
The program creates a BackgroundWorker for each relevant folder, then gathers all the files to copy into a List<FileInfo>. A foreach loop then goes through that list and copies the files as below:

//Roughly 500KB (512 KiB) chosen as buffer size, this allows for very fast (700MB in ~6-7 seconds) transfers without flooding the HD buffer (so far..)
//Should also be safe for network usage.
int bufferSize = 512 * 1024;

//The using blocks flush and close both streams, even if an exception is thrown
using (FileStream strIn = new FileStream(CurLoc, FileMode.Open, FileAccess.Read))
using (FileStream strOut = new FileStream(NewLoc, FileMode.Create, FileAccess.Write))
{
    byte[] buf = new byte[bufferSize];
    int len;

    //Read returns 0 once the end of the file has been reached
    while ((len = strIn.Read(buf, 0, buf.Length)) > 0)
    {
        strOut.Write(buf, 0, len);

        //Updates once per chunk transferred, so roughly once every 500KB
        SetProBar(strIn.Position, strIn.Length, EPB);
    }
}

But I am worried that the very large number of files might put IO operations through the roof and slow the system to a crawl.

So I am kind of looking for a way to limit the copy process to roughly 5 files per BackgroundWorker.
For this I think I should use something like while(filesCopying < 5) or something. But I am not quite sure at what point I should be running this. And how do I make the rest of the foreach wait until it is done copying?

If anybody could shed some light on this, I would be very appreciative!

Good evening all.

All 14 Replies

I don't see a foreach in there, but based on your description it's only going to copy one file at a time.

The foreach code is in a different section, but looks like this:

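//Just to test
foreach (FileInfo FI in FI_Desktop)
{
    //"if" rather than "while": Copy() is synchronous, so the counter is back
    //below 5 after every pass and a while loop would copy the same file forever
    if (filesCopying < 5)
    {
        filesCopying++;

        Copy(FI.FullName, FI.FullName.Replace(FI.Extension, "_-_"), EPB_Desktop);

        filesCopying--;
    }
}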

But basically the idea is to check the List<FileInfo> for all the entries, and copy them to the new location, with a maximum of X at a time.
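If a cap of X simultaneous copies is still wanted, one possible way to express it is sketched below. This is only an illustration under some assumptions: it uses File.Copy rather than the manual stream loop, it needs .NET 4.0 or later for Parallel.ForEach, and the names ThrottledCopy, CopyAll, targetDir and maxConcurrent are made up for the example. Parallel.ForEach blocks until every file has been processed, which also covers the "make the rest wait" part.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original program.
public static class ThrottledCopy
{
    // Copies every file into targetDir with at most maxConcurrent copies
    // running at once. Returns the number of files copied.
    // Simplification: the folder structure is flattened, so files with the
    // same name in different subfolders would overwrite each other.
    public static int CopyAll(IEnumerable<FileInfo> files, string targetDir, int maxConcurrent)
    {
        Directory.CreateDirectory(targetDir);

        var options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrent };
        int copied = 0;

        // Blocks the calling thread until all files are done.
        Parallel.ForEach(files, options, file =>
        {
            string destination = Path.Combine(targetDir, file.Name);
            File.Copy(file.FullName, destination, true);

            // Several copies may finish at the same time, so count atomically.
            Interlocked.Increment(ref copied);
        });

        return copied;
    }
}
```

Called from inside a BackgroundWorker, something like ThrottledCopy.CopyAll(FI_Desktop, targetDir, 5) would keep the UI responsive while never running more than 5 copies at a time.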

Different thought entirely, what about doing them sequentially, and then utilizing the different BackgroundWorkers to have it copy roughly 4-7 files at the same time? (roughly the amount of BackgroundWorkers that will run in async mode)

Disk I/O on a single hard drive is effectively single threaded, so copying more than one file at a time will not speed up the copying. In fact, it will slow it down, as the read/write head has to seek to a new position every time it swaps back and forth between files. I'd put the foreach code into a background worker, but I wouldn't do anything else with threading.

I'm not trying to speed up the copying, it's fine the way it is, and if it were slower that wouldn't be much of a problem either.
Would that be a new BackgroundWorker, or can it be the same one it is in now?
(current situation:
6 BackgroundWorkers for the specified folders, all async mode, all started at the same time.
BW job:
It collects all the files in a List<FileInfo>, gathers some basic info from those files (number of, and size), then starts the copy through a foreach.)

Question: Why are you copying bytes manually rather than using File.Copy()?

For updating the ProgressBar

Could you not base the progress bar off how many files have been copied?

Max value = total file number
progress value = amount copied?
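A rough sketch of that file-count approach, combined with the one-file-at-a-time loop suggested earlier, might look like this. The names SequentialCopy, CopyAll and reportPercent are invented for the example, and it assumes all files go into one flat target folder, which the real program may not want.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original program.
public static class SequentialCopy
{
    // Copies the files one at a time and reports progress as the
    // percentage of files completed. Returns the number of files copied.
    public static int CopyAll(IList<string> sourcePaths, string targetDir, Action<int> reportPercent)
    {
        Directory.CreateDirectory(targetDir);

        int done = 0;
        foreach (string source in sourcePaths)
        {
            string destination = Path.Combine(targetDir, Path.GetFileName(source));
            File.Copy(source, destination, true);
            done++;

            // Max value = total file count, progress value = files copied so far.
            if (reportPercent != null)
                reportPercent(done * 100 / sourcePaths.Count);
        }

        return done;
    }
}
```

Inside a DoWork handler, the worker's own ReportProgress method can be passed as the callback, for example SequentialCopy.CopyAll(paths, targetDir, worker.ReportProgress), provided WorkerReportsProgress is set to true. The ProgressChanged event then updates the ProgressBar without any custom cross-thread code.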

For updating the ProgressBar

That makes sense, though an equally viable perspective is that each file is a transaction, and the progress bar tells you how many transactions out of the total have completed rather than tracking the progress within each transaction. You don't really care about partial files, after all, only complete files. If a file is partially complete then that's an error condition.

My concern here is that manually copying a file introduces risk of corruption that File.Copy() will be more likely to respond to correctly.

How big is the chance of such a corruption?
My logic:
If the stream is correctly being opened and it can read the file per byte up to 500KB, and is able to transfer those as well, without it causing an IOException (I will add full error handling after all the logic is complete), shouldn't I be able to safely assume the file is intact?
Also, looking at the exceptions a FileStream.Read/Write can throw versus a File.Copy, the only difference I can see mostly relates to File/Folder paths. (though those might actually be a problem in some rare cases).

The current logic is working, and gives me quite a bit of control, therefore I would not part with it immediately. However if the risk is quite high, or actually anywhere above 0.5%, then I will change the code of course!
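Whichever copy method is used, the "is the file intact?" question can also be checked rather than assumed. A minimal sketch, assuming a hash comparison after the copy is acceptable (it reads both files again, so it roughly doubles the I/O, and a read-back served from a cache may not reflect what is physically on disk). The names CopyVerifier, HashFile and SameContent are made up for the example.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper, not part of the original program.
public static class CopyVerifier
{
    // Computes the SHA-256 hash of a file's contents.
    public static byte[] HashFile(string path)
    {
        using (SHA256 sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return sha.ComputeHash(stream);
        }
    }

    // Returns true when both files hash to the same value.
    public static bool SameContent(string sourcePath, string copyPath)
    {
        return HashFile(sourcePath).SequenceEqual(HashFile(copyPath));
    }
}
```

A call such as CopyVerifier.SameContent(CurLoc, NewLoc) after each copy would flag a mismatch, which could then be logged or retried.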

How big is the chance of such a corruption?

Not big, but big enough for me to question why you're doing a byte-wise copying loop over a network instead of using standard libraries.

The current logic is working, and gives me quite a bit of control, therefore I would not part with it immediately.

Smells like "not invented here" syndrome, to be honest.

Hehehe, I guess there can be no denying that is partly true.
The main reason for creating this program was as an exercise, and this way I got to use threads, delegates, cross-thread UI controlling, the logic and coding involved in a threaded application, custom made UserControls, and bytewise transfers.
If the transfer is to be done by a File.Copy() then it wouldn't make any sense to use delegates, cross-thread UI control, or even the custom controls. I can simply use the default properties a BackgroundWorker provides.

That being said, since it is a learning experience, I should not disregard the lesson either. Given that the integrity of the data is the most important part of the program, I will change the code accordingly.

I will mark the thread as solved in roughly 24 hours, so anyone who still has questions, remarks, or tips has a little time left!

deceptikon, thanks a lot for the advice and insight.

Well, I would kind of like to go without dependencies outside of the direct .NET environment. The code for the inclusion looks rather messy (to me). Though I am also piecing together a sort of utilities namespace/file, and the FileCopyEx is an often referred to method..
In short, my personal preference would be to use File.Copy() and only have a file-count level of accuracy (again, many files, and I don't/shouldn't care about partial files) versus external dependencies and having increased accuracy for the ProgressBar.

Thanks for the suggestion though!

Closing the thread now.
Thumbed up the responses accordingly.
