I put this in the Unix category; however, I believe Windows pipes can also perform atomic writes - correct me if I'm wrong!

Say I'm trying to save a large stream of data onto a filesystem. This filesystem might be local, it might be LTFS (a tape filesystem, where seeks are expensive), or it might be on a network (so extra traffic to and from the filesystem should be avoided). Furthermore, there is a (let's assume good) chance of the connection breaking.

I want to be able to resume the operation fairly quickly (preferably with little seeking as well) if the connection breaks.

One advantage I do have is that I'm manually reading one stream into the other (in Perl).

What I'm currently doing is reading PIPE_BUF bytes of data from my stream (to guarantee atomic operations, if I understand them correctly) and writing them either into cURL (for remote filesystems) or directly into a file.
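For reference, the copy loop I have in mind looks roughly like this in Perl (just a sketch; the paths are placeholders, and the target handle could equally be a pipe opened to curl or ssh):

    use strict;
    use warnings;
    use POSIX qw(PIPE_BUF);   # assumes POSIX.pm exposes PIPE_BUF on this platform; hard-code 4096 if not

    # Source and target paths are placeholders.
    open(my $in,  '<', '/path/to/source') or die "open source: $!";
    open(my $out, '>', '/path/to/target') or die "open target: $!";
    binmode($_) for ($in, $out);

    my $written = 0;
    while (1) {
        my $got = sysread($in, my $chunk, PIPE_BUF);
        die "read error: $!" unless defined $got;
        last if $got == 0;                     # EOF on the source stream
        my $off = 0;
        while ($off < $got) {                  # syswrite may be partial; loop until the chunk is out
            my $n = syswrite($out, $chunk, $got - $off, $off);
            die "write error: $!" unless defined $n;
            $off     += $n;
            $written += $n;
        }
    }
    close($out) or die "close: $!";
    print "copied $written bytes\n";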

Question 1: Would writing PIPE_BUF bytes to the file, and waiting for the write to complete successfully, mean that that chunk was written as far as the software is concerned?

Question 2: Assuming I'm working locally, if it fails, can I wait for the user to fix the problem, re-open the stream (in append mode), and continue writing safely?
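To make Question 2 concrete, the resume I'm picturing is: keep only whole chunks of the partial file, truncate off any torn tail, re-open in append mode, and skip that far into the source. A rough sketch, assuming the source is seekable (paths are placeholders; if the source is a true stream it would have to re-read and discard up to the resume point instead):

    use strict;
    use warnings;
    use Fcntl qw(SEEK_SET);
    use POSIX qw(PIPE_BUF);   # same caveat as above

    my $source = '/path/to/source';   # placeholder paths
    my $target = '/path/to/target';

    # Trust only whole chunks: round the partial target down to a chunk boundary.
    my $have   = -s $target;
    $have      = 0 unless defined $have;          # target may not exist yet
    my $resume = $have - ($have % PIPE_BUF);

    open(my $out, '>>', $target) or die "open target: $!";
    truncate($out, $resume)      or die "truncate: $!";   # drop a possible torn tail
    open(my $in, '<', $source)   or die "open source: $!";
    binmode($_) for ($in, $out);

    # Skip the part we already have.
    sysseek($in, $resume, SEEK_SET) or die "seek source: $!";

    # ...then run the same chunked copy loop as above, starting $written at $resume.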

Question 3: Anyone happen to know how well cURL's resume feature works?

Question 4: Would it just be better to stream blocks of the data into RAM and write a new file for each block? When corruption occurs, re-write that block. Assuming a stream is 4 TiB, breaking it into 100 MiB files would give us roughly 42,000 files (which I doubt would be a problem for any modern filesystem). I would prefer the other method (easier on memory, less resending, and fewer files to manage), but reliability is also something I'm hoping for.
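If I did go the chunk-file route, at least the bookkeeping is simple: number the pieces and, on resume, count the complete ones. A rough sketch (the directory, naming scheme, and chunk size are made up for illustration):

    use strict;
    use warnings;
    use Fcntl qw(SEEK_SET);

    my $chunk_size = 100 * 1024 * 1024;       # 100 MiB per piece (illustrative)
    my $dir        = '/path/to/pieces';       # hypothetical output directory
    my $source     = '/path/to/source';

    # Resume point: the first index whose piece is missing or not full-sized.
    my $index = 0;
    $index++ while -e "$dir/chunk.$index" && -s "$dir/chunk.$index" == $chunk_size;

    open(my $in, '<', $source) or die "open source: $!";
    binmode $in;
    sysseek($in, $index * $chunk_size, SEEK_SET) or die "seek source: $!";

    while (1) {
        # For brevity this assumes full reads and writes; a real version would
        # loop the way the PIPE_BUF copy above does.
        my $got = sysread($in, my $buf, $chunk_size);
        die "read error: $!" unless defined $got;
        last if $got == 0;
        # Write to a temporary name and rename on completion, so a partial
        # piece can never be mistaken for a finished one.
        my $tmp = "$dir/chunk.$index.partial";
        open(my $out, '>', $tmp) or die "open $tmp: $!";
        binmode $out;
        defined syswrite($out, $buf)      or die "write $tmp: $!";
        close($out)                       or die "close $tmp: $!";
        rename($tmp, "$dir/chunk.$index") or die "rename: $!";
        $index++;
    }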

Anyone else see another way to approach this?

I have another idea. In the case of a local file, if the stream breaks, wait for the user to fix it, then use dd to overwrite the last block (I'm hoping that dd doesn't need to seek through the whole file in the case of a tape drive), and start appending to the stream from there.
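The dd step boils down to a positioned write (the same thing dd does with seek= and conv=notrunc), so it could also stay inside the Perl script. A sketch, with the offset, block size, and replacement data as placeholders - and I honestly don't know yet how LTFS behaves when you seek around a file opened read-write:

    use strict;
    use warnings;
    use Fcntl qw(SEEK_SET);

    my $target     = '/path/to/target';   # placeholder
    my $block_size = 65536;               # whatever chunk size is actually in use
    my $offset     = 0;                   # byte offset of the suspect last block (placeholder)
    my $good_block = "\0" x $block_size;  # stand-in for the re-read copy of that block

    open(my $out, '+<', $target) or die "open: $!";   # read-write, no truncation
    binmode $out;
    sysseek($out, $offset, SEEK_SET)    or die "seek: $!";
    defined syswrite($out, $good_block) or die "write: $!";
    # Subsequent writes continue from the current position, i.e. we are
    # effectively appending again from the end of the repaired block.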

In the case of doing it over a network, we would do the same thing (which means using ssh instead of cURL).
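For the network variant, the simplest thing I can picture is piping the remainder of the stream through ssh into an appending cat on the far side. A sketch (host and remote path are made up):

    use strict;
    use warnings;

    # 'remote.example.com' and the remote path are placeholders.
    open(my $remote, '|-', 'ssh', 'remote.example.com', 'cat >> /path/to/target')
        or die "ssh: $!";
    binmode $remote;
    # ...feed it chunks with the same syswrite loop as before...
    close($remote) or die "remote command failed (exit status $?)";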

At least this way the data isn't split up into some 42,000 files, and it's easier on memory as well as on network traffic.

Any thoughts on a cleaner solution?
