Hello everyone,

I am working on a program that generates a lot of data; eventually it can be in the GB range. The way I am doing it now is just writing to an ASCII file every time, like this:

ofstream flowC("outputfile.out"); // declared at the start
flowC << prevsid << ", " << threadid << endl; // inside working block - repeats many many times

However, this is not very efficient: first because I am writing ASCII, and second because I am writing to disk every single time.

To improve this, I want to write the 2nd line above to a buffer of size 2048 or bigger. Then, when the buffer gets full, I would write it to a file and empty it, so the file would be written in chunks of 2048 or so. I also want to avoid writing the file in ASCII and generate a binary file instead.

Can someone help me make this work correctly? I found some examples of how to write to a buffer, but I am not sure how to write it from there to a binary file.

char buffer[2048];
flowC.rdbuf()->pubsetbuf(buffer, 2048);
.......
flowC << prevsid << ", " << threadid << endl;

I also eventually would like to compress the buffer before I write it to a file. I am not sure how this can be done yet in C or C++.

Edited 3 Years Ago by kdar

First of all, the iostream library is already buffered under-the-hood. So, there isn't much reason for you to add another buffer on top of that (although it might help). The reason for your slow performance with this line:

flowC << prevsid << ", " << threadid << endl;

is in part due to ascii output (or formatted output), but mainly due to the presence of endl at the end of the output. The endl thing has two effects: adds a new-line to the output; and flushes the stream. The latter effect means that everything that is currently pending on the stream's internal buffer will be flushed, i.e., physically written to the file. So, the main problem here is that you are flushing the buffer every time in that loop. Simply replacing the endl with a new-line character will make a big difference because the stream will essentially do exactly the kind of buffering you were trying to do yourself (write stuff to a temporary buffer until it is too full and then flush it to the file). So, try this instead:

flowC << prevsid << ", " << threadid << "\n";
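To make that concrete, here is a minimal sketch of the same loop with a new-line character in place of endl. The helper name write_pairs, its parameters, and the placeholder values for prevsid and threadid are made up for illustration; only the output line itself comes from the code above.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: writes `count` "prevsid, threadid" lines to `path`.
// Returns true if the stream is still in a good state after the final flush.
bool write_pairs(const std::string& path, std::size_t count) {
    std::ofstream flowC(path.c_str());
    if (!flowC) return false;

    for (std::size_t i = 0; i < count; ++i) {
        int prevsid = static_cast<int>(i);       // placeholder values,
        int threadid = static_cast<int>(i % 4);  // not from the real program
        // '\n' only appends a new-line; the stream's own buffer decides
        // when the data is actually handed to the file.
        flowC << prevsid << ", " << threadid << '\n';
    }

    flowC.flush();  // one explicit flush at the end instead of one per line
    return flowC.good();
}
```

Calling write_pairs("outputfile.out", 1000000) and timing it against the endl version should show whether flushing was really the bottleneck on your system.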

If you actually want to do some buffering on your own, on top of the buffering that is already done by the file-stream object, then I would recommend that you use a std::stringstream instead. As follows:

std::stringstream ss_buffer;

.......

  // in the loop:
  ss_buffer << prevsid << ", " << threadid << "\n";
  if( ss_buffer.tellp() > 1024 ) {  // or some other threshold on the size of buffer
    flowC << ss_buffer.rdbuf();     // dump the buffer into the file-stream.
    ss_buffer.str("");              // empty the buffer, otherwise tellp() stays above the threshold.
  }

....

// at the end of loop:
if( ss_buffer.tellp() > 0 )    // inserting an empty buffer would set failbit on flowC.
  flowC << ss_buffer.rdbuf();  // dump whatever is left on the buffer.
flowC << flush;              // flush the file-stream.
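Put together as one self-contained function, the staging idea could look roughly like the sketch below. The name write_pairs_staged, its parameters, and the placeholder values written on each line are assumptions for illustration. The stringstream is emptied after each dump so that tellp() starts again from zero.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: stages lines in a stringstream and dumps them to the
// file whenever more than `threshold` characters are pending.
bool write_pairs_staged(const std::string& path, std::size_t count,
                        std::streamoff threshold = 1024) {
    std::ofstream flowC(path.c_str());
    if (!flowC) return false;

    std::stringstream ss_buffer;
    for (std::size_t i = 0; i < count; ++i) {
        ss_buffer << i << ", " << (i % 4) << '\n';  // placeholder values

        std::streamoff pending = ss_buffer.tellp();
        if (pending > threshold) {
            flowC << ss_buffer.rdbuf();    // dump the staged text
            ss_buffer.str(std::string());  // empty the staging buffer
            ss_buffer.clear();             // reset any state flags
        }
    }

    // Dump the remainder only if there is one: inserting an empty
    // streambuf sets failbit on the output stream.
    std::streamoff pending = ss_buffer.tellp();
    if (pending > 0) flowC << ss_buffer.rdbuf();

    flowC.flush();
    return flowC.good();
}
```

Whether this beats plain buffered output is something to measure; the file-stream already does much the same thing internally.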

I can't say for sure without measuring, but in general output streams are buffered, and writes to disk are additionally buffered by the OS. It is unlikely that you will notice a difference between writing a single line 1000 times and writing a 1000-line entry once. As for ASCII vs. binary, it may be more beneficial to write compressed data. Boost.Iostreams has zlib, gzip, and bzip2 filters that make this easy.
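If you still want to try plain binary output first, a minimal sketch is below. It assumes prevsid and threadid both fit in a 32-bit integer, which the original post does not state; the Record struct and both function names are made up for illustration. The bytes are written in the machine's native layout, so the file is only readable on a system with the same endianness.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical record layout: two 32-bit integers per entry (an assumption).
struct Record {
    std::int32_t prevsid;
    std::int32_t threadid;
};

// Writes each record as raw bytes. The stream is opened with ios::binary so
// no new-line translation happens; the stream still buffers internally.
bool write_records(const std::string& path, const std::vector<Record>& records) {
    std::ofstream out(path.c_str(), std::ios::binary);
    if (!out) return false;
    for (std::size_t i = 0; i < records.size(); ++i) {
        out.write(reinterpret_cast<const char*>(&records[i]), sizeof(Record));
    }
    out.flush();
    return out.good();
}

// Reads the records back, stopping at the first incomplete read.
std::vector<Record> read_records(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    std::vector<Record> records;
    Record r;
    while (in.read(reinterpret_cast<char*>(&r), sizeof(Record))) {
        records.push_back(r);
    }
    return records;
}
```

Each entry takes sizeof(Record) bytes, which is 8 with this layout, no matter how many digits the values have. Whether that ends up smaller than the text form depends on your data.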

Thanks Mike. I wasn't aware that with endl I forced it to write to the file. Do you know how large the buffer is with iostream?

LZSqr, I will try to look up Boost gzip. Do you know of any simple examples using it?
