I have a few ( slightly related ) questions about binary I/O in C++, and I can't seem to find full answers to these anywhere.

- Is there any way to tell if an i/ostream passed to a function has been opened in binary or text mode?

- If not, are there any nuances to look out for when using put/read/write methods on an i/ostream that was opened in text mode? Would a process that assumes an i/ostream was binary, and worked 'correctly' in that case, be in any way incorrect if the stream it used was instead opened in text mode?

- How does endian-ness work? If two bytes A and B are written ( in that order ) into a file using put, then read on a machine with reversed endian-ness, do the bytes come out in a different order ( i.e. B, A ), or do they come out in the order A, B, but with the actual bits themselves being backwards? Or does the ( binary mode ) i/ostream normalize this so that it's not a problem? What if the bytes are written using ostream.write directly from an array reinterpret_casted to a char * ? ( and then read using istream.read )

- Is there any way to tell if an i/ostream passed to a function has been opened in binary or text mode?

Not directly through the object by default. The easiest solution if you need that information is to pass it along with the object as a parameter. To save yourself some trouble in keeping up with multiple entities you can save an application specific mode flag in the stream object using xalloc() and iword(), or xalloc() and pword():

#include <fstream>
#include <iostream>

namespace EdRules {
  using namespace std;

  int OPEN_MODE = 0;

  void foo(ostream& os)
  {
    switch (os.iword(OPEN_MODE)) {
      case 0: cout << "Binary\n"; break;
      case 1: cout << "Text\n"; break;
    }
  }
}

int main()
{
  using namespace std;
  using namespace EdRules;

  OPEN_MODE = cout.xalloc();
  cout.iword(OPEN_MODE) = 0;

  foo(cout);

  cout.iword(OPEN_MODE) = 1;

  foo(cout);
}

- If not, are there any nuances to look out for when using put/read/write methods on an i/ostream that was opened in text mode? Would a process that assumes an i/ostream was binary, and worked 'correctly' in that case, be in any way incorrect if the stream it used was instead opened in text mode?

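If the process expects untranslated new-lines and gets translated new-lines, it could fail, but if you don't inspect and work with the characters directly in a way that relies on the untranslated bytes a binary stream gives you (and a text stream may alter), it shouldn't matter. Keep in mind that raw binary data can contain a new-line byte (0x0A) by chance, and a text stream on a platform that translates new-lines will alter it.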
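
One way to see the translation for yourself is to write the same characters in both modes and compare the resulting file sizes. This is only a minimal sketch: written_size() is a made-up helper name, the file names are placeholders, and the text-mode size depends on the platform (translation is typical on Windows and absent on POSIX systems).

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: writes `text` to `path` using `mode`, then reports
// how many bytes actually landed in the file.
std::size_t written_size(const std::string& path, const std::string& text,
                         std::ios::openmode mode)
{
  {
    std::ofstream out(path.c_str(), mode | std::ios::out | std::ios::trunc);
    assert(out.is_open());
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
  } // the stream is flushed and closed here

  // Reopen in binary mode so the size itself is not affected by translation.
  std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
  assert(in.is_open());
  const std::streamoff size = in.tellg();
  return static_cast<std::size_t>(size);
}
```

Calling written_size("bin.tmp", "a\nb\n", std::ios::binary) reports 4 bytes everywhere. The same call with std::ios::out alone (text mode) reports 4 where new-lines are left alone and 6 where each '\n' becomes a two-byte sequence.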

- How does endian-ness work? If two bytes A and B are written ( in that order ) into a file using put, then read on a machine with reversed endian-ness, do the bytes come out in a different order ( i.e. B, A ), or do they come out in the order A, B, but with the actual bits themselves being backwards?

It's all about the interpretation of the bytes when loaded into memory that matters, not how they're stored. If the first machine writes A,B and the second machine reads A,B, the values will be interpreted differently because to the first machine A,B means A,B, but to the second machine A,B means B,A.

Or does the ( binary mode ) i/ostream normalize this so that it's not a problem? What if the bytes are written using ostream.write directly from an array reinterpret_casted to a char * ? ( and then read using istream.read )

read() and write() don't do any magic with endianness. You get what you wrote, even if what you wrote isn't what you want. :)
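
To make the byte order visible, here is a small sketch that copies the in-memory bytes of a 32-bit value into an array, which is the same sequence write() would put in the file. The helper names are invented for illustration, and the sketch assumes the machine is either plain little-endian or plain big-endian.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies the in-memory bytes of `value` into `out`,
// in the order write() would send them to a file.
void memory_bytes(std::uint32_t value, unsigned char out[4])
{
  std::memcpy(out, &value, sizeof value);
}

// True when the least significant byte is stored first (little-endian).
bool is_little_endian()
{
  unsigned char b[4];
  memory_bytes(0x01020304u, b);
  // Mixed-endian layouts are out of scope for this sketch.
  assert(b[0] == 0x04 || b[0] == 0x01);
  return b[0] == 0x04;
}
```

On a little-endian machine the array holds 04 03 02 01, and on a big-endian machine it holds 01 02 03 04. The bits inside each byte are never reversed; only the order of whole bytes differs.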

Thanks for the info on storing things into the i/ostream directly, that could come in useful elsewhere :P, but, the reason I wanted to determine the open mode is to stop a method being inadvertently called with a stream in the wrong mode.. so, that won't work.. but I seem to get the same results in light testing when writing to a stream opened in 'the wrong mode' anyway.. hence my second question.

The endian-ness thing, I'm still not sure I get it... a more specific example of what I'm doing, writing 32-bit floats to and from binary files, I don't have access to a machine with a different endian-ness.. But, let's say I work on a PC, and I want my files to open on a Mac ( which is apparently reverse endianness to PC ).. [ I am happy to assume that floats are 32-bit on my targets.. and that they're represented the same... although, is that even a safe assumption on Windows, Mac, Linux? See I really don't want to have to resort to a dodgy handrolled file representation for non-integers, and using a text representation for floating point arrays never really appealed to me. ]

Anyway, assuming that the representation is the same.. if I do this:

#include <cassert>
#include <fstream>

int main( void )
{
  std::ofstream out( "test.bin", std::ios::binary | std::ios::out | std::ios::trunc );
  const float f1 = 123.456f;
  out.write( reinterpret_cast< const char * >( &f1 ), sizeof( float ) );
  out.close( );

  std::ifstream in( "test.bin", std::ios::binary | std::ios::in );
  float f2 = 0.0f;
  in.read( reinterpret_cast< char * >( &f2 ), sizeof( float ) );
  in.close( );

  assert( f1 == f2 );
}

Will it work ( i.e. will the read value equal the written value ) if the second half of that process is done on a reverse-endian machine? Or is it better ( or no change at all ) to do this?

#include <cassert>
#include <cstddef>
#include <fstream>

int main( void )
{
  std::ofstream out( "test.bin", std::ios::binary | std::ios::out | std::ios::trunc );
  const float f1 = 123.456f;
  const char * c1 = reinterpret_cast< const char * >( &f1 );
  for( size_t i = 0; i < sizeof( float ); ++i ) {
    out.put( c1[ i ] );
  }
  out.close( );

  std::ifstream in( "test.bin", std::ios::binary | std::ios::in );
  float f2 = 0.0f;
  char * c2 = reinterpret_cast< char * >( &f2 );
  for( size_t i = 0; i < sizeof( float ); ++i ) {
    c2[ i ] = in.get( );
  }
  in.close( );

  assert( f1 == f2 );
}

Or, is it necessary to reverse the order of the bits in each byte read if it is determined that the machine reading the file is 'backwards' ?

Edward's rule of thumb for binary I/O is that if you use read() and write(), the file is not portable. If you want portable binary I/O you need to deconstruct the object and write the bytes manually, then read them back in the same order they were written and reconstruct the object manually. That way the result is the same regardless of what system you process the file on.

The manual approach is almost always too involved to be worth it, so for portability, a standardized text format is the way to go.

I'm aware of potential issues involved when directly writing struct values to binary, so I already break the objects into primitives, but the smallest primitive I can break down to is a float, I'm happy to do this, it makes a certain amount of sense in a context which is basically, a load of arrays of float and uint32_t, with only a very tiny amount of considered structure ( a 16 byte header for each group of arrays, which may be hundreds/thousands of elements long ). The data is certainly not easily human-readable/writeable, so I wouldn't gain that usual advantage of a text format, only the portability. In any other situation though, I'd certainly go for a text format.

All I really need is a ( somewhat ) platform independent way of storing individual floats in binary format, I say somewhat platform independent, since PC and Mac are the only targets I'm focusing on.

But, based on what you've said, I guess that the second piece of code I posted would work ( reading and writing each byte of each float one at a time using get/put, and always reconstructing in the same order ), as long as floats are 32-bit, and are represented in the same way at bit-level on the target platforms...

But, based on what you've said, I guess that the second piece of code I posted would work

No, but that's Edward's fault for not being detailed enough. What you're doing with put() and get() is exactly what write() and read() do, and it has the same problems. By manually deconstructing and reconstructing the objects, Ed meant using bitwise operators:

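#include <cstdint>
#include <cstring>
#include <iostream>

// Rebuild a float from four bytes stored most significant byte first.
float ToFloat(const unsigned char *bytes)
{
  // A fixed-width type is used because unsigned long can be wider than float.
  std::uint32_t temp = 0;

  temp |= static_cast<std::uint32_t>(bytes[0]) << 24;
  temp |= static_cast<std::uint32_t>(bytes[1]) << 16;
  temp |= static_cast<std::uint32_t>(bytes[2]) << 8;
  temp |= static_cast<std::uint32_t>(bytes[3]) << 0;

  // memcpy copies the bit pattern without casting between pointer types.
  float value;
  std::memcpy(&value, &temp, sizeof value);
  return value;
}

// Split a float into four bytes, most significant byte first.
unsigned char *FromFloat(float value, unsigned char *bytes)
{
  std::uint32_t temp;
  std::memcpy(&temp, &value, sizeof temp);

  bytes[0] = static_cast<unsigned char>((temp >> 24) & 0xFF);
  bytes[1] = static_cast<unsigned char>((temp >> 16) & 0xFF);
  bytes[2] = static_cast<unsigned char>((temp >> 8) & 0xFF);
  bytes[3] = static_cast<unsigned char>((temp >> 0) & 0xFF);

  return bytes;
}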

int main()
{
  using namespace std;

  float value = 247.548f;
  unsigned char bytes[4];

  cout << fixed << ToFloat(FromFloat(value, bytes)) << '\n';
}

That works because the shift operator does the right thing to shift the bytes into the correct location regardless of the byte order for the system. It's still not 100% portable because of size and floating point format assumptions, but at least the endianness issue is covered. :)
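
For what it's worth, the same shifting idea can be applied directly to a stream. The following is a sketch only, not a definitive implementation: write_float_be(), read_float_be() and check_round_trip() are made-up names, the file order is chosen to be most significant byte first, and the static_assert spells out the assumption that float is a 32-bit IEEE 754 type.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>

static_assert(sizeof(float) == 4 && std::numeric_limits<float>::is_iec559,
              "this sketch assumes 32-bit IEEE 754 floats");

// Hypothetical helper: writes `value` as four bytes, most significant first.
void write_float_be(std::ostream& os, float value)
{
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  for (int shift = 24; shift >= 0; shift -= 8)
    os.put(static_cast<char>((bits >> shift) & 0xFFu));
}

// Hypothetical helper: reads four bytes written by write_float_be().
// Returns false if the stream ran out of data.
bool read_float_be(std::istream& is, float& value)
{
  std::uint32_t bits = 0;
  for (int i = 0; i < 4; ++i) {
    const int c = is.get();
    if (!is) return false;
    bits = (bits << 8) | static_cast<std::uint32_t>(c & 0xFF);
  }
  std::memcpy(&value, &bits, sizeof value);
  return true;
}

// Self-check for non-NaN values: what goes in must come back out.
void check_round_trip(float v)
{
  std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
  write_float_be(ss, v);
  float back = 0.0f;
  const bool ok = read_float_be(ss, back);
  assert(ok && back == v);
}
```

The stream passed in should still be opened in binary mode, since the float's bytes can contain a new-line value by chance.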

- How does endian-ness work? If two bytes A and B are written ( in that order ) into a file using put, then read on a machine with reversed endian-ness, do the bytes come out in a different order ( i.e. B, A ), or do they come out in the order A, B, but with the actual bits themselves being backwards?

It's all about the interpretation of the bytes when loaded into memory that matters, not how they're stored. If the first machine writes A,B and the second machine reads A,B, the values will be interpreted differently because to the first machine A,B means A,B, but to the second machine A,B means B,A.

No, it's about how they are stored. The bytes will be backwards in the file when written by a big-endian system but read by a little-endian system.

Matt, see this

The bytes will be backwards in the file when written by a big-endian system but read by a little-endian system.

That's another way of saying the same thing. :)

That's another way of saying the same thing. :)

Not really. You said "It's all about the interpretation of the bytes when loaded into memory that matters, not how they're stored." I'm saying you must know how they are stored or you can't properly read the data.

I think I know what you were trying to say, but it wasn't clear.

I think I know what you were trying to say, but it wasn't clear.

Sorry for not being clear.

Thank-you both, I read through that wikipedia article ( again =p ) and also this page from IBM http://www.ibm.com/developerworks/aix/library/au-endianc/index.html?ca=drs-, and I get it now. All I need to decide now is whether I want to force one endian-ness in the file format, or do something at the beginning of the file to indicate endian-ness. I can live with the 32-bit float assumption.. at least until I find an un-ignorable platform where there isn't a 32-bit floating point type...
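
For the second option (marking the endian-ness at the start of the file), one possible shape is sketched below. Everything here is an assumption made for illustration: the marker value, the FileOrder enum and the function names are invented, and the writer is assumed to dump the marker in its own native byte order so the reader can tell whether later values need swapping.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical marker value; any constant whose four bytes all differ works.
const std::uint32_t kOrderMark = 0x01020304u;

// Reverses the four bytes of a 32-bit value.
std::uint32_t swap32(std::uint32_t v)
{
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Writes the marker in the writer's native byte order.
void write_order_mark(std::ostream& os)
{
  char raw[4];
  std::memcpy(raw, &kOrderMark, sizeof raw);
  os.write(raw, sizeof raw);
}

enum class FileOrder { Same, Swapped, Unknown };

// Reads the marker and reports how the file's order relates to this machine's.
FileOrder read_order_mark(std::istream& is)
{
  char raw[4];
  if (!is.read(raw, sizeof raw)) return FileOrder::Unknown;
  std::uint32_t mark;
  std::memcpy(&mark, raw, sizeof mark);
  if (mark == kOrderMark) return FileOrder::Same;
  if (mark == swap32(kOrderMark)) return FileOrder::Swapped;
  return FileOrder::Unknown;
}

// Self-check: a marker written and read on the same machine reports Same.
void check_mark_round_trip()
{
  std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
  write_order_mark(ss);
  assert(read_order_mark(ss) == FileOrder::Same);
}
```

When the reader gets FileOrder::Swapped, each 32-bit value read afterwards would be passed through swap32() before use. Forcing one byte order in the format avoids the marker entirely, at the cost of always doing the shifting on one of the platforms.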
