Binary I/O

hondros

Alright, so I spent _forever_ trying to figure this out. If any newbies here need to know how to write and read ANY object (with the exception of pointers, but I'll get to that), this is something that can help.

First and foremost, you have to take endianness into account whenever you are working with binary data. Here I simply use a #define to state explicitly whether to write big- or little-endian (1 being big, 0 being little).

The code should be fairly straightforward, and what is wonderful is that, thanks to templates, you don't have to state the types explicitly. So the following code will work.

You should be able to figure out writing pointers/arrays ;) If not, you're not ready to be saving them anyways :P

Also, you should try to roll your own functions after trying this out. I didn't know how to use unions (hell, I didn't even know what they were), and my pointer/dereferencing was a bit rusty. Writing this also helped me better understand how memory is laid out.


This probably won't save complete class objects correctly, so you might want to roll your own class functions for saving and loading them. If you need a head start, I can post my class's functions for you, but you should try to figure it out on your own.

EDIT: Shoot, I forgot to put everything in a main function. Oh well, you should be able to fix that.

#include <cstdio>   // for printf
#include <fstream>

#define ENDIAN 1 // x86 is Little-Endian, I think
#define FILENAME "Test.bin"

template <class T>
void writeObj(std::ofstream *FILE, T obj) {
    union U {
        char b[sizeof(T)];
        T real;
    };
    union U buffer;
    buffer.real = obj;

    if (ENDIAN == 0)
        for (int x=sizeof(T)-1; x>=0; x--)
            FILE->write(&(buffer.b[x]), sizeof(buffer.b[x]));

    if (ENDIAN == 1)
        for (int x=0; x<sizeof(T); x++)
            FILE->write(&(buffer.b[x]), sizeof(buffer.b[x]));
}

template <class T>
void readObj(std::ifstream *FILE, T *obj) {
    union U {
        char b[sizeof(T)];
        T real;
    };
    union U buffer;

    if (ENDIAN == 0){
        for (int x=sizeof(T)-1; x>=0; x--) 
            FILE->read(&(buffer.b[x]), sizeof(buffer.b[x]));
    };

    if (ENDIAN == 1) {
        for (int x=0; x <sizeof(T); x++) 
            FILE->read(&(buffer.b[x]), sizeof(buffer.b[x]));
    };

    *obj = buffer.real;
}

int a = 1;
int a2;
char b = 3;
char b2;
float c = 45.2;
float c2;
std::ofstream fileOut(FILENAME, std::ios::out|std::ios::binary);

writeObj(&fileOut, a); 
writeObj(&fileOut, b);
writeObj(&fileOut, c);

readObj(&fileOut, &a2);
readObj(&fileOut, &b2);
readObj(&fileOut, &c2);

printf("%d\n%d\n%.2f", a2, b2, c2);
mike_2000_17

That's very nice. It's good to remind people that binary IO does imply worrying about endian-ness.

I do have a few additional caveats and improvements to suggest.

1) On C++ basics: there is no reason to use pointers in this case; passing by reference would be cleaner and safer.

2) Your use of the std::ofstream and std::ifstream classes is a bit problematic. The first problem is that you don't actually need the stream to be a file-stream, so you should use the more general std::ostream and std::istream classes. The second problem is that your read/write functions assume that the stream in question is prepared for binary IO (i.e. opened with the ios::binary flag); this is the kind of "externalized responsibility" that can lead to robustness problems (you cannot guarantee that the execution of your read/write functions is predictable, because it depends on an external assumption). C++ solves this problem with classes that allow you to protect your invariants and avoid making dangerous assumptions.
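
For illustration, here is a minimal sketch of what the signatures could look like with references and the general stream classes (a sketch only; the endian handling from your code is omitted here to keep it short, and the full version appears at the end of this post):

#include <istream>
#include <ostream>

//Sketch only: stream taken by reference, as a generic std::ostream/std::istream.
template <class T>
void writeObj(std::ostream& out, T obj) {
    union {
        char b[sizeof(T)];
        T real;
    } buffer;
    buffer.real = obj;
    for (int x = 0; x < int(sizeof(T)); ++x)  //endian handling omitted
        out.write(&buffer.b[x], 1);
}

template <class T>
void readObj(std::istream& in, T& obj) {      //a reference, not a pointer
    union {
        char b[sizeof(T)];
        T real;
    } buffer;
    for (int x = 0; x < int(sizeof(T)); ++x)  //endian handling omitted
        in.read(&buffer.b[x], 1);
    obj = buffer.real;
}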

3) A simple issue is your use of sizeof(buffer.b[x]); this works, but the C++ Standard guarantees that sizeof(char) is always 1, so you don't need sizeof() there.

4) You should not rely on an in-code definition of your ENDIAN pre-processor flag. Most compilers provide predefined pre-processor macros that indicate the endian-ness of the environment for which the code is being compiled. For example, to cover both GCC and MSVC, you can do:

#ifdef __GNUC__
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define ENV_IS_LITTLE_ENDIAN
#endif
#endif

#ifdef _MSC_VER
//Microsoft works only for Windows and Windows is only little-endian:
#define ENV_IS_LITTLE_ENDIAN
#endif

5) The above leads me to the other issue: the difference in endian-ness is known at compile-time (in fact, at preprocessing-time), so you should use a pre-processor conditional to switch between implementations, not a regular run-time conditional. For example:

#ifndef ENV_IS_LITTLE_ENDIAN
    for (int x=sizeof(T)-1; x>=0; x--)
        FILE->write(&(buffer.b[x]), sizeof(buffer.b[x]));
#else
    for (int x=0; x<sizeof(T); x++)
        FILE->write(&(buffer.b[x]), sizeof(buffer.b[x]));
#endif

6) You don't really need to name your union types.

7) You actually have an error in your code: you pass the "fileOut" object (an output stream) to the readObj function, which expects an input stream; that's obviously a simple mistake.

8) It is important to mention and deal with the fact that your read/write functions can only work for built-in types. Of course, any non-POD class type will fail upon construction of the union, because non-POD types are not allowed inside a union (a POD type is a class or built-in type that has no user-defined constructor, copy-constructor, copy-assignment operator, or destructor). Basically, any non-trivial class type will not work with your read/write functions, and that's a good thing.
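
As a quick illustration (a sketch under the C++03 rules assumed in this thread; the type names here are made up):

#include <string>

struct Plain  { int x, y; };          //POD: allowed as a union member
struct NotPod { std::string name; };  //non-POD: std::string has constructors and a destructor

union Ok { char b[sizeof(Plain)]; Plain real; };       //compiles
//union Bad { char b[sizeof(NotPod)]; NotPod real; };  //error: member with a
//                                                     //non-trivial constructor/destructor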

The real problem is that a POD type will work with your read/write functions even if it is more than just a built-in type (int, float, double, etc.). For example, a class like struct Vect2D { int x, y; }; is a POD type and is thus allowed to be part of a union, so your read/write functions will work with it. BUT you will screw up the whole endian-ness business with that, because using your write function on a big-endian system and then reading the file back on a little-endian system will result in the (x,y) coordinates being swapped. So, you need to make sure that your functions are not usable with a type that is not a fundamental built-in type. It just so happens that you can achieve exactly this with the Boost libraries:

#include <boost/type_traits.hpp>
#include <boost/utility.hpp>

template <class T>
typename boost::enable_if_c< boost::is_fundamental<T>::value,
void >::type writeObj(std::ostream& out, T obj) {
  //..
};

template <class T>
typename boost::enable_if_c< boost::is_fundamental<T>::value,
void >::type readObj(std::istream& in, T& obj) {
  //..
};

So, implementing all these recommendations, you get:

#include <fstream>
#include <iostream>

#include <boost/type_traits.hpp>
#include <boost/utility/enable_if.hpp>

#ifdef __GNUC__
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define ENV_IS_LITTLE_ENDIAN
#endif
#endif

#ifdef _MSC_VER
//Microsoft works only for Windows and Windows is only little-endian:
#define ENV_IS_LITTLE_ENDIAN
#endif


class binary_file_writer {
  private:
    std::ofstream file_out;
  public:
    binary_file_writer(const char* aFilename) : 
                       file_out(aFilename, std::ios::out | std::ios::binary) { };
    
    template <typename T>
    friend
    typename boost::enable_if_c< boost::is_fundamental<T>::value,
    binary_file_writer& >::type operator << (binary_file_writer& lhs, T rhs) {
      union {
        char b[sizeof(T)];
        T real;
      } buffer;

      buffer.real = rhs;

#ifndef ENV_IS_LITTLE_ENDIAN
      for (int x = sizeof(T) - 1; x >= 0; --x)
        lhs.file_out.write(&(buffer.b[x]), 1);
#else
      for (int x = 0; x < int(sizeof(T)); ++x)
        lhs.file_out.write(&(buffer.b[x]), 1);
#endif
      return lhs;
    }
};

class binary_file_reader {
  private:
    std::ifstream file_in;
  public:
    binary_file_reader(const char* aFilename) : 
                       file_in(aFilename, std::ios::in | std::ios::binary) { };
    
    template <typename T>
    friend
    typename boost::enable_if_c< boost::is_fundamental<T>::value,
    binary_file_reader& >::type operator >> (binary_file_reader& lhs, T& rhs) {
      union {
        char b[sizeof(T)];
        T real;
      } buffer;

#ifndef ENV_IS_LITTLE_ENDIAN
      for (int x = sizeof(T) - 1; x >= 0; --x) 
        lhs.file_in.read(&(buffer.b[x]), 1);
#else
      for (int x = 0; x < int(sizeof(T)); ++x) 
        lhs.file_in.read(&(buffer.b[x]), 1);
#endif
      rhs = buffer.real;
      return lhs;
    }
};

int main() {

  int a = 1;
  int a2;
  char b = 3;
  char b2;
  float c = 45.2;
  float c2;
  
  {
  binary_file_writer fileOut("Test.bin");
  fileOut << a; 
  fileOut << b;
  fileOut << c;
  };

  {
  binary_file_reader fileIn("Test.bin");
  fileIn >> a2;
  fileIn >> b2;
  fileIn >> c2;
  };

  std::cout << a2 << " " << int(b2) << " " << c2 << std::endl; //cast so the char prints as a number
  return 0;
};
hondros

Thank you for critiquing the code; people don't do that too often.
1) The only place I used pointers was in the reader, which I believe you did as well? If not, can you clarify? This was the only way I could get the reader to work correctly in my code.
2) I actually just learned file I/O as a whole, so I really didn't know about any of those classes or anything. Thank you for pointing that out :D
3) I figured the standard size of char is 1, so I probably should've used that, but I forgot about it. Again, thank you for pointing that out.
4) This is where I get confused. Because what if I want to use my file on a separate computer that has a different endianness? Say this one is big-endian, and the other one is little-endian. If I save the file as big-endian on one and open it up as big-endian on the other, it's fine. How do I keep it cross-compatible, if that's at all possible?
5) See above, I think.
6) I like naming my union types :(
7) Ah yes, I did not notice that. I was copying and pasting from my project, and modified it a bit, so that would be the reason.
8) I already know it doesn't support non-POD types. For my purposes I don't need to store those right now, so I didn't research it. However, I do have a class (heightmap) which has two member functions, writeToDisk and readFromDisk, which employ my binary write and read functions. Otherwise known as "serialization", correct?

I apologize if I'm coming off as rude or anything, I do appreciate you looking at the code and whatnot :D

mike_2000_17

>>1) The only place I used pointers was in the reader, which I believe you did as well? If not, can you clarify? This was the only way I could get the reader to work correctly in my code.

I used references instead. References are kind of like pointers, but better. A reference is essentially a variable alias, i.e. another identifier that denotes the same variable; as such, references lead to cleaner syntax (no need to take the address and dereference all the time) and they are not reseatable, which makes them less error-prone. So, no, using pointers is not the only way to get the reader to work correctly; using references is the preferred way.

>>Because what if I want to use my file on a separate computer that has a different endianness?

Well, you know what your code does, don't you? Your code essentially makes sure that all data is written to the file in little-endian format. Your write function writes big-endian words in reverse, meaning they are written to the file in little-endian format. Then, your read function reads the little-endian format from the file and reverses it if the system is big-endian. So, that's the point, and that's how you are supposed to deal with endian-ness: you choose one endian format for the binary file and do the conversion on write, on read, or both, whenever the system does not use the same endian format as the file.

Right now, your code always saves the data as little-endian, and that's good because it's predictable. All standard binary file-formats specify their endian-ness. It's during the read/write operations that you convert between the computer architecture's endian-ness and the file-format's endian-ness; that's what your code does.
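
As an aside, the same "fixed file endian-ness" idea can also be done with bit-shifts instead of byte reversal. This is just a sketch (not your code), assuming the fixed-width types from <cstdint> (or <boost/cstdint.hpp> on older compilers):

#include <cstdint>
#include <istream>
#include <ostream>

//The file format is defined to be little-endian; the host's byte order never
//enters the picture, because the shifts work on values, not on memory layout.
void write_u32_le(std::ostream& out, std::uint32_t v) {
    unsigned char b[4];
    b[0] = static_cast<unsigned char>( v        & 0xFF);
    b[1] = static_cast<unsigned char>((v >> 8)  & 0xFF);
    b[2] = static_cast<unsigned char>((v >> 16) & 0xFF);
    b[3] = static_cast<unsigned char>((v >> 24) & 0xFF);
    out.write(reinterpret_cast<const char*>(b), 4);
}

std::uint32_t read_u32_le(std::istream& in) {
    unsigned char b[4];
    in.read(reinterpret_cast<char*>(b), 4);
    return  std::uint32_t(b[0])
         | (std::uint32_t(b[1]) << 8)
         | (std::uint32_t(b[2]) << 16)
         | (std::uint32_t(b[3]) << 24);
}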

>>I already know it doesn't support non-POD types.

You know that, but somebody else might not; that's what robustness means. All I was pointing out was that, whenever possible, you should make incorrect code impossible to compile. The weakness in your code was that class types with trivial constructors/destructors (POD types) would compile with your read/write functions even though such code is incorrect, and a mechanism that blocks the compilation of functions when they are used incorrectly is a very good thing to have.

>>However, I do have a class (heightmap) which has two member functions, writeToDisk and readFromDisk, which employ my binary write and read functions. Otherwise known as "serialization", correct?

Yes. That's the basic form of serialization. I suggest you look at the Boost.Serialization library (if you haven't done so already) for a more serious treatment of serialization. I also recommend you use it, or at least get inspired by it for your own purposes, because it's very well constructed (and much more flexible than committing yourself to one format, like binary).
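
To give a flavour of it, typical usage looks roughly like this (a sketch only: the Heightmap struct and the file name are placeholders, and you have to link against the boost_serialization library):

#include <fstream>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>

struct Heightmap {
  int width, height;

  //One function serves for both saving and loading.
  template <class Archive>
  void serialize(Archive& ar, const unsigned int /*version*/) {
    ar & width;
    ar & height;
  }
};

int main() {
  const Heightmap hm = { 64, 64 };
  {
    std::ofstream ofs("heightmap.dat", std::ios::binary);
    boost::archive::binary_oarchive oa(ofs);
    oa << hm;        //write
  }
  Heightmap loaded = { 0, 0 };
  {
    std::ifstream ifs("heightmap.dat", std::ios::binary);
    boost::archive::binary_iarchive ia(ifs);
    ia >> loaded;    //read back
  }
  return 0;
}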

hondros 25 Junior Poster

So how exactly do you create references? I thought that was what I was doing by passing (&variable) to the function? That's the part that's a bit confusing to me.

So am I doing the endian-ness stuff right then? By explicitly defining the endian-ness?

Wait, you can actually create code that causes the compilation to fail if it's not used correctly? I've been trying to find something like that for months. How would I go about doing that? I think I might have an idea. Say you have an if...else block. If one of the values doesn't agree, then that block would never compile, correct?

I will take a look at that. I think once I actually completely understand what I'm doing with the code, I'll use a library. But if I understand it enough to do that, I might as well write my own code. xD

mike_2000_17

>>So how exactly do you create references?

Here is a simple example that should explain it all:

#include <iostream>

int main() {
  int a = 69;
  int& ar = a; //declares a reference named 'ar' which is an alias for variable 'a'.
  ar = 42; //assigns the value 42 to 'ar' (which is also 'a').
  std::cout << "the value of a is: " << a << std::endl; //this will print 42
  int* ap = &a; //declares a pointer named 'ap' which stores the address of variable 'a'.
  *ap = 69; //dereferences the pointer 'ap' and stores the value 69 in the variable it points to.
  std::cout << "the value of a is: " << a << std::endl; //this will print 69
  return 0;
};

>>So am I doing the endian-ness stuff right then?

Yes, you are storing the values as little-endian in the file, regardless of the endian-ness of the system. It makes the file-format portable across systems with different endian-ness.

>>Wait, you can actually create code that causes the compile to fail if not used correctly?

Yes, that's the preferred form of error (as opposed to a run-time error). To cause an error at compile-time, the only thing you need is a way to know, at compile-time, whether something is correct or not. Most techniques to achieve this are based on template meta-programming. Often, when you create a template (function or class), you can control its instantiation such that it fails at compile-time if the given template-arguments are not valid. The simplest such technique is the BOOST_STATIC_ASSERT macro, which works like the assert() function but causes compilation to fail if the given compile-time condition is false. Another popular technique is the boost::enable_if template, which can remove a function template from the list of overloaded functions when a given compile-time condition is not met; this technique is based on SFINAE (Substitution Failure Is Not An Error). So, in the example I posted before, the read/write function templates are only enabled if the given type T is a fundamental type (built-in, like int or double); otherwise, the compiler will eliminate them from the list of function overloads and will probably give an error like "function readObj with parameter of type 'MyClass' was not found!".
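
For example, the BOOST_STATIC_ASSERT form of the same restriction could look roughly like this (a sketch; 'writeObj_checked' is a made-up name and the body is elided):

#include <ostream>
#include <boost/static_assert.hpp>
#include <boost/type_traits.hpp>

template <class T>
void writeObj_checked(std::ostream& out, T obj) {
  //Refuses to compile unless T is a fundamental type (int, double, char, ...).
  BOOST_STATIC_ASSERT(boost::is_fundamental<T>::value);
  //... the usual write code would go here ...
}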

>>How would I go about doing that?

First, you need to find the reason why something should fail to compile (like in the example, where it should fail if the type T is not a fundamental type). Then, you need a way to compute, at compile-time, the value of that condition ('true' if it should compile and 'false' if it should not; in the case above I used boost::is_fundamental<T>::value ). Finally, you need a way to make the compilation fail; the classic trick is to create a static array of size 0. For example, this is a very basic version of a static-assert template:

template <bool>
struct static_assert {
  char c[0]; //a zero-sized array is ill-formed, so this causes a compilation error.
};

//Specialize the template for the 'true' case:
template <>
struct static_assert< true > { }; //this is an empty class, which compiles fine.
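
And a usage sketch of that template, assuming a pre-C++11 compiler (where static_assert is not yet a keyword) and that <ostream> and <boost/type_traits.hpp> are included; the function name is made up:

template <class T>
void writeObj_guarded(std::ostream& out, T obj) {
  //If T is not fundamental, static_assert<false> picks the primary template
  //with the zero-sized array, and compilation fails.
  static_assert< boost::is_fundamental<T>::value > check;
  (void)check; //silence the "unused variable" warning
  //... the usual write code would go here ...
}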

If you want to know more about this subject, I suggest you read Alexandrescu's book "Modern C++ Design: Generic Programming and Design Patterns Applied" (2001) to understand the basic template-based tricks. The two main C++ template books are pretty good too, one by David Abrahams and one by David Vandevoorde, who are both gurus in the world of template meta-programming.

>>I think I might have an idea. Say you have an if...else block. If one of the values doesn't agree, then that block would never compile, correct?

No. That's not going to do it. You have to forget the idea of using normal if-else statements; they cannot control compilation failure, because the compiler will compile (or at least parse) both the if-branch and the else-branch even if one of them never executes. So, if you make one branch fail to compile, the whole if-else statement will always fail to compile. To evaluate conditionals at compile-time you need to rely on a compile-time mechanism. You cannot use the pre-processor, because that runs before compilation occurs; you cannot use regular if-statements, because those are a run-time mechanism (after compilation has finished). You need to use templates, because templates are resolved by the compiler during compilation, so you can use that mechanism to direct the compiler towards a success or a failure depending on other things that are also known at compile-time (like constant values or types). Doing this is part of a field we call "template meta-programming", because it involves writing programs that execute during compilation and can thus control the compilation to sort of add functionality to the C++ compiler. In fact, this language is Turing-complete, and the Boost people have pretty much proven that; see how they have implemented an entire STL-like library as a compile-time library in Boost.MPL (the Boost Meta-Programming Library). Anyhow, template meta-programming is a huge topic, and I don't want you to rush into anything that could really go way over your head.
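
Just to make that last point concrete, here is a small sketch (not from the code above) of compile-time branching done with template specialization instead of an if-else; the names are made up:

#include <cstddef>
#include <ostream>

//The bool parameter plays the role of the if-condition, but it is resolved by
//the compiler, so the choice between the two bodies is made during compilation.
template <bool HostIsLittleEndian>
struct le_writer;                  //primary template, intentionally not defined

template <>
struct le_writer<true> {           //host already matches the little-endian file format
  static void apply(std::ostream& out, const char* bytes, std::size_t n) {
    out.write(bytes, static_cast<std::streamsize>(n));
  }
};

template <>
struct le_writer<false> {          //big-endian host: reverse the bytes
  static void apply(std::ostream& out, const char* bytes, std::size_t n) {
    for (std::size_t i = n; i > 0; --i)
      out.write(&bytes[i - 1], 1);
  }
};

//Usage would look like: le_writer<IS_LITTLE_ENDIAN>::apply(out, buffer.b, sizeof(T));
//where IS_LITTLE_ENDIAN is a compile-time boolean constant.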

>>But if I understand it enough to do that, I might as well write my own code. xD

It's definitely a good exercise and good practice to learn to write a good serialization library; that's why I said you might want to just get "inspired" by the way the Boost.Serialization library is constructed, and then write your own. But if you just _need_ a serialization library, then use the Boost one.
