Hello,

I have a bug somewhere in my program and I cannot figure out its cause. The program simply prints the data in a binary file. Here is the code:

#include <iostream>
#include <fstream>
#include <vector>
#include <cstdint>
using namespace std;
int main(int argc, char *argv[])
{
    if (argc!=2)
    {
        cout<<"Invalid argument count, please provide a file name."<<endl;
        return 0;
    }
    fstream file(argv[1],ios_base::binary|ios_base::in);
    vector<int32_t> data;
    int32_t tmp;
    char buf;
    while (!file.eof())
    {//complicated i/o to ignore endianness
        file.read(&buf,1);
        tmp=buf<<24;
        file.read(&buf,1);
        tmp|=buf<<16;
        file.read(&buf,1);
        tmp|=buf<<8;
        file.read(&buf,1);
        tmp|=buf;
        data.push_back(tmp);
    }
    for (size_t i=0; i<data.size(); ++i)
        cout<<hex<<data[i]<<endl;
    return 0;
}

Most of the time it works fine. However, sometimes it gives the wrong output. For example, when my test file contained (as read by a hex editor):

6A0EFFFF 
8A0C0002 
5CC00000 
00000007 
8E0E0001 
4A0E0000 
100A0000 
8E0EFFFF 
6F0E0000 
8F0E0000 
4F0E0001 
100A0000

The program spat out:

ffffffff
8a0c0002
ffc00000
7
8e0e0001
4a0e0000
100a0000
ffffffff
6f0e0000
8f0e0000
4f0e0001
100a0000
0

Why is this?

You can't shift an 8-bit char by 24 bits (line 20). What are you trying to accomplish? Why do any shifting at all? If you're attempting to read a 4-byte integer, why not just do it the simple way:

int num = 0;
file.read((char *)&num,sizeof(int));

I used character shifting because it is the only way I know to ensure the correct endianness. When I do it your way, every value comes out byte-reversed, which isn't helpful. Here is the code you are talking about:

#include <iostream>
#include <fstream>
#include <vector>
#include <cstdint>
using namespace std;
int main(int argc, char *argv[])
{
    if (argc!=2)
    {
        cout<<"Invalid argument count, please provide a file name."<<endl;
        return 0;
    }
    fstream file(argv[1],ios_base::binary|ios_base::in);
    vector<int32_t> data;
    int32_t tmp;
    while (!file.eof())
    {
        file.read((char*)&tmp,sizeof(tmp));
        data.push_back(tmp);
    }
    for (size_t i=0; i<data.size(); ++i)
        cout<<hex<<data[i]<<endl;
    return 0;
}

And given the same input file, it printed:

ffff0e6a
2000c8a
c05c
7000000
1000e8e
e4a
a10
ffff0e8e
e6f
e8f
1000e4f
a10
a10

This is correct except that the byte order is reversed and the last element is duplicated.
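The duplicated last element comes from the `while (!file.eof())` loop: `eof()` only becomes true after a read has already failed, so the body runs one extra time and pushes whatever was left in `tmp`. The same extra pass produces the trailing `0` in the first program's output, since `buf` still holds the file's final byte (0x00). A minimal sketch of a loop that tests the read itself instead, assuming the same 4-byte records in host byte order (the helper name `read_words` is hypothetical):

```cpp
#include <cstdint>
#include <istream>
#include <sstream>  // for std::istringstream in the usage example
#include <vector>

// Read consecutive 4-byte records in host byte order.
// istream::read returns the stream, which converts to false as soon as a
// full record could not be read, so no stale value is ever pushed and a
// trailing partial record (fewer than 4 bytes) is dropped.
std::vector<std::int32_t> read_words(std::istream& in)
{
    std::vector<std::int32_t> data;
    std::int32_t tmp;
    while (in.read(reinterpret_cast<char*>(&tmp), sizeof tmp))
        data.push_back(tmp);
    return data;
}
```

With a file, this would be called as `std::ifstream file(argv[1], std::ios_base::binary); auto data = read_words(file);`. It fixes only the loop; the byte order is still the host's.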

When I do it your way I get everything inverted

I wasn't aware that endianness was the problem you wanted to solve. My way is only useful if it isn't an issue.

Here's an article that, I think, shows how to determine the endianness of a machine.
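The article itself isn't reproduced here, so the following is only a minimal sketch of one common runtime check, under the assumption that a plain little/big distinction is enough (the name `host_is_little_endian` is hypothetical). It copies a known 32-bit value into a byte array and looks at which byte comes first in memory. If C++20 is available, `std::endian::native` from `<bit>` gives the same answer at compile time.

```cpp
#include <cstdint>
#include <cstring>

// Returns true if the host stores the least significant byte first.
// memcpy copies the integer's object representation into a byte array,
// which is well-defined, unlike reading it back through an int pointer cast.
bool host_is_little_endian()
{
    const std::uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x04;  // 0x04 is the least significant byte
}
```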

Here is a discussion about your problem. Pay attention to the function in the last post. All that's necessary is to swap the bytes of the integer, not the bits in each byte.
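The function from that discussion isn't shown here, so this is not necessarily the same code: a minimal sketch of swapping the bytes of a 32-bit integer with masks and shifts (the name `swap_bytes32` is hypothetical). GCC and Clang also provide `__builtin_bswap32`, and C++23 adds `std::byteswap` in `<bit>`.

```cpp
#include <cstdint>

// Reverse the byte order of a 32-bit value. Each mask isolates one byte,
// and the shift moves it to the mirrored position. Using an unsigned type
// keeps every shift well-defined and avoids any sign extension.
std::uint32_t swap_bytes32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

Applied to the values printed by the second program, `swap_bytes32(0xFFFF0E6Au)` gives `0x6A0EFFFF`, the first word as HxD shows it.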

Consider the 4-byte hexadecimal number 0xbebafeca. On a big-endian machine it would be stored in contiguous memory as [be][ba][fe][ca], whereas on a little-endian machine the layout would be [ca][fe][ba][be]. Viewed as an array of unsigned char, reversing the byte order simply means swapping the first element with the last and the second with the third.
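The array-swap idea just described can be sketched as follows (the name `reverse_storage` is hypothetical). Because it reverses the stored bytes themselves, the result is the byte-swapped value on either kind of machine.

```cpp
#include <cstdint>
#include <cstring>
#include <utility>

static_assert(sizeof(std::uint32_t) == 4, "expects 8-bit bytes");

// Treat the integer's storage as an array of unsigned char and swap
// element 0 with 3 and element 1 with 2. memcpy moves the value into and
// out of the array without any pointer-cast tricks.
std::uint32_t reverse_storage(std::uint32_t v)
{
    unsigned char b[sizeof v];
    std::memcpy(b, &v, sizeof v);
    std::swap(b[0], b[3]);
    std::swap(b[1], b[2]);
    std::memcpy(&v, b, sizeof v);
    return v;
}
```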

Edited 2 Years Ago by Ancient Dragon

The file is written by another program of mine (it's assembled machine language; this program is a CPU emulator). I know for a fact that I want Byte1, then Byte2, and so on. When I open the file with HxD it looks perfect, but when I read it as above the bytes come out reversed. Extracting it byte by byte fixes that, but then I occasionally get incorrect bytes. I tried casting the char to an int for the shifts, but it didn't help. I don't understand why my algorithm doesn't work.

So all you have to do is read 4 bytes and then reverse them. The whole endianness problem disappears if you write the file out as plain text instead of binary.

#include <iostream>
#include <fstream>
#include <vector>
#include <cstdint>
#include <cstring>
using namespace std;
int main(int argc, char *argv[])
{
    if (argc!=2)
    {
        cout<<"Invalid argument count, please provide a file name."<<endl;
        return 0;
    }
    fstream file(argv[1],ios_base::binary|ios_base::in);
    vector<int32_t> data;
    int32_t tmp;
    char raw[4], buf[4];
    // Test the read itself rather than eof(), so the loop stops
    // before a failed read can push a stale value.
    while (file.read(raw,4))
    {   // reverse the 4 bytes (assumes a little-endian host)
        buf[3]=raw[0];
        buf[2]=raw[1];
        buf[1]=raw[2];
        buf[0]=raw[3];
        // memcpy instead of *(int32_t*)buf avoids alignment and aliasing problems
        memcpy(&tmp,buf,sizeof(tmp));
        data.push_back(tmp);
    }
    for (size_t i=0; i<data.size(); ++i)
        cout<<hex<<data[i]<<endl;
    return 0;
}


So, all you have to do is to read 4 bytes then reverse them.

That is exactly what I was doing with my char shifts, and it works about 90% of the time. What makes no sense, however, is why it fails the other 10%. The first code I posted should work, but it doesn't, and I can't figure out why.

write out the file as plain text

Sadly, this will not work: my program will not be the only one creating these files, and other assemblers already use a binary format.

You are right, but the shift is necessary to get the bytes in place (unless I use a union).

I figured out the problem. When a byte has its high bit set, a (signed) char stores it as a negative number; for example, 0xFF is stored as -1. When that char is implicitly converted to int for the shift or the OR, it doesn't become 0x000000FF but 0xFFFFFFFF (still -1), because the sign bit is extended into the upper bits, and those extra 1 bits overwrite everything above that byte in tmp. Using an unsigned char resolved the issue.
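A minimal sketch of the fix described above, assuming the words are stored most significant byte first as HxD displays them (the names `decode_be32`, `promote_signed` and `promote_unsigned` are hypothetical). Whether plain `char` is signed is implementation-defined (it is usually signed on x86 and unsigned on ARM Linux), which is why spelling out `unsigned char` matters.

```cpp
#include <cstdint>

// Build a 32-bit value from four bytes stored most significant first.
// The bytes are unsigned char, so each is promoted to a value in 0..255
// and only ever fills its own 8 bits of the result.
std::uint32_t decode_be32(const unsigned char b[4])
{
    return (static_cast<std::uint32_t>(b[0]) << 24) |
           (static_cast<std::uint32_t>(b[1]) << 16) |
           (static_cast<std::uint32_t>(b[2]) << 8)  |
            static_cast<std::uint32_t>(b[3]);
}

// The promotion difference in isolation: the same 0xFF bit pattern
// becomes -1 (all 32 bits set) from a signed char, but 255 from an unsigned char.
int promote_signed(signed char c)     { return c; }
int promote_unsigned(unsigned char c) { return c; }
```

In the original loop this would look like `unsigned char b[4]; while (file.read(reinterpret_cast<char*>(b), 4)) data.push_back(decode_be32(b));`, which also avoids the extra iteration of the `eof()` test. Because the bytes are assembled arithmetically, the result does not depend on the host's endianness.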

Thank you for the help though.
