Binary File Input Bug

Question

Labdabeta 182 Posting Pro in Training

10 Years Ago

Hello,

I have a bug in my program somewhere and I cannot understand why. The program merely prints data from a binary file. Here is the code:

#include <iostream>
#include <fstream>
#include <vector>
#include <cstdint>
using namespace std;
int main(int argc, char *argv[])
{
    if (argc!=2)
    {
        cout<<"Invalid argument count, please provide a file name."<<endl;
        return 0;
    }
    fstream file(argv[1],ios_base::binary|ios_base::in);
    vector<int32_t> data;
    int32_t tmp;
    char buf;
    while (!file.eof())
    {//complicated i/o to ignore endianness
        file.read(&buf,1);
        tmp=buf<<24;
        file.read(&buf,1);
        tmp|=buf<<16;
        file.read(&buf,1);
        tmp|=buf<<8;
        file.read(&buf,1);
        tmp|=buf;
        data.push_back(tmp);
    }
    for (size_t i=0; i<data.size(); ++i)
        cout<<hex<<data[i]<<endl;
    return 0;
}

Most of the time it works fine. However, sometimes it gives the wrong output. For example, when my test file contained (as read by a hex editor):

6A0EFFFF 
8A0C0002 
5CC00000 
00000007 
8E0E0001 
4A0E0000 
100A0000 
8E0EFFFF 
6F0E0000 
8F0E0000 
4F0E0001 
100A0000

The program spat out:

ffffffff
8a0c0002
ffc00000
7
8e0e0001
4a0e0000
100a0000
ffffffff
6f0e0000
8f0e0000
4f0e0001
100a0000
0

Why is this?

c++ ide


Ancient Dragon 5,243 Achieved Level 70

10 Years Ago

You can't shift an 8-bit char by 24 bits (line 20 of your code, tmp=buf<<24;). What are you trying to accomplish? Why do any shifting at all? If you're attempting to read a 4-byte integer, why not just do it the simple way:

int num = 0;
file.read((char *)&num,sizeof(int));

Ancient Dragon 5,243 Achieved Level 70

10 Years Ago

When I do it your way I get everything inverted

I wasn't aware that endian-ness was the problem you want to resolve. My way is only useful if that isn't an issue.

Here's an article that, I think, shows how to determine the endian-ness of an operating system.
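As a rough illustration of what such a check can look like, here is a minimal sketch that inspects the lowest-addressed byte of a known 32-bit value. The function name host_is_little_endian is made up for this example and is not taken from the linked article.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns true when the host stores the least
// significant byte of an integer at the lowest address.
bool host_is_little_endian()
{
    const std::uint32_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1); // copy the byte at the lowest address
    return first_byte == 1;
}
```

Byte order is a property of the machine the program runs on, so the result can differ between builds of the same source for different targets.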

Here is a discussion about your problem. Pay attention to the function in the last post. All that's necessary is to swap the bytes of the integer, not the bits in each byte.

Consider the 4-byte hexadecimal number 0xbebafeca. On a big-endian machine this would be stored in contiguous memory as [be][ba][fe][ca], whereas on a little-endian machine the format would be [ca][fe][ba][be]. Viewed as an array of unsigned char, to reverse the byte order you simply need to swap the first element of the array with the last and the second with the third.
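The swap described in that quote could be sketched roughly like this. The name reverse_bytes is hypothetical, and the sketch assumes a 32-bit unsigned value rather than the int32_t used elsewhere in this thread.

```cpp
#include <cstdint>
#include <cstring>
#include <utility>

// Hypothetical helper: reverses the byte order of a 32-bit value by
// treating it as an array of four unsigned chars.
std::uint32_t reverse_bytes(std::uint32_t value)
{
    unsigned char bytes[4];
    std::memcpy(bytes, &value, sizeof value); // view the integer as raw bytes
    std::swap(bytes[0], bytes[3]);            // first with last
    std::swap(bytes[1], bytes[2]);            // second with third
    std::memcpy(&value, bytes, sizeof value); // copy the reversed bytes back
    return value;
}
```

With the number from the quote, reverse_bytes(0xbebafeca) gives 0xcafebabe on either kind of machine, because only the order of the bytes changes and the bits inside each byte are left alone.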



Labdabeta 182 Posting Pro in Training Featured Poster · Answer 1 · 2014-04-17T18:34:06+00:00

The reason I used character shifting is because that is the only way for me to ensure correct endian-ness. When I do it your way I get everything inverted, which isn't helpful. Here is the code you are talking about:

#include <iostream>
#include <fstream>
#include <vector>
#include <cstdint>
using namespace std;
int main(int argc, char *argv[])
{
    if (argc!=2)
    {
        cout<<"Invalid argument count, please provide a file name."<<endl;
        return 0;
    }
    fstream file(argv[1],ios_base::binary|ios_base::in);
    vector<int32_t> data;
    int32_t tmp;
    while (!file.eof())
    {
        file.read((char*)&tmp,sizeof(tmp));
        data.push_back(tmp);
    }
    for (size_t i=0; i<data.size(); ++i)
        cout<<hex<<data[i]<<endl;
    return 0;
}

And given the same input file, it printed:

ffff0e6a
2000c8a
c05c
7000000
1000e8e
e4a
a10
ffff0e8e
e6f
e8f
1000e4f
a10
a10

Which is correct except for being reversed endian-wise and duplicating the last element.

Labdabeta 182 Posting Pro in Training Featured Poster · Answer 2 · 2014-04-17T19:08:03+00:00

The file is written by another program of mine (it's assembled machine language; this program is an emulator of a CPU). I know for a fact that I want Byte1, then Byte2, etc. When I open the file with HxD it looks perfect, but when I open it as above it gets reversed. This is fixed if I extract it byte-by-byte, but then I run into the issue of occasionally getting incorrect bytes. I tried casting the char to an int for the shifts, but it didn't help. I don't understand why my algorithm wouldn't work.

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 3 · 2014-04-17T19:23:52+00:00

So, all you have to do is to read 4 bytes then reverse them. The whole problem of endian-ness disappears if you write out the file as plain text instead of binary.

#include <iostream>
#include <fstream>
#include <vector>
#include <cstdint>
using namespace std;
int main(int argc, char *argv[])
{
    if (argc!=2)
    {
        cout<<"Invalid argument count, please provide a file name."<<endl;
        return 0;
    }
    fstream file(argv[1],ios_base::binary|ios_base::in);
    vector<int32_t> data;
    int32_t tmp;
    char buf[4];
    while (!file.eof())
    {//complicated i/o to ignore endianness
        file.read(&buf[3],1);
        file.read(&buf[2],1);
        file.read(&buf[1],1);
        file.read(&buf[0],1);
        tmp = *(int32_t*)buf; // reinterpret the reversed bytes as an integer
        data.push_back(tmp);
    }
    for (size_t i=0; i<data.size(); ++i)
        cout<<hex<<data[i]<<endl;
    return 0;
}

Labdabeta 182 Posting Pro in Training Featured Poster · Answer 4 · 2014-04-17T19:30:30+00:00

So, all you have to do is to read 4 bytes then reverse them.

Which is exactly what I was doing with my char shifts, and it works about 90% of the time... what makes no sense however is why it fails 10% of the time. The first code I posted should work, but doesn't. I can't figure out why.

write out the file as plain text

Sadly this will not work, my program will not be the only one creating the files, and other assemblers already use binary formatting.

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 5 · 2014-04-17T19:34:11+00:00

There is no reason to do all that shifting. Forget shifting, it doesn't do you any good.

Labdabeta 182 Posting Pro in Training Featured Poster · Answer 6 · 2014-04-17T19:46:13+00:00

You are right, but the shift is necessary to get the bytes in place (unless I use a union).

I figured out the problem. When a byte has its high bit set, for example 0xFF, it is stored in the char as a negative value (-1), since char is signed on my platform. When that char is implicitly converted to int for the shift, it doesn't become 0x000000FF but is sign-extended to 0xFFFFFFFF (still -1), so the OR overwrites the higher bytes already in tmp with ones. Using an unsigned char resolved the issue.

Thank you for the help though.