Consider the following code:

#define msizeof(type) ((char*)(&type) - (char*)(&type - 1))

int main()
{
    int x;
    printf("%u %u\n", msizeof(x), sizeof(x));
    return 0;
}

The above code compiles cleanly with g++ and runs without any warnings, while gcc gives the following warning: integer overflow in expression. The program still runs fine. The warning goes away when I change the macro to: #define msizeof(type) ((char*)(&type + 1) - (char*)(&type)).

My Questions:
1. What does the warning integer overflow in expression mean?
2. Why am I getting it?
3. Why does changing it to #define msizeof(type) ((char*)(&type + 1) - (char*)(&type)) remove the warning?
4. Why doesn't the warning occur in C++?

Thanks in Advance.

Edited by tapananand


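Try printf("%td %zu\n", msizeof(x), sizeof(x));
Note that sizeof(x) yields a size_t, an unsigned type that is usually wider than unsigned int on 64-bit platforms, and msizeof(x) yields a pointer difference (ptrdiff_t), so %u matches neither argument. That mismatch can draw a format warning of its own.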

Also, do note that it is Sunday morning here in the west. Don't expect answers immediately - most of us are probably still in bed!

Edited by rubberman


What does the warning integer overflow in expression mean?

It means that the operation goes outside of the valid range of the particular type of integer. For example, if you have an integer of type unsigned char, then the range is 0..255 which means that doing 2 - 4 is an "overflow" because it goes negative, and, of course, 253 + 5 is also an "overflow" because it goes beyond the max of 255.

Why am I getting it?

For one, the &type - 1 operation could overflow by making the pointer "negative". Pointers are, in general, just unsigned integers that are interpreted as address values, and therefore, producing a negative pointer is an overflow error.

Another issue is that on a 64-bit platform, pointers are 8 bytes long, while integers (int) are typically 4 bytes long. I think that by default, in C, the operations are done in the default integer type int unless explicitly made otherwise. But I could be mistaken, I'm not super well versed in the soft typing rules of C. But when I generate the assembly listing for your program (after a few tricks to prevent optimizations), this is confirmed by the following assembly code:

leaq -8(%rbp), %rcx    // &x -> rcx (64bit reg.)
movq %rcx, %rsi        // rcx -> rsi (save it)
addq $-4, %rcx         // subtracts 4 from &x
subq %rcx, %rsi        // does (&x - (&x - 1))
movl %esi, %r8d        // keep only 32 bits of
movl %r8d, %esi        //  the result.

In other words, at some point, your code causes an 8 byte (64bit) value to be assigned to a 4 byte value, which will overflow unless the value is small (which it is in this case).

Why does changing it to #define msizeof(type) ((char*)(&type + 1) - (char*)(&type)) remove the warning?

I'm not really sure about that one. But I do know that one way to remove the warning is to do the casts properly:

#include <stddef.h>

#define msizeof(type) ((unsigned int)((size_t)(char*)(&type) - (size_t)(char*)(&type - 1)))

Because this tells the compiler to use size_t as the integer type throughout the operations, and size_t is, on most common platforms, the same size as a pointer (the type the standard actually provides for holding a pointer value is uintptr_t), so overflow dangers are avoided.

Why doesn't the warning occur in C++?

Because C++ has stronger type conversion rules and it will not make implicit narrowing conversions (like a 64 -> 32 bit integer conversion). C++ will conserve the biggest type needed for as long as you don't convert it yourself to a more narrow type. So, the casts that I added above to remove the warnings are casts that are not needed in C++ because it already picks the most appropriate type (which is "size_t") for the situation.

At least, this is my understanding of this. But this could be more complicated because it's a compile-time computation.


Why doesn't the warning occur in C++?

The IS does not require that undefined behaviour must be diagnosed.

The macro #define msizeof(type) ((char*)(&type) - (char*)(&type - 1))
engenders undefined behaviour when used this way:

int main()
{
    int x;

    // msizeof(x) attempts to evaluate &x - 1
    // *** this engenders undefined behaviour
    printf("%u %u\n", msizeof(x), sizeof(x));
}

When an expression that has integral type is added to or subtracted from a pointer ...
<elided for brevity>
... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

In contrast, changing the macro to: #define msizeof(type) ((char*)(&type + 1) - (char*)(&type))
which attempts to evaluate &x + 1 eschews undefined behaviour because of the special provision:

For the purposes of these operators, a pointer to a nonarray object behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.


Forget about my answer. Vijayan121's answer is definitely the right one, it makes a lot more sense than mine. I didn't know there was such a rule about pointer arithmetic in the standard, and I can barely believe that compilers manage to diagnose it! I guess the C++ compiler doesn't, though. Do you think there is any specific reason for that, vijayan121?


I guess the C++ compiler doesn't, though. Do you think there is any specific reason for that?

There is an explanation in the C Rationale

Re. portability:

Although it strove to give programmers the opportunity to write truly portable programs, the Committee did not want to force programmers into writing portably, to preclude the use of C as a "high-level assembler": the ability to write machine-specific code is one of the strengths of C. It is this principle which largely motivates drawing the distinction between strictly conforming program and conforming program (§1.7).

A strictly conforming program is another term for a maximally portable program. The goal is to give the programmer a fighting chance to make powerful C programs that are also highly portable, without demeaning perfectly useful C programs that happen not to be portable. Thus the adverb strictly.

Re. Undefined behaviour:

The terms unspecified behavior, undefined behavior, and implementation-defined behavior are used to categorize the result of writing programs whose properties the Standard does not, or cannot, completely describe. The goal of adopting this categorization is to allow a certain variety among implementations which permits quality of implementation to be an active force in the marketplace as well as to allow certain popular extensions, without removing the cachet of conformance to the Standard.
Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior.
