Many years ago I was wandering through some SNMP code, trying to port it to a Rabbit 2000. I'd guess that not many readers are familiar with this device: it's an 8-bitter, and I had no operating system or dynamic memory. So there were several issues to deal with in addition to the Dynamic C dialect I had to use to program it.

I seem to remember that one of the things that came up was that the 9 most significant bits were needed to mask off a value. I was just beginning my swing toward writing portable code and this got me thinking: how would you create a compile-time mask for the top 9 bits of some integer?

Well, if you could have some macro `MSB` that produced just the one most significant bit, then it would be possible to create another macro that could slam 9 of them together.

```
#define TOP9 (MSB | (MSB >> 1) | (MSB >> 2) | (MSB >> 3) | (MSB >> 4) \
| (MSB >> 5) | (MSB >> 6) | (MSB >> 7) | (MSB >> 8))
```

So this started me poking around at various ways to define a macro `MSB`. Now you'd think this would be simple and obvious. Well, I guess it is, but along the way I found a lot of cracks and holes in some of the things I first tried.

**Setting the High Bit of an Integer Type**

So let’s say we want the high bit of an int; and of course, when talking about bits, we really mean the unsigned variety. It’s `0x80000000`, right? No. Well, yes and no, I suppose. It might be, if our int has 32 bits, but if an int has 16 bits or 64 bits it’s not right.

Aha! We can use our friend `sizeof` like this, right?

`#define MSB (1U << (sizeof(unsigned int) * 8 - 1))`

Nope. This assumes that there are 8 bits in a byte. That's good enough for most folks, but I have happened to work on platforms where `CHAR_BIT` is not 8.

So we use `CHAR_BIT` instead!

`#define MSB (1U << (sizeof(unsigned int) * CHAR_BIT - 1))`

Well, that’s good enough for most people, I suppose, but not for the pedant. The standard leaves open the possibility of holes. That’s right, an integer type may contain unused padding bits! So not all the bits of an `unsigned int` are necessarily bits that contribute to the value. Another way to see this is that we may be trying to set a bit whose value would be bigger than `UINT_MAX`; in that case the shift count is at least the width of the type, which is undefined behavior.

Now why would those standard writers have done such a thing? Well, I can only speculate. Let’s remember that C was born in an environment in which 8-bit bytes were one of several possibilities; hardware used many different methods of being fast and useful before 8 bits became the de facto standard (but not quite the C or C++ standard). But let’s not put this in yesterday’s perspective, because all of us with our 20/20 hindsight will say that today’s garden-variety byte is so obvious. Instead, let’s pretend we’re working on the platform of tomorrow.

**Example: A Made-Up Platform**

Let’s say that in this world everyone uses a GUI and graphics are key; cool things like 3D rendering are expected. So hardware guys come up with a processor that is floating-point oriented. This mythical platform uses an 80-bit floating-point register. But the integers that use this register occupy only 64 of these 80 bits. And let’s assume that `CHAR_BIT` is 16. So `sizeof(int)` is 5, and the topmost 16 bits are padding. Can you now see some holes in the attempts thus far? All of them would fail to set only the high bit of the integer. The `CHAR_BIT` version, for example, would compute a shift of 5 * 16 - 1 = 79, while the highest value bit is bit 63.

So is it even possible to get it right? Of course. And along the way of finding this out for myself, I also began to realize that C and C++ are geared more to values and their ranges than to the sizes of the objects that contain them. Perhaps that’s why `<limits.h>` deals almost entirely in values and has very little to say about sizes, merely giving us `CHAR_BIT`.

**How Can You Get Only The High Bit?**

In fact, there are a number of ways to set only the high bit of an integer, and in general they deal with values. First, let’s take a look at the value 0; it has all bits zero. Then let’s take a look at `~0`; it has all bits one. But since 0 has type int, a little ugliness rears its head: the result of the `~` operator has “implementation-defined and undefined aspects for signed types”. So let’s avoid that and go with `~0U`. Hey, an opportunity to use an integer suffix! Along the way I also found that if you subtract 1 from an unsigned zero, it wraps around to the type’s largest value, a value with all value bits set. So this leads us to `0U - 1`, or the slightly shorter but odd-looking `-1U`. This isn’t a negative unsigned value; it is unary negation of `1U`, which has the same effect as subtracting `1` from `0U`.

So now we’ve got a couple of funny-looking values that set all bits. Continuing with the example (where `p` marks a padding bit), a `0U` would look something like this.

`pppppppppppppppp0000000000000000000000000000000000000000000000000000000000000000`

And our `-1U` would look like this.

`pppppppppppppppp1111111111111111111111111111111111111111111111111111111111111111`

Obviously we’ve got more than the high bit set, but if you right-shift these bits by one, you get this.

`pppppppppppppppp0111111111111111111111111111111111111111111111111111111111111111`

That looks like just the opposite of what we want. If we added one to this value, or inverted the bits, we would have this.

`pppppppppppppppp1000000000000000000000000000000000000000000000000000000000000000`

And that’s what we want.

**Putting It All Together**

So now it boils down to personal preference as much as anything, but there are just 3 key things to do in an expression to obtain this result.

- Get a value with all value bits set to 1.
- Shift a zero into the top bit and move the rest down.
- Flip these bits.

There happen to be several ways to do each of these, so that deserves a little discussion as well. For the all-bits-one value, you can use the `U`*type*`_MAX` macros in `<limits.h>` (`UCHAR_MAX`, `USHRT_MAX`, `UINT_MAX`, or `ULONG_MAX`). Note that this is a little type-specific. You can use a cast such as `(unsigned short)-1`; again, this is a little type-specific. You can use `-1U` or `~0U`, but you can see that `unsigned char`, `unsigned short`, and `unsigned long` may feel a little left out (unless you switch to the `UL` suffix for the last one). And for our friend `unsigned char`, which has a guarantee that all bits are value bits, we could even use `1U << (CHAR_BIT - 1)` to get the high bit directly.

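The shifting may be accomplished either with a bitwise shift (`>> 1`) or by dividing by two. Flipping the bits may be done with the `~` operator or by adding 1. For `unsigned char` and `unsigned short`, which usually promote to `int` before any arithmetic, `~` operates on the promoted value, so either add 1 or cast the result back to the narrow type. For those who might mention issues with the three possible integer representations (ones' complement, two's complement, and sign-magnitude), remember that we’re dealing with unsigned integers.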

So there are several portable ways to set the high bit. Mix and match your favorite methods for each of the three steps!

**More Good News**

And finally, note that since these constructs all result in compile-time constant values, there is no run-time math going on, even though it looks like a calculation is occurring: the bottom line is a value with only the most significant bit set. Here is a complete example:

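```
#include <stdio.h>
#include <limits.h>

/* -1U has every value bit set; >> 1 shifts a zero into the top bit;
   + 1 then carries into that top bit, leaving only the MSB set. */
#define MSB ((-1U >> 1) + 1)

/* OR together the top nine value bits. */
#define TOP9 (MSB | (MSB >> 1) | (MSB >> 2) | (MSB >> 3) | (MSB >> 4) \
              | (MSB >> 5) | (MSB >> 6) | (MSB >> 7) | (MSB >> 8))

int main(void)
{
    unsigned int msb = MSB, mask = TOP9;
    printf("msb = 0x%X\n", msb);
    printf("mask = 0x%X\n", mask);
    return 0;
}
/* my output (32-bit unsigned int)
msb = 0x80000000
mask = 0xFF800000
*/
```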

Going back to our `TOP9` macro, then: this string of expressions may look ugly to the compiler after the preprocessor gets finished with it, but it all resolves into a nice little constant value that the poor compiler calculates at compile time. And on a garden-variety PC of today with a 32-bit int, this macro simply becomes this:

`11111111100000000000000000000000`

But if we took the same code to the strange little processor I used as an example, it should instead automagically change to this.

`pppppppppppppppp1111111110000000000000000000000000000000000000000000000000000000`