Much Ado About Bits


Dave Sinkula 2,398 long time no c

17 Years Ago

Many years ago I was wandering through some SNMP code trying to port it to a Rabbit 2000. I'd guess that not many are familiar with this device -- it's an 8-bitter, and I had no operating system or dynamic memory. So there were several issues in addition to the Dynamic C dialect I had to use to program it.

I seem to remember that one of the things that came up was that the 9 most significant bits were needed to mask off a value. I was just beginning my swing toward writing portable code and this got me thinking: how would you create a compile-time mask for the top 9 bits of some integer?

Well, if you could have some macro MSB that produced just the one most significant bit, then it would be possible to create another macro that could slam 9 of them together.

#define TOP9 (MSB | (MSB >> 1) | (MSB >> 2) | (MSB >> 3) | (MSB >> 4) \
                  | (MSB >> 5) | (MSB >> 6) | (MSB >> 7) | (MSB >> 8))

So this started me poking around at various ways to define a macro MSB. Now you'd think this would be simple and obvious. Well, I guess it is, but along the way I found a lot of cracks and holes with some of the things I first tried.

Setting the High Bit of an Integer Type

So let’s say we want the high bit of an int; and of course when talking about bits we really mean the unsigned variety. It’s 0x80000000, right? No. Well, yes and no, I suppose. It might be, if our int has 32 bits, but if an int has 16 bits or 64 bits it’s not right.

Aha! We can use our friend sizeof like this, right?

#define MSB  (1U << (sizeof msb * 8 - 1))

Nope. Here it is assumed that there are 8 bits in a byte. Which is good enough for most folks, but I have happened to work on platforms where CHAR_BIT is not 8.

So we use CHAR_BIT instead!

#define MSB  (1U << (sizeof msb * CHAR_BIT - 1))

Well, that’s good enough for most people, I suppose, but not for the pedant. The standard leaves open the possibility for holes. That’s right, unused padding bits may be part of an integer! So not all the bits of the unsigned int need be bits that contribute to the value. Another way to see this is that we may be setting a bit such that the value is bigger than UINT_MAX.

Now why would those standard writers have done such a thing? Well, I can only speculate. Let’s remember that C was born in an environment in which 8-bit bytes were one of several possibilities; hardware used many different methods of being fast and useful before 8-bit became de facto standard (but not quite C or C++ standard). But let’s not put this in yesterday’s perspective, because all of us with our 20/20 hindsight will say that today’s garden-variety byte is so obvious. Instead, let’s pretend we’re working on the platform of tomorrow.

Example: A Made-Up Platform

Let’s say that in this world everyone uses a GUI and graphics are key; cool things like 3D rendering are expected. So hardware guys come up with a processor that is floating-point oriented. This mythical platform uses an 80-bit floating-point register. But the integers that use this register only occupy 64 of these 80 bits. And let’s assume that CHAR_BIT is 16. So sizeof(int) is 5, and the topmost 16 bits are padding. Can you now see some holes in the attempts thus far? All of them would fail to set only the high bit of the integer.

So is it even possible to get it right? Of course. And along the way of finding this out for myself, I also began to realize that C and C++ are more geared to values and their ranges than the sizes of the objects that contain them. Perhaps that’s why <limits.h> has all values and has very little to say about sizes, merely giving us CHAR_BIT.

How Can You Get Only The High Bit?

In fact, there are a number of ways to set only the high bit of an integer, and in general they all deal with values. First, let’s take a look at the value 0; it has all bits zero. Then let’s take a look at ~0; it has all bits one. But since 0 has type int, there is a little bit of ugliness that rears its head: the result of the operator ~ has “implementation-defined and undefined aspects for signed types”. So let’s avoid that and go with 0U (hey, an opportunity to use an integer suffix!), which makes the all-ones value ~0U. But along the way I also found that if you subtract 1 from an unsigned zero, it rolls to the highest integer value, a value with all bits set. So this leads us to 0U - 1, or the slightly smaller but odd-looking -1U. This isn’t a negative unsigned value; it is unary negation that has the same effect as subtracting 1 from 0U.

So now we’ve got a couple of funny-looking values that set all bits. Continuing with the example, we’d have a 0U that looks something like this.

pppppppppppppppp0000000000000000000000000000000000000000000000000000000000000000

And our -1U would look like this.

pppppppppppppppp1111111111111111111111111111111111111111111111111111111111111111

Obviously we’ve got more than the high bit set, but if you’d right shift these bits by one, you would have this.

pppppppppppppppp0111111111111111111111111111111111111111111111111111111111111111

That looks like just the opposite of what we want. If we added one to this value, or inverted the bits, we would have this.

pppppppppppppppp1000000000000000000000000000000000000000000000000000000000000000

And that’s what we want.

Putting It All Together

So now it boils down to personal preference as much as anything, but there are just 3 key things to do in an expression to obtain this result.

Get a value with all value bits set to 1.
Shift a zero into the top bit and move the rest down.
Flip these bits.

There happen to be several ways to do each of these, so that deserves a little discussion as well. For the all-bits-one value, you can use the Utype_MAX macros in <limits.h> (UCHAR_MAX, USHRT_MAX, UINT_MAX, or ULONG_MAX). Note that this is a little type-specific (unsigned char, unsigned short, unsigned int, or unsigned long). You can use a cast: (unsigned type)-1. Again, this is a little type-specific. You can use -1U or ~0U, but you can see that unsigned char, unsigned short, and unsigned long may feel a little left out. And for our friend unsigned char, which has a guarantee that all bits are value bits, we could even use 1U << (CHAR_BIT - 1).

The shifting may be accomplished by either using a bitwise shift (>> 1) or by dividing by two. Flipping the bits may be done with the operator ~ or by adding 1. For those who might mention issues with the three possible integer representations (ones' complement, two's complement, or sign-magnitude), remember that we’re dealing with unsigned integers.

So there are a couple of portable ways to set the high bit. Mix and match your favorite methods for each of the 3 steps! Here are a couple of snippets:

More Good News

And finally, note that since these constructs all result in compile-time constant values, even though it looks like a calculation is occurring, there is no run-time math going on: the bottom line is a value with only the most significant bit set.

#include <stdio.h>
#include <limits.h>
 
#define MSB  ((-1U >> 1) + 1)
#define TOP9 (MSB | (MSB >> 1) | (MSB >> 2) | (MSB >> 3) | (MSB >> 4) \
                  | (MSB >> 5) | (MSB >> 6) | (MSB >> 7) | (MSB >> 8))
 
int main(void)
{
   unsigned int msb = MSB, mask = TOP9;
   printf("msb  = 0x%X\n", msb);
   printf("mask = 0x%X\n", mask);
   return 0;
}
 
/* my output
msb  = 0x80000000
mask = 0xFF800000
*/

Going back to our TOP9 macro then, this string of expressions may look ugly to the compiler after the preprocessor gets finished with it, but it all resolves into a nice little constant value calculated by the poor compiler at compile time. And on a garden-variety PC of today with a 32-bit int, this macro merely becomes this:

11111111100000000000000000000000

But if we took the same code and went to the strange little processor I used as an example, it should instead automagically change to this.

pppppppppppppppp1111111110000000000000000000000000000000000000000000000000000000


PoppyViolet 0 Newbie Poster · Answer 1 · 2006-08-16T15:14:26+00:00

Wow! Thank you!

Now that's what I call a blog posting and a half!!!

davidrafter 0 Newbie Poster · Answer 2 · 2009-06-13T05:35:41+00:00

If you're worried about how many bits your integer has, you should just use stdint.h.

Dave Sinkula 2,398 long time no c Team Colleague · Answer 3 · 2009-09-08T09:33:58+00:00

davidrafter wrote: "If you're worried about how many bits your integer has, you should just use stdint.h."

I don't suppose you could expand on that? The optional types in stdint.h don't seem to be int or long , for example. There doesn't seem to be a macro such as INT_BIT . And what do you do where stdint.h is not available? ;)

And I'm really trying not to care how many bits are in an integral type as much as how to set the most significant value bit.