I've read that the bits in a byte (in c++) are implementation or system dependent. What does that mean? Does it mean implementation of c++ or the processor architecture or some other thing?

And I've read that you should use sizeof to determine the size of a byte? Could you give an example of what might happen if, say, I developed a C++ program on one system and then ran it on another system with a different byte size (without using sizeof)?

>And I've read that you should use sizeof to determine the size of a byte?
No, you've read it wrong: a byte is a general term for any valid combination of eight bits (and a bit is a one or a zero)
('To determine the size of a byte'?? A byte is just always eight bits, you cannot hesitate here)

An example of a byte would be: 00001010 (which is ten in decimal). sizeof is used to get the number of bytes occupied by any valid datatype (including structs and your own classes). For example, [B]sizeof(int)[/B] will return the number of bytes an integer variable takes up in your computer's memory (on most implementations this is 4, but as you already said, it can differ per implementation)

>I've read that the bits in a byte (in c++) are implementation or system dependent. What does that mean?
The C++ standard does not define the exact number of bytes a datatype has to consist of; it simply states things like: an integer variable will have the natural size suggested by the architecture of the execution environment (32 bits = 4 bytes for a 32-bit processor, 16 bits = 2 bytes for a 16-bit processor, etc.)
In other words: the C++ standard defines minimum requirements for them :) sizeof is an operator which is frequently used to ensure portability: at compile time the sizeof([B]<something>[/B]) 'instruction' is replaced by a constant value. The portability you get from sizeof is a major benefit because you don't have to hard-code each datatype's size.

Hope this clarifies the whole thing!

Hi tux4life,

Thanks for the explanation. I just want to clear up one part. From the text I'm reading:


"A byte usually means an 8-bit unit of memory. Byte in this sense is the unit of measurement that describes the amount of memory in a computer, with a kilobyte equal to 1,024 bytes and a megabyte equal to 1,024 kilobytes.

However, C++ defines byte differently. The C++ byte consists of at least enough adjacent bits to accommodate the basic character set for the implementation. That is, the number of possible values must equal or exceed the number of distinct characters. In the United States, the basic character sets are usually the ASCII and EBCDIC sets, each of which can be accommodated by 8 bits, so the C++ byte is typically 8 bits on systems using those character sets. However, international programming can require much larger character sets, such as Unicode, so some implementations may use a 16-bit byte or even a 32-bit byte."


I know I'm just confused, but could you shed a little light on the part about a byte being 16-bit or 32-bit? Thanks!

>I know I'm just confused, but could you shed a little light on the part about a byte being 16-bit or 32-bit?
International character sets just have too many possible combinations, and they don't all fit in 1 byte (= 8 bits), so multiple bytes have to be used to store all those different characters (for example in Unicode)

1 byte is enough for storing 256 possible combinations (2^8), applied to a character set this means that it can hold 256 different characters (e.g: the ASCII character set)
But Unicode is an international character set, defining much more characters than 256...
Currently C++ doesn't support all Unicode characters, only the most common ones...

:)

@OP:

I'm sorry to say it, but so far tux has been unintentionally misleading you. Allow me to clarify.

>I've read that the bits in a byte (in c++) are implementation or system dependent.
Yes.

>What does that mean?
It means that you can't rely on a byte always having eight bits.

>Does it mean implementation of c++ or the processor architecture or some other thing?
It's a processor architecture thing.

>And I've read that you should use sizeof to determine the size of a byte?
The size of a byte in C++ is guaranteed to be 1. This can be confusing because the number of bits in a byte can vary, so the actual size of a byte on different systems can differ, but the size reported by sizeof will always be 1. You can get the number of bits in a byte by including <climits> and using the CHAR_BIT macro:

#include <climits>
#include <iostream>

int main()
{
    // "byte" and "char" are equivalent terms in C++
    std::cout << "sizeof char: " << sizeof(char) << '\n';
    std::cout << "CHAR_BIT:    " << CHAR_BIT << '\n';
}

The output will be 1 and the number of bits in a byte on your system (usually 8).

@tux:

>A byte is just always eight bits
Incorrect. "Byte" is an abstract term for the smallest addressable unit for your system. While more and more systems are gravitating toward the octet (eight bits, for the terminology-challenged) for a byte, it's by no means universal.

It's a common joke that you can point out a programmer who has only worked on PCs by asking what the size of a byte is. ;)

>International character sets
Have nothing to do with the size of a byte on the system. The OP is talking about a single byte, which in C++ is the char data type. So the question is not whether a character could be multiple bytes, but how CHAR_BIT could be 16 or 32 rather than 8. Please try not to confuse him any more.

>1 byte is enough for storing 256 possible combinations (2^8),
>applied to a character set this means that it can hold 256
>different characters (e.g: the ASCII character set)

ASCII is technically a 7-bit character set, so it can only hold 128 characters portably. There are several extensions of ASCII into the full eight bit range, but the upper range is not portable because you can get different results on different systems, or even the same system with different code page settings.

>Currently C++ doesn't support all Unicode characters, only the most common ones...
C++ doesn't support Unicode at all, but wide-character and multi-byte character support is present, which can be used to some extent with Unicode.


Hi Narue,

I'm sure tux4life was just trying to help. Anyway, good thing I decided I wasn't satisfied with what I'd learned and opened a topic again, and luckily, you cleared up ALL the questions in my mind. Now I can sleep. Thanks!


EDIT: Sorry, something bugged me again. What do you mean by "ASCII is technically a 7-bit character set, so it can only hold 128 characters portably"? What do you mean by 7-bit being portable and the 8-bit upper range not being portable?

And you mentioned that international character sets have nothing to do with the size of a byte (the number of bits), so I'm assuming that to support a 32-bit Unicode character using a 16-bit byte, it uses 2 bytes for the Unicode character. Is that right? Please help me get this cleared up.


>I'm sure tux4life was just trying to help.
I didn't imply otherwise. I said "unintentionally misleading you" because he probably hasn't yet learned these nuances, which is quite understandable and there's nothing wrong with that. Now you've both learned something. ;)

>What do you mean about 7-bit being portable
>and 8-bit upper range not portable?

When ASCII was designed, only the first seven bits were required to hold the characters the designers wanted. To save on transmission costs (we're talking about the 1960s here), the 8th bit was left out of the specification and used as an optional parity bit for error checking if desired.

Instead of using the 8th bit as a parity bit, a bunch of people independently decided to extend ASCII to suit their own needs with regional and special characters. The problem now is that while ASCII itself is widely used (to the point of being adopted into Unicode), there are umpteen different versions of extended ASCII.

When I say something isn't portable, I mean that the same code won't work the same way everywhere. In this case, your output when printing the upper range of extended ASCII will vary.

>i'm assuming that to support a 32-bit unicode character using
>a 16-bit byte, it uses 2 bytes for the unicode character.

Bingo. That's precisely how multi-byte character sets work (including Unicode[1]), except with 8-bit bytes being more common. You might find it fun to study how UTF-32, UTF-16, and UTF-8 (the three most common Unicode encodings) work, because there are some interesting problems involved with implementing multi-byte character sets.

[1] I'm not being entirely accurate with terminology, but it shouldn't hurt.


>A byte is just always eight bits
>Incorrect. "Byte" is an abstract term for the smallest addressable unit for your system. While more and more systems are gravitating toward the octet (eight bits, for the terminology-challenged) for a byte, it's by no means universal.

Outside of hobbyist and maintenance projects, not many people will ever touch something like a 4-bit machine or any other unusual setup.

Whether it's currently practical or popular isn't the question. The question was whether C++ defines a byte as being exactly 8 bits. As several of us have learned, it does not: it can be 8 or more.

Historically (1960s-1980s), mainframe and scientific computers had a wide variety of byte sizes, from 6 to 36 bits and all sizes in between.

See Cline's C++ FAQ Lite for more info.

>not many people will ever touch something like a 4-bit machine or any other sort of setup.
Nonetheless, there were supercomputers with 6-bit and 9-bit (or maybe 12-bit) bytes ;)
Moreover, I have seen a real computer with varying byte widths (every 6th byte was shorter than the others)... ;)

>Moreover, I have seen a real computer with varying byte widths (every 6th byte was shorter than the others)... ;)

What evil being designed such a thing?!

>What evil being designed such a thing?!
Probably, it was the hardware architect's acute condition ;)
Many years ago...

Shades of ancient history! Great replies, Narue. I liked the comment about programmers who've only worked on PCs. Some of the old GE (subsequently Honeywell) 'mainframes' had a 36-bit word length which could be treated as either 4x9-bit or 6x6-bit 'bytes'!
