I have just started developing and writing Linux kernel modules and device drivers. I am having a lot of trouble understanding new data types that I haven't encountered while writing application programs. For instance, I am having difficulty understanding the purpose of size_t, ssize_t, loff_t, etc. Why use these types instead of the usual data types? When were these new types introduced? Where can I learn about all of these new data types? Thanks in advance!

These are "abstract" types whose concrete definitions are chosen to fit the architecture of the CPU you are running on. For example, size_t is an unsigned integer type wide enough to hold the size of any object: typically a 64-bit value on 64-bit systems and a 32-bit value on 32-bit systems. ssize_t is the signed counterpart with the same width as size_t, which lets functions such as read and write return either a byte count or -1 for an error. loff_t is a "long offset" used for very big files, so that you can seek beyond what a 32-bit offset can address (2GB when signed, 4GB when unsigned). It is a 64-bit value on every architecture; in the kernel it is a typedef for long long, even on 64-bit systems where a plain long would also be 64 bits wide.

Many C functions that allocate or manipulate memory, such as calloc, malloc, realloc, memset, and memcpy, take a size_t argument to specify how many bytes you are allocating, setting, copying, etc. This allows those functions to handle anything up to the maximum object size that the platform supports in memory.

Edited 1 Year Ago by rubberman

If you are going to be hacking away at the Linux kernel, you have to be prepared to deal with tons of specific data types. And that's not even a tiny fraction of all the "exotic" stuff you'll see in there. Kernel coding is a dark art, and lots of shadow monsters lurk in the darker corners of the Linux kernel.

Why use these types instead of the usual data types?

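Well, for one, size_t is a very usual data type: it is the C standard's unsigned integer type for object sizes and array indices, and it is the type that the sizeof operator yields. On Linux it happens to have the same width as a pointer, which is obviously very useful, but the standard itself does not guarantee that (uintptr_t is the type intended for holding a pointer value).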

But generally speaking, there are many reasons for using different names for integer types (which are often just typedef names for one or another built-in integer type). First of all, there are times when you should use the most "native" integer types (e.g., those that are best for the target instruction set, can represent addresses, or can be packed optimally in registers, etc.), and there are also times when you need to use integers with a fixed number of bits regardless of the target platform. Remember, kernel coding involves a lot of fiddling with bits and tightly packed binary data structures, which is the kind of stuff where you must choose your types very carefully.

Another reason is that, when optimizing performance, it is often very important to carefully tune your data structures, including the size and layout of every member. In the optimization process you often need to test out different options, and using typedefs instead of built-in types allows you to make such a change in one place only.

When were these new types introduced?

Except for those that are standard (which you might just not have seen before), most of those types were introduced to the Linux kernel when the person who needed one created it. It's that simple: most of them are user-defined.

Where can I understand all of these new data types??

They are usually named in such a way that you should be able to guess what they are or what they are for (like loff_t, which is clearly a "long" integer used to store an "offset"). And usually, you don't really need to know where the types are defined or exactly what they are, because you just use them for what they seem to be needed for (if you see them appearing in function declarations and such) and don't use them for any other purposes. If you need to know their size, that's what the sizeof operator is for. And if you really do need to know what they are, just look for a typedef with that name (it helps to have some kind of IDE or smart text editor that can find that typedef automatically or quickly enough).

At the end of the day, this all boils down to taking the time to familiarize yourself with the code. Every code-base has its own little culture and conventions, and getting used to them simply takes a lot of digging deep into the code. The Linux kernel is probably one of the most challenging and overwhelming code-bases to throw yourself into. I've personally only ever scratched the surface of it, by necessity (for an ARM board, compiling an upstream version of the kernel and applying a few patches and bits of my own hackish code into it), and I hope I won't have to dig into it again anytime soon, and that's speaking from a decade of experience in C/C++ programming. Like I said, kernel coding is a dark art, and it takes a special kind of person to do that, not me.
