3 Years
Discussion Span
Last Post by mike_2000_17

You may find this page at OSdev.org enlightening on the subject. Basically, the main reason Java in its usual form is problematic for systems programming is that it compiles to bytecode rather than to native code. While this by itself is not a showstopper - it is possible to compile Java natively, or convert the bytecode to native code (this is what the JIT compiler does, in fact) - it makes Java less desirable as a systems language.

There are other reasons, though. It is a garbage-collected language, which means that memory management has to be in place and running before any Java program can be run. It does not have programmer-accessible pointers, which makes accessing memory-mapped hardware problematic at best. It has in-language threading, which means that process management and scheduling need to be in place ahead of time as well. Overall, it is a language with a high overhead.
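To make the pointer issue concrete, here is a minimal C++ sketch of what memory-mapped register access typically looks like. Everything device-specific in it is hypothetical: the `UartRegs` layout, the `kTxReady` bit, and the `uart_try_send` helper are made up for illustration, and an ordinary struct stands in for the hardware so the code can be tried on a desktop.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register block for an imaginary UART; not taken from any real device.
struct UartRegs {
    volatile std::uint32_t status;  // bit 0 = "transmitter ready" (assumed)
    volatile std::uint32_t data;    // writing a byte here sends it (assumed)
};

constexpr std::uint32_t kTxReady = 1u << 0;

// On real hardware the pointer would come from a fixed address in the datasheet,
// e.g. reinterpret_cast<UartRegs*>(some_address). Here the caller supplies it.
inline bool uart_try_send(UartRegs* uart, std::uint8_t byte) {
    assert(uart != nullptr);
    if ((uart->status & kTxReady) == 0) {
        return false;   // device busy, nothing written
    }
    uart->data = byte;  // volatile store: the compiler may not drop or merge it
    return true;
}
```

The whole thing hinges on being able to treat an address as a typed object and on `volatile` forcing each access to actually happen, and plain Java gives you no direct way to express either.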

Mind you, even C++ isn't particularly good for systems programming; there are reasons beyond just Linus Torvalds' dislike of C++ that C remains the predominant systems language. C++ exception handling and other features add overhead and require a lot of system support as well.

All in all, either Java or C++ can be used for systems programming, but it would require a lot of prep work to get the system ready for either one, work that would have to be done in assembly or a true systems programming language like C.


First of all, the term "systems programming" is a bit general and refers to many areas, which usually share the following characteristics: (1) high performance requirements, (2) low overhead (memory, latency, etc.), and (3) the need to build a complex software infrastructure. Examples include OS kernels (or device drivers), compilers, database servers, virtual machines, computer games, etc.

Obviously, stuff that is more low-level (kernels, drivers, embedded software) also requires more low-level facilities, like being able to control alignment, change individual bits, arrange precise memory layouts, and alias memory arbitrarily (pointers and unchecked type-casting). This alone rules out most of the "safe" languages, whether you can compile them to native code or not. For example, in Java, even compiled to native code, you have no control over alignment, memory layout, or aliasing (bitwise operators are about all you get), so it's ruled out.
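For anyone who hasn't seen these facilities in action, here is a small C++ sketch of each one. The `PacketHeader` format and the helper names are hypothetical, and the `static_assert`s assume a mainstream platform where `std::uint16_t` and `std::uint32_t` have their natural alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Precise layout: a made-up wire-format header, checked at compile time.
struct PacketHeader {
    std::uint16_t type;
    std::uint16_t length;
    std::uint32_t sequence;
};
static_assert(sizeof(PacketHeader) == 8, "unexpected padding");
static_assert(offsetof(PacketHeader, sequence) == 4, "unexpected layout");

// Alignment: force a buffer onto a 64-byte boundary (a common cache line size).
struct alignas(64) CacheLineBuffer {
    unsigned char bytes[64];
};

// Individual bits: set, clear, and test bit n of a flags word.
inline std::uint32_t set_bit(std::uint32_t w, unsigned n)   { assert(n < 32); return w | (1u << n); }
inline std::uint32_t clear_bit(std::uint32_t w, unsigned n) { assert(n < 32); return w & ~(1u << n); }
inline bool test_bit(std::uint32_t w, unsigned n)           { assert(n < 32); return ((w >> n) & 1u) != 0; }

// Aliasing: view raw bytes as a header. memcpy is the well-defined way to do
// this in C++, and compilers typically reduce it to a plain load.
inline PacketHeader read_header(const unsigned char* raw) {
    PacketHeader h;
    std::memcpy(&h, raw, sizeof h);
    return h;
}
```

None of this is exotic in C or C++, which is really the point: the language lets you state the layout and then hold the compiler to it.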

Most "high-level" languages are essentially categorized as "high-level" specifically because they don't allow you to do any of these "low-level" things. So, that's a pretty important dividing line. And that's why, for low-level applications, the list of reasonable candidate languages is pretty short: mostly Assembly, C, C++, D, Ada, and maybe a few others. Traditionally, people also classify languages as "low-level" in reference to their lower level of abstraction (abstraction meaning, conceptually, moving away from the machine). I find that classification rather pointless because it doesn't convey the real reason why low-level languages are used for low-level tasks: not because they lack abstraction, but because they grant low-level access. C++, for example, does not lack abstractions, but it grants low-level access and can therefore be considered a viable low-level language.

When you talk about systems programming at a slightly higher level, i.e., when direct hardware / memory access is not strictly necessary, then the argument becomes more about the three points that I enumerated earlier. The requirements for performance and low overhead have not really gone away (as "promised"), simply because what we want to do with software has evolved faster than the hardware we run that software on, and because, even if poorly performing software can just run on bigger hardware, someone still has to pay the bills for that hardware / electricity / cooling / etc.

So, the situation is this: because the task is very complex (e.g., database server, financial predictions, computer games, etc.), you are going to need a good amount of abstraction capability in order to build it and keep it manageable (organized). But at the same time, throughout all those layers of abstraction, every bit of overhead and every performance issue matters a lot, because across the whole system they add up to a lot of overhead and significant performance problems.

Again, very few languages fit that bill. A lot of the "safe" languages have some serious overhead that is designed directly into the language itself, such as garbage collection and reference semantics. This is another clear dividing line: it is virtually impossible to write lean and mean infrastructure code in a language with reference semantics (or, at least, static or JIT analysers are currently not anywhere near smart enough to make it possible). Garbage collection is also problematic in many contexts. Sometimes it can be better for overall performance (because it frees memory in "batches" instead of piece by piece), but more often than not, it represents undue overhead in memory and especially in latency, as well as a performance penalty.
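Here is a rough C++ sketch of what the reference-semantics overhead looks like in terms of memory. The `Particle` type and the `total_mass` functions are made up for illustration. The second overload imitates what a collection of objects amounts to in a language with reference semantics: an array of pointers, with each object in its own heap allocation.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Particle { float x, y, z, mass; };

// Value semantics: the elements live inline, back to back, in one allocation.
inline float total_mass(const std::vector<Particle>& ps) {
    float sum = 0.0f;
    for (const Particle& p : ps) sum += p.mass;
    return sum;
}

// Reference semantics (emulated): the container holds pointers, and every
// element is a separate heap object that may live anywhere in memory.
inline float total_mass(const std::vector<std::unique_ptr<Particle>>& ps) {
    float sum = 0.0f;
    for (const auto& p : ps) {
        assert(p != nullptr);  // a reference can also be null, another cost
        sum += p->mass;        // one extra pointer chase per element
    }
    return sum;
}
```

Both versions compute the same thing, but the first one walks a single dense block of memory while the second one does an allocation per element up front and a dependent load per element on every pass. In C++ you get to pick; with pure reference semantics the second layout is the only one on offer.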

So, whenever you talk about writing performance-critical algorithms or data structures, you almost inevitably have to look at a language like C / C++ / D. High-level languages, i.e., those with reference semantics, garbage collection, and dynamic typing, are very nice when it comes to plugging all the main pieces together (filling containers, invoking algorithms, etc.) and writing simple bits of code in between. But you have to be a bit of a fool to write anything more "serious" than that in such languages, because memory layouts, indirections, and cache optimization matter significantly in those areas, and high-level languages are simply not adequate to tackle those issues.
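As one example of the kind of layout decision I mean, here is a C++ sketch of the classic array-of-structures versus structure-of-arrays trade-off. The `Body` and `BodiesSoA` types and the two `sum_mass_*` functions are hypothetical, and whether the second form actually wins depends on the access pattern and the hardware, so measure before believing it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures: all the fields of one body sit together.
struct Body {
    float x, y, z;
    float vx, vy, vz;
    float mass;
};

// Structure-of-arrays: each field gets its own contiguous array.
struct BodiesSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<float> mass;

    void push_back(const Body& b) {
        x.push_back(b.x);   y.push_back(b.y);   z.push_back(b.z);
        vx.push_back(b.vx); vy.push_back(b.vy); vz.push_back(b.vz);
        mass.push_back(b.mass);
    }
    std::size_t size() const { return mass.size(); }
};

// AoS: reading each mass pulls the rest of that Body through the cache too.
inline float sum_mass_aos(const std::vector<Body>& bodies) {
    float sum = 0.0f;
    for (const Body& b : bodies) sum += b.mass;
    return sum;
}

// SoA: the loop streams through one dense array of floats and nothing else.
inline float sum_mass_soa(const BodiesSoA& bodies) {
    assert(bodies.mass.size() == bodies.x.size());
    float sum = 0.0f;
    for (float m : bodies.mass) sum += m;
    return sum;
}
```

The point is less about which layout is better and more that the language lets you choose and then guarantees the result. With reference semantics and a garbage collector deciding where objects live, this kind of decision is mostly out of your hands.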
