I am damn confused between these two terms. Can anyone explain this using some example? Thanks in advance.

They are totally different animals! Thread-safe means that multiple threads in one executable image can run the same functions without interfering with each other. Re-entrant means that you can have a function that can be recursively entered even before the previous invocation has finished. This is common for recursive algorithms such as computing fibonacci sequences. This does not require a threaded environment.

Edited 1 Year Ago by rubberman

Well, as rubberman said, reentrancy is a more theoretical notion about formal analysis, but I would also add that in terms of multi-threading, reentrancy provides a very strong guarantee that this will work as expected when invoked concurrently (simultaneous threads that execute the same function).

You have to understand that concurrent programming (or "multi-threading") is all about the data that is shared between the concurrent threads, that is, the (mutable) data that multiple threads need access to more or less at the same time (or in some specific sequence). We call that "shared state".

By definition, a reentrant function does not have any state (mutable data) outside of itself (i.e., it can have local variables, but it doesn't access any mutable variables of wider scope, such as global variables), it's basically the first (and main) rule for a function to be reentrant. Since it does not have any state, it certainly does not have any shared state. And this is what makes it nearly fool-proof as far as using it concurrently.

But note that there is also a weaker definition reentrancy that only requires that the function does not change the value of any global data, by the end of it (e.g., it could momentarily change some global data and then restore it before exiting). But this is a weak definition that is only meaningfully "reentrant" in a single-threaded environment (if at all), and is therefore not what people usually mean when discussing reentrancy in the context of multi-threading.

The concept of "thread-safe" is usually meant in a much weaker sense. It is usually meant to say that all the accesses to the shared state are controlled or otherwise guarded to ensure that they occur correctly (which is called "serialized access"). For example, you could protect each read or write access to the shared data with a mutex (mutual exclusion) to guarantee that you don't have one thread reading it while another thread is in the middle of re-writing it. This is weaker because the only thing that it guarantees is that the data accesses themselves are not going to give you junk values or lead to corrupt values. It does not, in general, include things like guaranteeing the overall behavior of the function in a concurrent setting, i.e., you can still suffer from problems of race conditions and deadlocks, which is undesirable behaviors but perfectly "thread-safe". In that sense, reentrancy (in the stronger sense) is a much stronger property because it guarantees both safety and correctness.

But note also that thread-safety can sometimes be specified at a higher-level, i.e., in terms of functionality. For example, the C++ standard shared-pointer is guaranteed to correctly keep track of its internal reference counter during concurrent operations, and in that sense, is called "thread-safe".

Now, there are cases of reentrant functions that are not thread-safe. Very often, simple reentrant functions are thread-safe, but you still have to treat thread-safety as an additional requirement or property to check for, and potentially enforce with additional code. The most common problem is that a reentrant function (say "A") might read some global data that could be written to concurrently from some other function (say "B"), meaning that "A" is thread-safe against itself (multiple instances of "A" could run concurrently) but not thread-safe against "B" (a concurrent execution of "B" could mess up the execution of "A"). Another possible problem is that of false sharing. But, usually, people talk about reentrant functions only when it is trivial to see that they are reentrant, such as having no accesses of any kind to global data (it only has by-value parameters, local variables and a return value, and no other data is used at all).

So, in summary, you have these rules:

  • If it's clearly reentrant, then it's probably thread-safe, but you still have to check.
  • If it's thread-safe, then it's not necessarily reentrant, and you would have to check for it.
  • If it's thread-safe but not reentrant, then you have to worry about typical concurrency issues such as race conditions and deadlocks.
  • If it's both reentrant and thread-safe, then you are pretty much guaranteed that you won't have any problems with this code.
  • And in either case, if you think you can ever be totally safe when writing a concurrent program, you're a fool!

Edited 1 Year Ago by mike_2000_17: fixed formatting

Comments
Excellent response Mike!

@Mike, Thanks for this great answer. Really. Can you please give one example to tell that it is thread-safe but not reentrant, which is not thread-safe but reentrant? I am clear after your post but want to make it clearer with a simple example. Please mike. Thanks a lot in advance.

Edited 1 Year Ago by nitin1

thread-safe but not reentrant

Here is a simple example (in C++, cause I'm not fluent enough in C for this):

#include <atomic>

std::atomic<int> value{42};

void foo() {
  value.store( value.load() + value.load() );
}

In this case, because foo does all it's shared data (the "value" global variable) accesses, read and write, via the atomic operations, load and store, it is perfectly thread-safe in the sense that neither loads or stores could occur while some other thread is in the middle of loading or storing it too. However, if you interrupt this function after the first load operation and before the second load operation, then, the possibility that the global value undergoes some sort of change in the meantime (by, for example, calling foo again) makes the function non-reentrant because it's outcome would be different from (and inconsistent with) having executed it entirely the first time around.

But it's all a matter of expected behavior. Generally speaking, the expected behavior of the foo function here is that after foo returns, "value" ends up having twice the value that is had when foo was entered. And in that sense, it is clearly not reentrant. One could also specify the thread-safety in terms of that expected behavior and thus, say that this function is not thread-safe because it cannot guarantee this behavior. But a more basic thread-safety definition would say that any imaginable sequences of multi-threaded executions of this code (with repeated and concurrent calls to foo) has a well-defined outcome (which is not necessarily the "desired" outcome, but a well-defined outcome nonetheless).

not thread-safe but reentrant

Here is a example of that:

int value = 42;

int foo() {
  return value * 2;
};

The foo function is reentrant because it can be interrupt and resumed, and called however many times you want in between and will always yield results that are consistent (twice the value). However, if there is any other function, executing on another thread, that modifies the "value" variable (and without assuming that the platform implements atomic operations on integers by default), then the result of foo would be undefined because the value could be corrupted by any concurrent changes to it, in other words, the outcome of this function in such a multi-threaded environment is undefined. For example, if you make this simple modification:

const int value = 42;

int foo() {
  return value * 2;
};

then the foo function becomes thread-safe, because there is no allowable (well-defined) way modify the "value" variable.

So, it basically boils down to what rubberman said, that reentrance is really more of a single-threaded formal property that is largely independent from (but often correlated with) thread-safety.

@mike Thanks for such a great answer. But, I want to ask one question in "thread-safe but not reentrant" example , that when it is not giving me correct behaviour in multi-threaded environment, then how you can say that it is still thread-safe?

Please explain when you said "has a well-defined outcome (which is not necessarily the "desired" outcome, but a well-defined outcome nonetheless)." in your last post. I mean if it is not gviging correct answer, then how it is still thread-safe?

Thirdly, you said, reentrant are functions when they're not using global variables and all. but in all your example, you make functions which are using global variables things. Please clarify once. Thanks in advance Mike.

Edited 1 Year Ago by nitin1

if it is not gviging correct answer, then how it is still thread-safe?

That's related to what I pointed out about things being relative to how you define your requirements. In other words, you can adopt a "low-level" view of thread-safety and only consider that individual accesses to shared state are protected such that they are not corrupt. Or, you can adopt a more "high-level" view and specify an overall behavior that should be observed even in multi-threaded environments. The point is that this is all very subtle and multi-layered, and you always have to be careful about what the "thread-safe" label means for the specific context in which the term is being thrown around.

reentrant are functions when they're not using global variables and all. but in all your example, you make functions which are using global variables things.

I said that that reentrant functions don't have a global state, which does not forbid the use of global variables. It only forbids the use of global mutable data (state) that is changed by the function. So, read-only access to a global (const-)variable pretty much makes the global variable a parameter of the function, not a state. That's an important distinction in practice.

Comments
As usual, Mike2K gives good advice.

SO, accessing non-constant global variables makes the function non-rentrant? Can I say that? If yes, then in your second example, it is accessing a variable which is global but non-const. Please clarify this also.

Secondly, malloc() is a function in C. How will you explain re-entrancy and thread-safe using malloc()? Can malloc be thread-safe or re-entrant? Actually, I want to know how malloc() will react when ut would be called by many threads at the more or less same time?

accessing non-constant global variables makes the function non-rentrant?

No. It is not a matter of whether the global variables are marked as "const" or not (in fact, some languages don't have that concept at all). It is a matter of what the function does with the global variable, if it only reads it ("read-only access"), then it can be reentrant, but if it reads and writes to it, then it cannot be reentrant (under the more strict definitions).

Think about it this way. If you interrupt a function at any arbitrary point, what is all the information (value of variables) you need to gather in order to be able to restart the function from the point where it was interrupted? Some data participate to the computations in the function, but remain unchanged throughout the execution of the function, these are called "parameters" of the function. Some data are modified throughout the execution of the function as it progresses (e.g., loop counter, summation value, intermediate values of calculations, etc.), and these are called "states" of the function. The parameters characterize the function call and determine what you should expect as a result of calling it. The states characterize the progress-point of a function's execution. So, if you interrupted a function execution at some point and recorded all of its states, then you could restart / resume that function execution from the same point later on by simply reloading the state. A reentrant function is one which only has local states (i.e., the state / progress of the function execution is entirely described by the value of its local variables, and only its local variables), but it does not exclude global parameters (that contribute to the computations, but do not change over the execution of the function).

This interrupt / restart procedure is essentially what an OS task-scheduler does when it switches between processes and threads, but it assumes that all the states are local variables of the function. This works perfectly when you have reentrant functions (and a single core), but it can be a bit dangerous when parts of the state are global (i.e., when the function is not reentrant), because those cannot be recorded and restituted, and therefore, must be assumed not to change in the time interval between the interruption and the restart of the function execution, because that would cause the function to be resumed in a "broken" state.

Making global constants marked with "const" is simply a way to get more reassurance that constants remain constant by having the compiler enforce that rule throughout the code. However, it has no significance whatsoever when it comes to the concept of reentrant functions or the distinction between states and parameters.

How will you explain re-entrancy and thread-safe using malloc()? Can malloc be thread-safe or re-entrant?

Malloc can be (and is) thread-safe. For example, a bad (unsafe) implementation of malloc could lead to the situation where two threads allocate memory at the same time and malloc ends up returning a pointer to the same memory for both threads, which would obviously lead to big problems. This is guaranteed never to happen, and that's the thread-safety guarantee provided by malloc (if implemented correctly).

Malloc is not re-entrant, strictly-speaking. The malloc function relies on the heap, which is a data-structure that manages the allocation of memory by doing some book-keeping about what parts of the memory are allocated already and what parts are free. When malloc is called, the state of the heap (which is a global entity) is changed (i.e., a block of memory that was marked as "free" before the call is now marked as "allocated" after the call). This means that every call to malloc changes the global state of the program. So, repeated calls the malloc cause more and more memory to be marked as "allocated", and every call returns a different memory address too (allocates a different chunk of memory). Also, malloc could fail if you run out of memory, and that depends on how many times you've called it before (and how much memory you've requested during those calls). Therefore, there is really no way that you could strictly say that malloc is re-entrant (which also means that any function that calls malloc is not re-entrant either).

You could, however, weaken the definition of re-entrancy by making some assumptions about the observable behavior in such a way that malloc could be considered re-entrant. First, you could assume that you will never run out of memory (which is usually true in practice, for most programs). Second, you could say that the actual memory address returned by malloc is not relevant, and that the only meaningful observable behavior is the size of the memory block it points to. Third, you could say that the state of the heap is an implementation detail that is not observable or meaningful to the execution of your program. Under those assumptions and definitions of the observable behavior, the malloc function behaves exactly the same on every call, regardless of the history of previous calls to it, and could, in that sense, be considered re-entrant for all practical purposes. For example, a programming language like Java, where all things are heap-allocated, effectively makes those assumptions as part of its fundamental execution model, and in that context, functions that make dynamic memory allocations (which are pretty much all functions in Java) can still be considered re-entrant, for all practical purposes.

This "for all practical purposes" is an important concept in computer science. Theoretical computer science concepts like re-entrancy are always defined on top of some model of how a computer works (how instructions are executed, how memory is accessed and structured, etc..). There are many ways to define these models from simple to complex, from low-level to high-level, from realistic to abstract, and so on. A function like malloc is not re-entrant under a very direct low-level realistic model of actual computer systems (e.g., the model assumed by low-level languages like C/C++), but can be considered re-entrant under some higher-level abstract models (e.g., the model assumed by high-level languages like Java / C#). At the end of the day, the real low-level details of the actual computers are unavoidable, because that's what the programs run on, but you can make a lot of useful theoretical arguments when assuming a higher-level model. In this particular case, I would consider malloc to be re-entrant, as far as analysis is concerned because those assumptions that I mentioned (which define a higher-level model for the memory of a computer) are very reasonable in practice.

Actually, I want to know how malloc() will react when ut would be called by many threads at the more or less same time?

If multiple threads call malloc at the same time, the allocations will be serialized (or maybe more fancy schemes are used, depending on the implementation). This means that whichever thread called malloc first will cause any other thread to have to wait until that call returns before they get their turn to allocate memory. In other words, they each wait for their turn. A fancier implementation could manage to avoid the waiting time by some fancy technique, but the point remains that this is guaranteed to be thread-safe, as I said earlier.

Edited 1 Year Ago by mike_2000_17: typo

This article has been dead for over six months. Start a new discussion instead.