Here is my problem:
1) I have a tree, and I want different threads to act on some of its branches independently and simultaneously.

2) The program is structured so that the processing of each branch is isolated after the main "manager" thread delegates the work to different threads: memory of branch 1 is processed by thread 1 only, memory of branch 2 is processed by thread 2 only. Then, when all threads have completed (and gone into wait/sleep mode), the main manager uses the memory of branch 1 and branch 2 to do other work.

3) I am new to multiprocessors and multithreading. What I understand is that the volatile keyword can be quite expensive because the data won't be cached and will be written to and read from memory each time. And the way my program is structured, there would (theoretically!) be no need to use volatile, since at any moment the memory is guaranteed to be used by only a single thread (either the manager thread or the branch-specific thread). For example:
a) memory in branch 1 is first used by the main manager thread,
b) then, it delegates to thread 1 which is the only one using the memory until it completes
c) then, memory of branch 1 is worked on by the main manager thread

4) Thread-local storage does not seem to be the answer, because the branches have numerous sub-branches and you would need to transfer all the nodes back one by one to the main manager thread (which slows everything down unnecessarily).

5) So what I am looking for is a way to "flush" non-volatile memory. For example, just before going to sleep branch 1 would "flush" its non-volatile memory to guarantee that its processor register and cache data would have been transmitted to the main memory. Thus the main manager thread would be guaranteed to work on up-to-date memory thereafter

6)
a) Is this possible?
b) Is there a C++ way?
d) Is this highly compiler specific?
e) Is this even microprocessor specific?
f) Is there a more elegant (and equally fast) way to achieve the same result?

Edited 5 Years Ago by trantran: n/a

a) Yes
b) C++ does not address this
d) Sort of (due to e)
e) Very much
f) Start here, then try to apply those ideas to your architecture (if it is not x86).

I have a problem with a) and with one of the articles that your link eventually led me to (via other links inside the posts), which is
http://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/

Basically I think (most, if not all) people agree that:
1) volatile does not guarantee data integrity in multi-threading

But the article appears to suggest (?) that
2) volatile is *not* required to conserve data integrity in multi-threading.

But, I reread the C++ standard and it seems that it *is* required for data integrity.

Here is my argument.

The standard says:
7.1.5.1.8
"[Note: volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation. See 1.9 for detailed semantics."

and 1.9.6 says:
"The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library i/o functions"

Therefore nothing prevents a compiler from copying the non-volatile memory (let's call its location A) to *another* main memory location (let's call its location B) as part of a really smart optimisation (for example, it might want to put frequently accessed memory close together for faster access). Now when the user issues multiprocessor instructions to flush the registers/cache:
1) the compiler could just ignore them when they affect non-volatile memory (and consider them only a hint, since they affect only non-volatile memory and the standard permits not flushing it until the variable is next accessed in the same thread), or
2) it could flush to the main memory in B (and not in A!)

Now how this would affect my example, I really don't know.

Edited 5 Years Ago by trantran: n/a

- you want to avoid having 2 different threads write simultaneously to the same chunk of shared memory. even writes in sequence can cause problems when they interleave with another thread's reads. when you do share memory, you need to implement some sort of mutex or semaphore. basically, if you can do as you said and assign threads to individual branches of the tree (and those branches don't overlap), you're golden.

take this for instance:
thread #1 writes n with 1
thread #2 writes n with 2
thread #2 multiplies n by 5
thread #1 adds 2 to n
thread #2 reads n
thread #1 reads n

or

n contains value 5
thread#1 writes 2 to n
thread#2 reads n
thread#2 writes 3 to n

both threads are expecting exclusive access of course, but they are not getting it, because the programmer did not plan ahead.
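
As a rough illustration of the mutex idea, here is a minimal sketch in portable C++11 (not the Win32 API). The names SharedValue, add_many and run_two_writers are made up for this example; the point is only that each read-modify-write of n happens while the lock is held, so no update is lost to an interleaving.

```cpp
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical shared value; the names are illustrative only.
struct SharedValue {
    std::mutex guard;  // protects n
    int n = 0;
};

// Performs `iterations` read-modify-write steps, each one under the lock,
// so an increment can never be lost to another thread's interleaved write.
void add_many(SharedValue& shared, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> lock(shared.guard);
        shared.n += 1;
    }
}

// Runs two writer threads against the same value and returns the final n.
int run_two_writers(int iterations) {
    SharedValue shared;
    std::thread t1(add_many, std::ref(shared), iterations);
    std::thread t2(add_many, std::ref(shared), iterations);
    t1.join();
    t2.join();
    return shared.n;  // safe to read: both writers have been joined
}
```

Calling run_two_writers(10000) should give 20000 every time; without the lock_guard the result would be unpredictable (and formally a data race).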

this can happen on hyperthreaded processor, since you essentially have 1 core with 2 threads.

- threads (at least they used to) take about 1 second to start. so they have an expensive startup profile.

- I forget what a fiber is. a thread is a function which you are executing which runs until it is finished and then the thread is over.

- monitoring the health or aliveness of your threads is entirely up to you. it would be a good idea, since you want to be sure you don't close your main() thread while the other threads are running or the OS will possibly terminate them with prejudice because they are children.
2 ideas are WaitForMultipleObjects, or a global shared volatile array or vector of status variables (1 element for each thread) that you continuously read with a while loop in main(). it would be a good idea to Sleep(200) inside the loop so you don't peg the CPU usage meter. init each element with the state "started"; inside each thread, take note of its own position in the vector or array and, at the end, write the status "done".
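
The status-array idea could be sketched roughly as follows. This is an assumption-laden variant, not the Win32 version described above: it uses std::atomic<int> slots instead of a volatile array (so the cross-thread reads are well defined in C++11) and std::this_thread::sleep_for instead of Sleep(). The names kStarted, kDone and wait_for_workers are invented for the example.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical status codes, mirroring the "started"/"done" states above.
constexpr int kStarted = 0;
constexpr int kDone = 1;

// Launches worker_count threads, polls their status slots until every one
// reports kDone, and returns how many slots ended up as kDone.
int wait_for_workers(int worker_count) {
    std::vector<std::atomic<int>> status(static_cast<std::size_t>(worker_count));
    for (auto& slot : status) slot.store(kStarted);

    std::vector<std::thread> workers;
    for (int i = 0; i < worker_count; ++i) {
        workers.emplace_back([&status, i] {
            // ... the branch-specific work would go here ...
            status[static_cast<std::size_t>(i)].store(kDone, std::memory_order_release);
        });
    }

    bool all_done = false;
    while (!all_done) {
        all_done = true;
        for (auto& slot : status) {
            if (slot.load(std::memory_order_acquire) != kDone) all_done = false;
        }
        // Sleep between polls so the loop does not peg a core.
        if (!all_done) std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }

    for (auto& w : workers) w.join();  // threads must be joined before destruction

    int done = 0;
    for (auto& slot : status) done += (slot.load() == kDone) ? 1 : 0;
    return done;
}
```

In practice simply joining the threads (or WaitForMultipleObjects on their handles) is usually enough; the polling loop is shown only because it is the approach described here.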

- google articles on multithreaded programming. lots of good ones. also there are 2 old books:
"Multithreaded Programming with Windows NT" by Thuan Q. Pham & Pankaj K. Garg, and
"Multithreading Applications in Win32: The Complete Guide to Threads"
if you really are talking about multiple processes rather than threads, that's a whole 'nother matter: there is a UNIX function called fork(). it's a hard function to learn, so read the manual on it over and over until it sinks in. basically, fork() forks off another process; in win32, you use CreateProcess(). mingw-w64 comes with pthreads (POSIX threads).

- multiprocessor: win32 provides functions which set processor affinity and functions to query same.

relevant win32 functions:
-------------------------
SetThreadAffinityMask http://msdn.microsoft.com/en-us/library/ms686247(v=VS.85).aspx (optional)
SetThreadIdealProcessorEx http://msdn.microsoft.com/en-us/library/dd405517(v=VS.85).aspx (optional)
ThreadProc http://msdn.microsoft.com/en-us/library/ms686736(v=VS.85).aspx (optional)
CreateThread http://msdn.microsoft.com/en-us/library/ms682453%28VS.85%29.aspx
CreateRemoteThread
CreateRemoteThreadEx http://msdn.microsoft.com/en-us/library/dd405484(v=VS.85).aspx
CreateFiberEx http://msdn.microsoft.com/en-us/library/ms682406(v=VS.85).aspx
CreateSemaphoreEx http://msdn.microsoft.com/en-us/library/ms682446(v=VS.85).aspx (optional)
CreateMutexEx http://msdn.microsoft.com/en-us/library/ms682418%28v=VS.85%29.aspx (optional)
WaitForMultipleObjectsEx http://msdn.microsoft.com/en-us/library/ms687028(v=VS.85).aspx (use with CreateMutexEx or CreateSemaphoreEx, optional)
WaitForSingleObjectEx http://msdn.microsoft.com/en-us/library/ms687036(v=VS.85).aspx (use with CreateMutexEx or CreateSemaphoreEx, optional)
CloseHandle http://msdn.microsoft.com/en-us/library/ms724211%28VS.85%29.aspx
TerminateThread http://msdn.microsoft.com/en-us/library/ms686717(v=VS.85).aspx (optional)

these are called native threads (native to the OS)

Edited 5 Years Ago by jmichae3: n/a

For Jmichae3:
You are off-topic. The question debated is not general synchronisation but the specific case of volatile keyword and microprocessor in the specific example given:
- is the volatile keyword necessary in my example? no definitive answer has been given, but the link I gave above (which, I am just realising, didn't paste correctly) says quite differently. The correct link is http://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/ which is likely to make some C++ gurus on this site jump!
- (as an aside, even if the code is correct, if we are to believe an ACM paper of 2008: "The “volatile errors” column in Table 1 shows that no compiler that we tested was free from defects: no compiler was able to always create executables that produce the same access summary across all optimization options" http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf But this is irrelevant to this thread: we are assuming here that the compiler is actually bug-free and behaves as per the standard.)
- the way I read the C++ standard, it seems it should be necessary (but I am no expert in this!) in my specific example
- is there a way to flush microprocessor memory in a multiprocessor environment in win32? For example, SetEvent appears to flush memory (see http://msdn.microsoft.com/en-us/library/windows/desktop/ms686355%28v=VS.85%29.aspx), but *only* if you use it together with the volatile keyword.

The volatile keyword was introduced at the time when C had no notion of concurrency other than time-slicing on a single processor. The notion of volatile in C++ is the same as what it is in C.

This article by Myers and Alexandrescu is instructive: http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf

This is what they have to say about volatile:

One might hope that the above fully volatile-qualified code would be guaranteed by the Standard to work correctly in a multithreaded environment, but it may fail for two reasons.
First, the Standard’s constraints on observable behavior are only for an abstract machine defined by the Standard, and that abstract machine has no notion of multiple threads of execution. As a result, though the Standard prevents compilers from reordering reads and writes to volatile data within a thread, it imposes no constraints at all on such reorderings across threads. At least that’s how most compiler implementers interpret things. As a result, in practice, many compilers may generate thread-unsafe code from the source above. If your multithreaded code works properly with volatile and doesn’t work without, then either your C++ implementation carefully implemented volatile to work with threads (less likely), or you simply got lucky (more likely). Either case, your code is not portable.

And their conclusion:

Finally, DCLP and its problems in C++ and C exemplify the inherent difficulty in writing thread-safe code in a language with no notion of threading (or any other form of concurrency). Multithreading considerations are pervasive, because they affect the very core of code generation. As Peter Buhr pointed out, the desire to keep multithreading out of the language and tucked away in libraries is a chimera. Do that, and either (1) the libraries will end up putting constraints on the way compilers generate code (as Pthreads already does) or (2) compilers and other code-generation tools will be prohibited from performing useful optimizations even on single-threaded code.

That conclusion was accurate at the time the article was written (2004); now C++ is a thread and multiprocessor aware language.
See: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2427.html#DiscussOrder
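
To make that concrete for the tree scenario in this thread, here is a minimal C++11 sketch, assuming each branch is handed to exactly one std::thread. The Node type and the function names are invented for illustration. In C++11 the completion of a thread synchronizes with the return of join(), so everything a worker wrote to its branch is visible to the manager afterwards, with no volatile and no explicit "flush".

```cpp
#include <functional>
#include <thread>
#include <vector>

// Hypothetical tree node; plain, non-volatile, non-atomic data.
struct Node {
    int value = 0;
    std::vector<Node> children;
};

// Work done by one worker thread on the branch it exclusively owns.
void process_branch(Node& node) {
    node.value *= 2;
    for (Node& child : node.children) process_branch(child);
}

int sum_tree(const Node& node) {
    int total = node.value;
    for (const Node& child : node.children) total += sum_tree(child);
    return total;
}

// Manager: one thread per top-level branch, then join, then read the results.
int manager(Node& root) {
    std::vector<std::thread> workers;
    for (Node& branch : root.children) {
        workers.emplace_back(process_branch, std::ref(branch));
    }
    // join() is the synchronization point: after it returns, the manager is
    // guaranteed to see every write the worker made to its branch.
    for (std::thread& w : workers) w.join();
    return sum_tree(root);
}

// Small sample tree: root(1) -> branches 2 and 3, with children 4 and 5.
Node make_sample_tree() {
    Node root;   root.value = 1;
    Node b1;     b1.value = 2;
    Node b2;     b2.value = 3;
    Node c1;     c1.value = 4;
    Node c2;     c2.value = 5;
    b1.children.push_back(c1);
    b2.children.push_back(c2);
    root.children.push_back(b1);
    root.children.push_back(b2);
    return root;
}
```

With the sample tree, the branches are doubled while the root is untouched, so manager() returns 1 + 4 + 6 + 8 + 10 = 29. (A vector of the enclosing node type is formally guaranteed only from C++17 on, although mainstream compilers have long accepted it.)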

vijayan121:

Wow! What an illuminating post! Very grateful. In particular, the Myers article was fabulous (it also helps that he is my favorite technical author, I have always found that he has a very smooth and elegant style that is a joy to read).

Now if I piece the puzzle correctly (and correct me if I am wrong! which I may well be!) here is what I have understood and concluded:
1) I can't do what I intended to do initially if I use pure pre-C++11 C++ (C++98/03). It is impossible.
2) I could do it with C++11. (But while I will use it when it becomes available, I won't bet that Microsoft will make it happen anytime soon; they seem very happy to have a sort of proprietary C++ variant for a vital aspect of the language, the multi-threading and multi-processor part, as explained in 3.)
3) My current C++ (VS2009) allows me to do what I want, actually very easily, simply because:
a) the meaning of the keyword "volatile" has been changed to a proprietary sort of meaning starting with VS2005, so that it now actually means it will flush *everything I wanted to be flushed*: any temporary variable created and stored by the compiler in registers or other special memory locations that affects static or global variables, and all microprocessor cache affecting global and static variables
b) the synchronisation functions (wait, SetEvent) of VS2009 also ensure that this flushing occurs.

> 1) I can't do what I intended to do initially if I use pure pre-C++11 C++ (C++98/03). It is impossible.

It is impossible to do it in portable code. And you have to accept that the code generated would be largely sub-optimal.

> I could do it with C++11. (.... I won't bet that Microsoft will make it happen anytime soon,...)

Perhaps sooner than you expect; I suppose they are interested in the performance of the code generated by their compiler. Converting all the libraries to exploit C++11 will take some time though.

In the interim, Anthony Williams' JustThread library is (commercially) available. http://www.stdthread.co.uk/

> My current C++ (VS2009) allows me to do what I want, actually very easily, simply because:
> a) the keyword "volatile" meaning has been changed to a proprietary-sort of meaning

Not really 'changed'. They have just added on non-standard semantics to volatile. As has every mainstream compiler writer; each one in a different way.

The Microsoft compiler treats volatile in the manner that Java (1.5+) does - close to volatile T being treated as C++11 would treat volatile std::atomic<T>

> so that it now actually means it will flush *everything I wanted to be flushed*: any temporary variable ...

I'm not too sure of that unless it has changed after 2005. This is what the Microsoft documentation used to say:

Declaring a variable as volatile prevents the compiler from reordering references to that variable relative to any other volatile variables. However, it does not prevent the reordering of references to nonvolatile variables relative to the volatile variable.

> the synchronisation functions (wait, SetEvent) of VS2009 also ensure that this flushing occurs.
Yes. They insert memory barriers where required; see: http://msdn.microsoft.com/en-us/library/windows/hardware/ff552971%28v=vs.85%29.aspx. However, I'm not certain that they do the equivalent for every access to every volatile variable.
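
For comparison, this is roughly what an explicit memory barrier looks like in portable C++11. It is only a sketch of the general technique, not a description of what the Win32 functions do internally. The names fence_payload, fence_ready and run_fence_demo are hypothetical.

```cpp
#include <atomic>
#include <thread>

int fence_payload = 0;                  // plain, non-volatile, non-atomic data
std::atomic<bool> fence_ready{false};   // flag used only to signal readiness

void fence_producer() {
    fence_payload = 42;
    // Release fence: the write above may not be reordered after the flag store.
    std::atomic_thread_fence(std::memory_order_release);
    fence_ready.store(true, std::memory_order_relaxed);
}

int fence_consumer() {
    while (!fence_ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Acquire fence: the read below may not be reordered before the flag load.
    std::atomic_thread_fence(std::memory_order_acquire);
    return fence_payload;
}

// Runs one producer and one consumer and returns what the consumer saw.
int run_fence_demo() {
    fence_payload = 0;
    fence_ready.store(false);
    int seen = 0;
    std::thread consumer([&seen] { seen = fence_consumer(); });
    std::thread producer(fence_producer);
    producer.join();
    consumer.join();
    return seen;
}
```

The pair of fences is what makes the plain write to fence_payload visible to the consumer; the flag itself uses relaxed operations and carries no ordering on its own.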


You would get a lot more mileage if you post technical questions of this kind to a newsgroup that is technical in nature; for example comp.programming.threads http://www.lambdacs.com/cpt/cpt.html. Rather than to this board which is just about ok for newbies in need of homework help (and great for social networking under the guise of programming).

"I'm not too sure of that unless it has changed after 2005. This is what the Microsoft documentation used to say"

Where did you get that quote from?

It seems to me that the entry on "Volatile (C++)" is unambiguous. It reads:

Also, when optimizing, the compiler must maintain ordering among references to volatile objects as well as references to other global objects. In particular,

A write to a volatile object (volatile write) has Release semantics; a reference to a global or static object that occurs before a write to a volatile object in the instruction sequence will occur before that volatile write in the compiled binary.

A read of a volatile object (volatile read) has Acquire semantics; a reference to a global or static object that occurs after a read of volatile memory in the instruction sequence will occur after that volatile read in the compiled binary.

(source: http://msdn.microsoft.com/en-us/library/12a04hfd%28v=vs.80%29.aspx)

My inexpert understanding of this is that any global or static object (non-volatile or volatile) will be flushed before a write to any volatile object. Am I correct in thinking this?

> Where did you get that quote from?

A Microsoft Knowledge Base article, circa 2005

> any global or static object (non-volatile or volatile) will be flushed before a write on any volatile object.
> Am I correct in thinking this?

Yes, the documentation is pretty clear - in Microsoft C++ volatile enforces acquire and release ordering for all volatile variables as well as all non-volatile variables with a static storage duration.
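
The same acquire/release pattern can be written portably in C++11 with std::atomic instead of relying on the Microsoft-specific meaning of volatile. The following is a minimal sketch under that assumption; Mailbox, publish, receive and run_mailbox_demo are names invented for the example.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Hypothetical mailbox: ordinary data guarded by an atomic flag.
struct Mailbox {
    int data = 0;                         // plain, non-volatile, non-atomic
    std::atomic<bool> published{false};   // plays the role of the volatile flag
};

void publish(Mailbox& box, int value) {
    box.data = value;
    // Release store: the write to data cannot move after this store.
    box.published.store(true, std::memory_order_release);
}

int receive(Mailbox& box) {
    // Acquire load: the read of data cannot move before this load.
    while (!box.published.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return box.data;
}

// Runs one writer and one reader and returns the value the reader observed.
int run_mailbox_demo(int value) {
    Mailbox box;
    int seen = 0;
    std::thread reader([&box, &seen] { seen = receive(box); });
    std::thread writer(publish, std::ref(box), value);
    writer.join();
    reader.join();
    return seen;
}
```

Once the reader sees published == true, it is guaranteed to see the value written to data before the release store, which is the "flush" being asked about in this thread.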

This is usually all that you require. However, there are situations where sequentially consistent ordering is necessary.
See: http://www.justsoftwaresolutions.co.uk/threading/memory_models_and_synchronization.html
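
One classic case where acquire/release is not enough is the "store buffering" pattern: two threads each write one flag and then read the other. The sketch below uses C++11 atomics with sequentially consistent ordering; both_saw_zero is a hypothetical helper name. With memory_order_seq_cst the outcome "both threads read 0" is ruled out, whereas with only release stores and acquire loads it would be permitted.

```cpp
#include <atomic>
#include <thread>

// Repeats the store-buffering experiment `trials` times and reports whether
// any trial ended with both threads reading 0 from the other's flag.
bool both_saw_zero(int trials) {
    for (int t = 0; t < trials; ++t) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;

        std::thread a([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread b([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        a.join();
        b.join();

        // Under sequential consistency at least one store precedes the other
        // thread's load in the single total order, so this cannot happen.
        if (r1 == 0 && r2 == 0) return true;
    }
    return false;
}
```

Algorithms such as Dekker's or Peterson's mutual exclusion depend on exactly this guarantee.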
