If I (have to) use C/C++, I'm mostly working on time-critical or real-time applications. Examples are OpenGL texture streaming (example: streaming satellite data onto a planet surface in real time: https://www.youtube.com/watch?v=ws2ra5MvDi4) or real-time PSM audio manipulation (example: automatically tuning a guitar to a C64 SID tremolo: http://www.yousry.de/audio-example-real-time-tune-detection-tuning-auto-tuning/).

During development I'm using "-O0 -g" as compiler options to enable DWARF debugging information.
But as a result, the STL becomes unusably slow (missing loop unrolling, vectorization, etc.).
In these cases I write code like:

#ifndef DEBUG
        std::deque<float> inFIFO;
        std::deque<float> outFIFO;
#else
        // std::deque is unusably slow in unoptimized debug builds
        float* inFIFO;
        float* outFIFO;
#endif

to avoid later discussions of the form "Why do I still use shi##y C and not that wonderful C++?"

This otherwise results in ugly, badly readable source code. Is there a workaround I'm not aware of to use the STL efficiently in a development environment?


-O0 gives you zero optimization, and that will definitely cause your code to be very slow (I can't remember where I was told this, since it was a few years ago). I would recommend using -Og, which is described as:

Optimize debugging experience. -Og enables optimizations that do not interfere with debugging. It should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience.

That should give you the best of both worlds.

source: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

You probably don't need to go through all of your functions with the debugger. In that case, I also suggest compiling some files with -O3 (where you are sure that everything is OK) and others (where you will need to debug) with the -Og option, following NathanOliver's advice. To do this, you can extract some parts of your code into functions that you put in separate files that you compile with -O3. I guess you don't need the power to inspect the std::deque in the debugger, for example.
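As an aside, if splitting files is inconvenient, GCC also has a (GCC-specific) function attribute that raises the optimization level of a single function inside an otherwise -Og translation unit. A minimal sketch with a hypothetical hot function (the attribute is documented as not exactly equivalent to compiling the whole file with -O3, so treat this as an approximation):

#include <deque>

// Hypothetical hot function: compiled with heavier optimization even when
// the rest of this file is built with -Og (GCC-specific attribute).
__attribute__((optimize("O3")))
float drain_sum(std::deque<float>& fifo)
{
    float sum = 0.0f;
    while (!fifo.empty()) {
        sum += fifo.front();   // the hot loop gets the raised optimization level
        fifo.pop_front();
    }
    return sum;
}

Since the whole translation unit still sees a single definition of std::deque, there is no layout mismatch to worry about with this approach.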

I also suggest compiling some files with -O3 and others with the -Og option

That could be problematic, especially for the STL. Since most of the STL consists of header-only components (templates), the parts compiled with debugging and the parts compiled without it will end up instantiating a debug version and a release version of the STL components, respectively. These are not, in general, binary compatible. What this means is that if you have a function that takes a parameter of a type like std::deque<foo>&, and the function itself is compiled with one set of options while the code that calls it is compiled with the other set, then the function will interpret the memory that this reference parameter points to differently from how the calling context created it. In other words, their memory layouts will be different and, in general, incompatible, leading to pretty nasty errors.

So, unless you are really careful, this is not a very practical thing to do. And at the end of the day, you won't really be able to recover the performance of the STL components in the debug build, because the parts you are debugging still have to use the debug versions of them.

Is there a workaround I'm not aware of to use the STL efficiently in a development environment?

Besides telling the compiler to optimize as much as it can without affecting the debug information, using the -Og option as NathanOliver said, I don't think there is much else you can do without getting into dirty hacks (like defines and pragmas surrounding the includes of STL headers, to disable debugging and enable optimizations for just the STL components).
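For illustration only, a heavily hedged sketch of what such a pragma hack could look like with GCC (whether the template code instantiated from these headers actually picks up the raised optimization level is compiler-dependent, which is part of what makes this a dirty hack):

#pragma GCC push_options
#pragma GCC optimize("O3")   // ask GCC to optimize code coming from these headers
#include <deque>
#include <algorithm>
#pragma GCC pop_options      // restore the command-line options (e.g., -Og)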

One hypothetical option that doesn't exist yet, but might be a good idea, would be to disable debugging options for system headers (STL, standard headers, or any headers from include directories that are marked as isystem). I don't see why that wouldn't be possible, but I don't think that any compiler currently does this. I (or someone) might want to bring up that idea with the Clang people.

Personally, I very rarely use debug builds. If you test often, maintain unit-test suites, and develop things incrementally, each time building confidence in the code you've developed before, then whenever there is a bug, it is usually confined to a very small area of the code, and it is fairly easy to simply add print-outs (or use some debug-print macro that does not rely on debug compilation options) to print the values of variables and see where things are going wrong. This way, in the last several years, the number of bugs difficult enough to require a debug build of the code can be counted on the fingers of one hand.

Sorry, I don't agree with mike. It is perfectly possible to create a library that works with std::deque<foo>& compiled with -O3 and have it work with other code compiled with -Og. You don't need all the libraries you link with your program to be compiled with the same combination of flags. For example, your program can use sqrt, malloc, and printf from the C library even though they were compiled with different flags and a different compiler. The linker does not require that all functions be compiled with the same optimization flags, nor even with the same compiler. It is even possible to link functions compiled from different languages.

So the idea is to create a "library" that works with std::deque<foo>, test and debug it, and then compile it with -O3. Later, link this verified library with your code compiled with -Og. Please note that calling those optimized files a "library" is just a name; you really just need the *.o files compiled with -O3 to link against. But if you really want, you can create a static library or even a dynamic library that works with std::deque<foo>.

Obviously, if you later change your foo class, you will need to recompile your library. If this is a problem, you can also create a library that works with std::deque<foo*>, which will not have this dependency problem. You can also consider creating a precompiled, fully optimized library that works with std::deque<void*> and then use "translation" functions from foo* to void*, compiled with -Og.

Finally, when you are debugging your code, you must "skip" all the calls into your library, in the same way that you "skip" the calls to sqrt, for example.

Sorry, I don't agree with mike.

It is brave to disagree with me on such technical matters. Unfortunately for you, this is not a matter of opinion, it is fact, so be prepared for the scolding that follows...

It is perfectly possible to create a library that works with std::deque<foo>& compiled with -O3 and have it work with other code compiled with -Og.

Possible, yes, but very problematic, which is what I said. It's playing with fire beyond what I could sanction, let alone advise people to do, not to mention that it's a lot of pain for little gain.

The typical trick to make this kind of thing possible is to limit the interface between the two libraries (or parts of the code) to be purely a set of C functions involving only primitive types, opaque pointers and some specially-designed objects (this is where it gets tricky, and most people choose to limit them to POD class types, to stay on the safe side). However, on your typical code-base, retro-fitting a C++ library to create this kind of a compilation firewall requires quite a bit of effort, and it's not worth it if your sole purpose is debugging (not to mention that this kind of library overhaul could easily create additional bugs!).
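To make this concrete, here is a minimal sketch of such a compilation firewall for the float FIFO from the original post. All names are hypothetical; the point is that only primitive types and an opaque handle cross the boundary, and the .cpp file is the one compiled with -O3:

// fifo_c.h -- the firewall: only primitive types and an opaque pointer
extern "C" {
    typedef struct FloatFifo FloatFifo;   // opaque to the caller
    FloatFifo* fifo_create(void);
    void       fifo_destroy(FloatFifo* f);
    void       fifo_push(FloatFifo* f, float v);
    float      fifo_pop(FloatFifo* f);    // precondition: FIFO not empty
}

// fifo_c.cpp -- compiled once with -O3; the deque never leaks out
#include <deque>
#include "fifo_c.h"

struct FloatFifo { std::deque<float> q; };

extern "C" {
    FloatFifo* fifo_create(void)           { return new FloatFifo; }
    void  fifo_destroy(FloatFifo* f)       { delete f; }
    void  fifo_push(FloatFifo* f, float v) { f->q.push_back(v); }
    float fifo_pop(FloatFifo* f)
    {
        float v = f->q.front();
        f->q.pop_front();
        return v;
    }
}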

Otherwise, if you don't do this kind of insulation between the optimized and the debug parts of your library, you can run into some pretty nasty problems. Suppose you have a function void foo(deque<int>&) that you compile for release (with optimizations) and you call that function from some debug-enabled code. The reference to the deque object will then refer to a debug-enabled instance of it, while the foo function will use it as if it were a release-ready, optimized instance.

Now, it is possible that certain implementations of the STL do not add any data members to the container class (or otherwise change its binary layout). However, I don't think that this is very common. For example, a quick check of the GNU implementation of the STL (libstdc++) that comes with GCC or MinGW reveals that when debugging is enabled, the deque class has two additional pointers and one additional integer as data members, which it gets through an additional base class (multiple inheritance is used only when debugging) that also has a non-trivial destructor. Not to mention that the iterators of the deque class are also different when debugging; most notably, they contain back-pointers to their parent container. This is called "debug instrumentation" and is a very common thing, especially in standard libraries.

So, if you were to call the foo function from some debug-enabled code, you would be giving a reference to a deque object that contains all this debug instrumentation that the code in foo would never expect to be there. This would lead to significant issues. Basically, anything that the foo code might try to do with the deque object is likely to result in disaster in a number of different ways, including heap corruptions, memory corruptions and segmentation faults (access violations).

On top of all that, some standard library implementations, most notably Microsoft's version (which is adapted from Dinkumware's implementation), will use an instrumented heap when debugging (so that the debugger, like Visual Studio's, can track all memory allocations). I hope I don't have to tell you what happens if you allocate memory from one heap and deallocate it from another! It's complete mayhem.

You don't need all the libraries you link with your program to be compiled with the same combination of flags.

Not every flag needs to be the same. But some flags matter more than others. And debug vs. release is one of those flags that matter a lot when it comes to binary compatibility. In fact, it's the one that matters the most, for all the reasons I just explained.

For example, your program can use sqrt, malloc, and printf from the C library even though they were compiled with different flags and a different compiler.

All the functions you listed are C functions. C functions and C++ classes/functions are hugely different. C has standard binary-compatibility guarantees that C++ compilers do not provide at all. This is mostly because C is much simpler and can afford it. Simply put, to comply with the standards of C and of the platform, the compiler has no choice but to follow a clear set of rules for laying out structs and calling functions. This is why it is always safe to fall back on a C interface between two libraries (as I explained at the beginning of this post about how to isolate a library behind a sort of compilation firewall; that firewall is the binary compatibility that C guarantees).

In general, none of this is true for anything in C++. However, you can make some assumptions, the first of which is that any standard function inherited from C is safe. The other assumptions get a lot more technical, and I don't want to get too deep into them. Also, this is the kind of thing for which you need to read through your compiler's documentation, and at the end of the day, it creates very brittle and platform-specific code.

The linker does not require that all functions be compiled with the same optimization flags, nor even with the same compiler.

Well, that's not the question; the fact that the linker accepts it does not mean that it will work correctly. Believe me, there are a lot of very wrong things that a linker will accept, including linking together two binary-incompatible libraries (or object files, or whatever).

If you later change your foo class, you will need to recompile your library. If this is a problem, you can also create a library that works with std::deque<foo*>, which will not have this dependency problem.

Man... you are just piling bad advice on top of bad advice here. If you change the foo class in some way, you have to recompile the library that uses it. The library that works with std::deque<foo*> will presumably have to do something with the foo objects that those pointers point to (otherwise, what's the point of passing them to the library?). If it hasn't been recompiled with the new version of the foo class, that library will act as if it were still using the old version (assuming it has the data members and functions the old version had), which will, again, have disastrous effects.

If you need to solve the dependency problem, to remove the need for complete recompilations whenever you change something in foo, then you should use an idiom like the Cheshire Cat (also known as pimpl, "pointer to implementation").
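For reference, a minimal sketch of the Cheshire Cat idiom with a hypothetical foo class; callers only ever see a pointer of stable size, so the data members of the implementation can change without forcing clients to recompile:

// foo.h -- what clients include; no data members are visible
#include <memory>

class foo {
public:
    foo();
    ~foo();                        // defined in foo.cpp, where Impl is complete
    void frobnicate();
private:
    struct Impl;                   // defined only in foo.cpp
    std::unique_ptr<Impl> pimpl;   // layout of Impl can change freely
};

// foo.cpp -- the only file that must be recompiled when Impl changes
#include "foo.h"
#include <deque>

struct foo::Impl {
    std::deque<float> data;        // internals hidden from clients
};

foo::foo() : pimpl(new Impl) {}
foo::~foo() = default;
void foo::frobnicate() { /* work with pimpl->data */ }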

you can also consider creating a precompiled, fully optimized library that works with std::deque<void*>

How could it be fully optimized if it contains void* elements? I'm guessing you don't know much about optimization, do you? The two most important factors of optimization are memory locality and code locality, and that approach (called "C-style type-erasure") makes both of those impossible. Have you ever wondered why C++'s std::sort is faster than C's qsort?
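To illustrate the point: qsort must call the comparator indirectly through a function pointer on void* arguments for every comparison, while std::sort is instantiated on the concrete element type, so the comparison can be inlined right into the sorting loop. A small sketch:

#include <algorithm>
#include <cstdlib>
#include <vector>

// Opaque comparator: the indirect call through void* blocks inlining.
int cmp_float(const void* a, const void* b)
{
    float x = *static_cast<const float*>(a);
    float y = *static_cast<const float*>(b);
    return (x > y) - (x < y);
}

void sort_both_ways(std::vector<float>& v)
{
    std::qsort(v.data(), v.size(), sizeof(float), cmp_float); // C-style type erasure
    std::sort(v.begin(), v.end()); // type known at compile time; comparison inlined
}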

It is true, though, that using this approach would probably be faster than using the debug-enabled version of deque. But it will certainly not be acceptable for production. So, you would still have to write two separate versions of the code, which is a lot of hassle.

Dear Mike, your answer is tricky. You said a lot of things that are totally true, but they are not what I am proposing.

Suppose you have a function void foo(deque<int>&) that you compile for release...

Obviously, you cannot create an optimized version of a function like foo(deque<int>&), because that implies sending a reference to a deque, whose layout changes with the optimization flags. That is the catch: having some parts of the code with an optimized deque and others with an unoptimized deque and, worse, sending references from one to the other. What you must do is create a function foo(int) that internally "saves" the int into a deque.

In line with the problem as originally stated, what is intended to be "saved" into the deque are floats:

#ifndef DEBUG
        std::deque<float> inFIFO;
        std::deque<float> outFIFO;
#else
        // std::deque is unusably slow in unoptimized debug builds
        float* inFIFO;
        float* outFIFO;
#endif

So, it is totally possible to create a library, compiled with -O3, that pushes and pops floats while using a deque internally. The interface of this queue library will only transmit floats, which do not change with the optimization flags.
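Reusing the hypothetical opaque interface sketched earlier in the thread, the debug-compiled caller would then look something like this; only floats and an opaque pointer ever cross the boundary:

// caller.cpp -- compiled with -Og; never sees std::deque at all
#include "fifo_c.h"

void process_sample(float s)
{
    FloatFifo* in = fifo_create();
    fifo_push(in, s);            // only floats cross the boundary, so the
    float v = fifo_pop(in);      // optimization flags cannot disagree about
    fifo_destroy(in);            // any std::deque layout
    (void)v;
}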

I had suggested creating a library that stores void* only as a general alternative to a queue<foo>, which, as you said, is not a perfect solution but is faster than the unoptimized version.

But this is not the case here; the original problem is to create a queue for floats. Sorry that I went into a more general solution with its own new problems. I did this because you formulated the general problem of a queue of any object. Obviously, I made the hypothesis that your foo object does not change with the optimization flags. Well, you got me solving a totally general problem that is not the original one.

In what follows, please do not diverge from the original problem, which is to save floats into a queue. For this problem, what I am suggesting is totally possible: construct an optimized deque wrapper that pushes and pops floats, and link it with unoptimized code. Digressing a little: if the original problem were to save POD objects, this solution would also work.

It is brave to disagree with me on such technical matters... so be prepared for the scolding that follows...

Finally, I must admit that I appreciate disagreeing with you, because I think this discussion is very informative. For this reason, I will take the risk of more scolding from you in the parts where I am wrong. But I hope you will also recognize the parts where I am right. Thank you.

What you must do is create a function foo(int) that internally "saves" the int into a deque.

Right. That's what I described as the typical trick to make this work: creating a pure C interface (or a carefully limited C++ interface) between the debug and release parts of the code. But like I said, this could require a lot of changes (with the potential for additional bugs) to the existing code, and is therefore not very practical in general.

But this is not the case here; the original problem is to create a queue for floats.

If you read the original post again, kungle is implying that he is working on a substantial project, and that the example code about deques of floats is merely one small example of the kind of trick he had to use, picked out of that larger code-base. Clearly, he is looking for a general solution that will work for a large code-base with many uses of the STL in various places.

So, isolating a few deques behind a C interface is not a big deal, but isolating large sections of an existing code-base and refactoring all the code that calls it is a whole different ball-game. I was addressing the latter, and you seem to be addressing the former.

Typical good style C++ code is riddled with uses of STL containers and algorithms. And if the idea is to go about wrapping each of these behind some C API, then that gets very impractical and kind of pointless if the only reason for doing it is to debug some code.

Finally, I must admit that I appreciate disagreeing with you, because I think this discussion is very informative.

I also enjoy such informative discussions!

But I hope you will also recognize the parts where I am right.

Now that you have made it more clear what you were thinking of or proposing, I think that you were "right". But that's not really what concerns me. My main concern is the quality of the advice that the original poster (or anyone else who might find their way to this thread in the future) receives. "Receives" is emphasized because advice is received, not given.

When you gave your original advice to "compile some files with -O3 and others with the -Og option" and to "extract some parts of your code into functions that you put in separate files that you compile with -O3", it wasn't clear enough and, if taken at face value, seems to suggest that naively compiling some bits of code with -O3 and some with -Og and putting them together is going to work just fine. That is how such advice would be received, even if it's not the advice that you were trying to give. It is certainly the way I perceived it, and why I had to immediately refute it, or at least put a major caveat on it, because I could not leave open the possibility that someone might interpret the advice like that. Your second post had similar issues, as it seemed to double down on your original misleading advice ("misleading" by being too vague, not necessarily entirely "wrong"). That made me shift into a whole new gear, where I had to give a very detailed clarification of the issues involved.

The point is that if you want to advise people to use a technique that is very tricky (like this one), then you have to be prepared to give a sufficient explanation of it, to be sure that the person will be aware of and understand all the issues and potential pitfalls surrounding it. Otherwise, it's better to abstain, instead of giving advice so short or so vague that it could easily lead the person straight to disaster.

But I'm glad that all the necessary details and issues were brought up, agreed upon, and that this turned into the very informative thread that it now is.

GNU-specific, and worth studying merely for the elegance of the design.
(Note: it violates ODR, but the implementation is allowed to provide well-defined semantics for undefined behaviour.)

<begin quote>

The following goals directed the design of the libstdc++ debug mode:

Correctness: <...>
Performance: <...>
Usability: <...>

Minimize recompilation: While it is expected that users recompile at least part of their program to use debug mode, the amount of recompilation affects the detect-compile-debug turnaround time. <...>
There are several levels of conformance to this requirement, each with its own usability and implementation characteristics. In general, the higher-numbered conformance levels are more usable (i.e., require less recompilation) but are more complicated to implement than the lower-numbered conformance levels.

  1. Full recompilation: <...>
  2. Full user recompilation: <...>
  3. Partial recompilation: <...>
  4. Per-use recompilation: The user must recompile the parts of his or her application and the C++ libraries it depends on where debugging should occur, and any other code that interacts with those containers. This means that a set of translation units that accesses a particular standard container instance may either be compiled in release mode (no checking) or debug mode (full checking), but must all be compiled in the same way; a translation unit that does not see that standard container instance need not be recompiled. This also means that a translation unit A that contains a particular instantiation (say, std::vector<int>) compiled in release mode can be linked against a translation unit B that contains the same instantiation compiled in debug mode (a feature not present with partial recompilation). While this behavior is technically a violation of the One Definition Rule, this ability tends to be very important in practice. The libstdc++ debug mode supports this level of recompilation.
  5. Per-unit recompilation: <...>

<...>

The Wrapper Model

The libstdc++ debug mode uses a wrapper model where the debugging versions of library components (e.g., iterators and containers) form a layer on top of the release versions of the library components. <...> This design decision ensures that we cannot regress release-mode performance (because the release-mode containers are left untouched) and partially enables mixing debug and release code at link time <...>

Release- and debug-mode coexistence

The libstdc++ debug mode is the first debug mode we know of that is able to provide the "Per-use recompilation" (4) guarantee, that allows release-compiled and debug-compiled code to be linked and executed together without causing unpredictable behavior. This guarantee minimizes the recompilation that users are required to perform, shortening the detect-compile-debug bug hunting cycle and making the debug mode easier to incorporate into development environments by minimizing dependencies.

Achieving link- and run-time coexistence is not a trivial implementation task. To achieve this goal we required a small extension to the GNU C++ compiler (since incorporated into the C++11 language specification, described in the GCC Manual for the C++ language as namespace association), and a complex organization of debug- and release-modes. The end result is that we have achieved per-use recompilation but have had to give up some checking of the std::basic_string class template (namely, safe iterators).

<...>

Compile-time coexistence of release- and debug-mode components

Both the release-mode components and the debug-mode components need to exist within a single translation unit so that the debug versions can wrap the release versions. However, only one of these components should be user-visible at any particular time with the standard name, e.g., std::list.

<...>

<...> For this reason we cannot easily provide safe iterators for the std::basic_string class template <...> With the design of libstdc++ debug mode, we cannot effectively hide the differences between debug and release-mode strings from the user. <...> The effect on users is expected to be minimal, as there are simple alternatives (e.g., __gnu_debug::basic_string), and the usability benefit we gain from the ability to mix debug- and release-compiled translation units is enormous.

<end quote>
extracts from: https://gcc.gnu.org/onlinedocs/libstdc++/manual/debug_mode_design.html


It's really impressive that the GNU / GCC team managed to do that. When I was checking the source code for deque (see a few posts back), I did see some hints at the fact that this might be possible, from the convoluted way in which they structured things, but I didn't want to say anything because there wasn't an explicit stated guarantee. Thanks for finding this documentation page that formally states the guarantee. Hopefully the OP can use this somehow.
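For anyone who wants to try it: the libstdc++ debug mode is enabled per translation unit by defining the _GLIBCXX_DEBUG macro. A hedged sketch of a mixed build (file names and command lines are assumptions):

// checked.cpp -- build with: g++ -Og -D_GLIBCXX_DEBUG -c checked.cpp
// fast.cpp    -- build with: g++ -O3 -c fast.cpp
// link with:     g++ checked.o fast.o -o app
//
// Per the "per-use recompilation" guarantee quoted above, the two object
// files can be linked and run together, provided any single container
// instance is only touched by translation units compiled the same way.
#include <cstddef>
#include <deque>

void process_block(const float* samples, std::size_t n)
{
    // This deque is the checked wrapper version if and only if this
    // translation unit was compiled with -D_GLIBCXX_DEBUG.
    std::deque<float> fifo(samples, samples + n);
    // ... process fifo ...
}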

Really nice. I think that the final answer to the very initial question could be:

To achieve fast operation of the STL during development, you can compile different parts of your code with different optimization flags: -Og for the sections that require debugging, and -O3 for the sections you are sure are correct (the queue, in this case). When doing this, in general, make sure that the information transmitted between those parts consists only of fundamental types (bool, char, int, double, ...) and/or PODs ("Plain Old Data"). You can check whether a type is POD with the is_pod predicate from <type_traits>.
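As a small illustration, that check can be a compile-time assertion (C++11; Sample is a hypothetical payload type):

#include <type_traits>

struct Sample { float left; float right; };   // hypothetical POD payload

static_assert(std::is_pod<Sample>::value,
              "Sample must stay POD to cross the -O3/-Og boundary safely");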

If you need more than this from the STL, you can follow the advice given by vijayan121, based on the libstdc++ documentation at gcc.gnu.org.
