My question is basically if I have a 4D vector like

__declspec(align(16)) class Vec4
{
public:
    float x;
    float y;
    float z;
    float w;
};

and I want to use operators like

Vec4 a(Vec4::One), b(Vec4::One), c(Vec4::Zero); // One and Zero: static constants, assumed to exist

c = a + b; // creates a fourth, temporary Vec4 here that is wasted

It seems it should be possible to get rid of that wasted fourth vector somehow, but I can't figure out a way to do it without allocating memory for a pointer AND four floats for each vector, which I'd like to avoid. In theory, the compiler could easily avoid this if syntax such as

Vec4 c = a + b;

were used. Normally c would call its default constructor, but it wouldn't need to here, because the compiler knows that a + b returns the value c should hold. It could instead just move the temporary created by a + b into c in that situation without risk.

I've been doing some 3D math optimizations that have been pretty profitable on a program of mine, since a lot of libraries don't take advantage of the application-specific truncations that I can do. The coding is kinda laborious, and I was just wondering if there was a way to make the code look better without sacrificing a large amount of performance because of excess allocation / wastage. Seems like a pretty big oversight imo.

I also dream of a day when compilers will be able to tell the difference, based on return type, between void and non-void versions of a function involving things like '+=' or '='. But that is probably asking too much.

1) Return-value-optimization (RVO):
The situation you describe is one of the basic applications of return-value-optimization. This is a classic optimization, permitted by C++, and implemented by all decent C++ compilers that I know of. You can check the assembly listing generated by your compiler for the given code to see what it actually produces (for example, /FA with MSVC or -S with GCC and Clang). If you do a lot of nano-optimization, as you seem to imply, then you definitely need to learn to produce and read the assembly listings of your compiler.

"Seems like this could be easily avoided through compiler in theory if syntax such as Vec4 c = a + b; was used."

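Yes, the compiler can easily avoid the temporary in this case, and it will do so through RVO: the result of a + b is constructed directly in c, with no intermediate copy. For an assignment like c = a + b;, RVO still lets operator+ build its result directly in the temporary, but that temporary must then be assigned to c. For a simple type like yours, the optimizer will usually remove that extra copy as well, provided operator+ can be inlined and optimizations are turned on.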
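To make the difference between initialization and assignment visible, here is a minimal sketch. TracedVec4 and demo_rvo are hypothetical names invented for this illustration; the counters are there only for instrumentation, and they make the type non-trivial, so this is not how you would write the real Vec4. With a C++17 compiler the elision in the initialization is guaranteed; older compilers are permitted to do it and normally do.

```cpp
#include <cassert>

// Hypothetical instrumented vector: the counters exist only to make copies visible.
struct TracedVec4 {
    float x, y, z, w;
    static int copies;       // number of copy-constructor calls
    static int assignments;  // number of copy-assignment calls

    TracedVec4(float x_, float y_, float z_, float w_)
        : x(x_), y(y_), z(z_), w(w_) {}

    TracedVec4(const TracedVec4& o) : x(o.x), y(o.y), z(o.z), w(o.w) { ++copies; }

    TracedVec4& operator=(const TracedVec4& o) {
        x = o.x; y = o.y; z = o.z; w = o.w;
        ++assignments;
        return *this;
    }
};

int TracedVec4::copies = 0;
int TracedVec4::assignments = 0;

// Returns a prvalue, so the result can be constructed directly in the
// storage chosen by the caller (RVO / copy elision).
inline TracedVec4 operator+(const TracedVec4& a, const TracedVec4& b) {
    return TracedVec4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);
}

// Usage sketch: returns the number of copy-constructor calls observed.
inline int demo_rvo() {
    TracedVec4 a(1, 1, 1, 1), b(1, 1, 1, 1);

    TracedVec4 c = a + b;   // initialization: the sum is built directly in c

    TracedVec4 d(0, 0, 0, 0);
    d = a + b;              // assignment: a temporary, then one operator= call

    assert(c.x == 2.0f && d.x == 2.0f);
    return TracedVec4::copies;
}
```

With copy elision in effect, demo_rvo() should report no copy-constructor calls at all, and exactly one call to operator= for the assignment. For a trivially copyable type, that one remaining assignment is just a 16-byte copy that the optimizer is free to fold away.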

2) Plain-Old-Data types (POD types):
When a type is very simple (a C-struct-like type), we usually call it a POD-type. If it meets a number of strict conditions, the compiler will also treat it as a POD-type and can perform more aggressive optimizations. The conditions are basically that everything is compiler-generated (default constructor, copy-constructor, copy-assignment, destructor), that there are no virtual functions, and that there are no non-POD data members. Essentially, this allows the compiler to assume that all copy operations can be performed by trivially copying the memory (like C-style memcpy()), and that creation and destruction require no work at all. In other words, the compiler knows that copies need no additional work and cause no side-effects (e.g., printing or logging the event). When this is the case, the compiler can be much more aggressive at eliminating temporaries, and it sometimes even goes as far as eliminating local variables or entire loops whose results are never used. For example, I have sometimes written a loop to time a function (repeat a calculation a million times, and time it), only to find that the compiler had optimized away the entire loop because it had no observable effect (the result was only stored in a temporary).

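So, in your case, your data type is essentially a POD-type (four plain floats, no virtual functions, no custom copy or destructor logic), and would be subject to aggressive optimizations (if you turn them on, of course!). Adding ordinary constructors for convenience should not change this in practice, as long as you leave the copy-constructor, copy-assignment and destructor to the compiler.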

Note also that the new C++ standard (C++11) has relaxed the definition of POD-types, splitting it into separate "trivial" and "standard-layout" requirements, which makes this kind of optimization even more likely.
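If you want the compiler to confirm these properties rather than take them on faith, the C++11 <type_traits> header can check them at compile time. The following is only a sketch: it uses the standard alignas(16) as a stand-in for the MSVC-specific __declspec(align(16)), declares the type as a struct so the members are public, and demo_sum is a hypothetical helper added for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Sketch of the vector type, using standard C++11 alignment syntax
// (an assumption; the original uses the MSVC-specific __declspec).
struct alignas(16) Vec4 {
    float x, y, z, w;
};

// Free operator+ that returns by value; small enough to be inlined.
inline Vec4 operator+(const Vec4& a, const Vec4& b) {
    return Vec4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

// Compile-time checks: the build fails if someone later adds a virtual
// function, a custom copy-constructor, or a non-trivial member.
static_assert(std::is_trivially_copyable<Vec4>::value,
              "Vec4 must be copyable with a plain memory copy");
static_assert(std::is_standard_layout<Vec4>::value,
              "Vec4 must have a C-compatible layout");
static_assert(alignof(Vec4) == 16,
              "Vec4 must be 16-byte aligned");

// Usage sketch (hypothetical helper).
inline Vec4 demo_sum() {
    Vec4 a{1.0f, 1.0f, 1.0f, 1.0f};
    Vec4 b{1.0f, 1.0f, 1.0f, 1.0f};
    Vec4 c = a + b;
    assert(c.x == 2.0f && c.w == 2.0f);
    return c;
}
```

The static_asserts cost nothing at run time and act as a guard: if a later change makes the type non-trivial, you find out at compile time instead of in a profiler.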

3) Move-semantics (C++11):
For non-POD types, in addition to RVO, the new C++ standard has introduced move-semantics, which allow objects to be moved around much more cheaply than copying them from temporary to temporary. This is achieved through rvalue-references.
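As a rough illustration of what move-semantics buy you for a non-POD type, consider a class that owns a heap buffer. Buffer and demo_move are hypothetical names used only for this sketch; your Vec4 does not need any of this, since copying four floats is already as cheap as moving them.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical non-POD type that owns a heap-allocated array.
class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new float[n]()) {}
    ~Buffer() { delete[] data_; }

    // Copy: allocates a new array and copies every element.
    Buffer(const Buffer& o) : size_(o.size_), data_(new float[o.size_]) {
        for (std::size_t i = 0; i < size_; ++i) data_[i] = o.data_[i];
    }

    // Move: takes over the source's pointer and leaves the source empty.
    // The parameter is an rvalue-reference, so it binds to temporaries.
    Buffer(Buffer&& o) noexcept : size_(o.size_), data_(o.data_) {
        o.size_ = 0;
        o.data_ = nullptr;
    }

    // Copy-and-swap assignment: the by-value parameter is copy- or
    // move-constructed depending on whether the argument is an lvalue
    // or an rvalue.
    Buffer& operator=(Buffer o) noexcept {
        std::swap(size_, o.size_);
        std::swap(data_, o.data_);
        return *this;
    }

    std::size_t size() const { return size_; }
    const float* data() const { return data_; }

private:
    std::size_t size_;
    float* data_;
};

// Usage sketch: returns true if the move transferred ownership
// without allocating or copying the elements.
inline bool demo_move() {
    Buffer a(1024);
    const float* original = a.data();

    Buffer b(std::move(a));   // move-constructor: pointer is handed over

    return b.data() == original && b.size() == 1024 &&
           a.data() == nullptr && a.size() == 0;
}
```

The move-constructor does a constant amount of work regardless of the buffer size, whereas the copy-constructor allocates and copies all the elements.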

"The coding is kinda laborious and was just wondering if there was a way to make the code look better without sacrificing a large amount of performance because of excess allocation / wastage. Seems like a pretty big oversight imo."

Yes, it was a big issue / oversight. It was also at the top of the list when the new standard was drafted. So, the issue is essentially solved at this point (if your compiler supports C++11 features).
