I read that using a policy class for a function that is called in a tight loop is much faster than using a polymorphic function. However, I set up this demo and the timing indicates that it is exactly the opposite! The policy version takes between 2x and 3x longer than the polymorphic version.

#include <iostream>

#include <boost/timer.hpp>

// Policy version
template < typename operation_policy>
class DoOperationPolicy : public operation_policy
{
  using operation_policy::Operation;

public:
  void Run(const float a, const float b)
  {
    Operation(a,b);
  }
};

class OperationPolicy_Add
{
protected:
  float Operation(const float a, const float b)
  {
    return a + b;
  }
};

// Polymorphic version
class DoOperation
{
public:
  virtual ~DoOperation() {}
  virtual float Run(const float a, const float b) = 0;
};

class OperationAdd : public DoOperation
{
public:
  float Run(const float a, const float b)
  {
    return a + b;
  }
};

int main()
{
  boost::timer timer;

  unsigned int numberOfIterations = 1e7;

  DoOperationPolicy<OperationPolicy_Add> policy_operation;
  for(unsigned int i = 0; i < numberOfIterations; ++i)
    {
    policy_operation.Run(1,2);
    }
  std::cout << timer.elapsed() << " seconds." << std::endl;
  timer.restart();

  DoOperation* polymorphic_operation = new OperationAdd;
  for(unsigned int i = 0; i < numberOfIterations; ++i)
    {
    polymorphic_operation->Run(1,2);
    }
  std::cout << timer.elapsed() << " seconds." << std::endl;

  delete polymorphic_operation;
}

Is there something wrong with the demo? Or is it just incorrect that the policy version should be faster?


Without optimizations, I get:

0.078 seconds.
0.047 seconds.

With -O3 (best speed optimization) and after setting numberOfIterations to 1e9, I get:

0 seconds.
3.297 seconds.

I use the version of MinGW that comes along with the latest Code::Blocks.

Gah, I always forget about optimizations. I get the same results, thanks.

There are a number of things happening here that make this test flawed.

Generally, when people make performance arguments about static versus dynamic polymorphism (i.e., based on templates or based on virtual functions, respectively), what makes all the difference is whether the call target is resolved at compile-time or at run-time. So, any test comparing the two is void if you make it trivial enough that the compiler can easily resolve the dynamic case back into a static one. I'll explain.

When the polymorphism is realized at compile-time (statically) then there are several advantages, i.e., additional things the compiler can do to speed up the execution. Amongst those optimization opportunities are these main ones:
- Replace the function call by an inline expansion.
- Do more aggressive copy-elision for the parameters and return values.
- Arrange the compiled code in the object-file such that related functions (related by mutual calls) are close together, minimizing cache misses.
- Work in-place and without indirection with the "this" object in member functions (when that object is stack-based and the call is inlined).
- etc... there are more I'm sure, I'm no expert in optimization techniques.

All these optimizations are clearly only possible because the compiler has knowledge of both the call-site and the called function at compile-time (and because the function is often uniquely tailored (templated) to the specific call-site, making it an obviously good candidate for inlining).
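To make this concrete, here is a small stand-alone illustration (hypothetical code, not part of the demo above): a call through a function pointer imposes the same kind of indirection a virtual table does, while a direct call can be folded away entirely.

#include <iostream>

// Direct (static) call: the target is visible at the call-site, so the
// compiler is free to inline it or fold it to a constant.
inline float add_direct(const float a, const float b) { return a + b; }

// Indirect (dynamic-style) call: the target is only known at run-time,
// which is essentially what a virtual table lookup amounts to.
float add_impl(const float a, const float b) { return a + b; }
float (*add_indirect)(float, float) = add_impl;

int main()
{
  // The optimizer can reduce this to printing the constant 3.
  std::cout << add_direct(1, 2) << std::endl;

  // Here it must, in general, load the pointer and make an indirect call.
  std::cout << add_indirect(1, 2) << std::endl;
}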

So, in order to compare code that benefits from the above optimizations with code that doesn't, unless you have fine-grained control over the kinds of optimizations your compiler is allowed to do, you have to write the dynamic polymorphism case such that all of the above optimizations are impossible. Otherwise, you can be pretty sure that the compiler will do those optimizations anyway; it doesn't care what your code looks like (templated or with virtuals), it will try to produce the fastest code it can.

Now, in your example, the compiler can easily resolve both the call-site (the loop in main) and the called function (the virtual override) in the dynamic case, and because that call is more direct, it ends up being faster when optimizations are minimal. And as m4ster_r0shi reports, once you turn on optimizations, the static polymorphism case turns out faster (the dynamic case keeps the side-effect of creating a polymorphic object, while in the static case the compiler probably determines that the entire loop has no observable effect and optimizes it away entirely).

Basically, your test is flawed because everything that makes static polymorphism faster than dynamic polymorphism was also granted to the dynamic polymorphism case in your test. Comparing static and dynamic polymorphism objectively is actually quite difficult, because compilers always try to make the dynamic polymorphism case not as bad as it is in theory, and they often succeed; but overall, static polymorphism gives you a stronger guarantee of best performance.
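For instance, here is a minimal sketch of how to deny the compiler those opportunities in your demo (the OperationSub class and the argc trick are my additions, so that the dynamic type genuinely depends on a run-time value and the results are actually used):

#include <iostream>

struct DoOperation
{
  virtual float Run(const float a, const float b) = 0;
  virtual ~DoOperation() {}
};

struct OperationAdd : DoOperation
{
  float Run(const float a, const float b) { return a + b; }
};

struct OperationSub : DoOperation
{
  float Run(const float a, const float b) { return a - b; }
};

int main(int argc, char**)
{
  // The dynamic type now depends on a run-time value (argc), so the
  // compiler cannot devirtualize the call in the loop below.
  DoOperation* op = (argc > 1) ? static_cast<DoOperation*>(new OperationSub)
                               : static_cast<DoOperation*>(new OperationAdd);

  float sum = 0;
  for(unsigned int i = 0; i < 10000000; ++i)
    sum += op->Run(1, 2);

  // Printing the accumulated result makes the loop observable, so dead
  // code elimination cannot throw it away.
  std::cout << sum << std::endl;

  delete op;
}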

Hmmm... Interesting stuff from mike... Ok, this one should demonstrate the difference better:

#include <iostream>
#include <cstdlib>

#include <boost/timer.hpp>

// policy version
template < class OperationPolicy>
struct DoOperationPolicy : OperationPolicy
{ void Run(double a, double b) { OperationPolicy::Operation(a,b); } };

struct OperationPolicyAdd { double Operation(double a, double b) { return a + b; } };
struct OperationPolicySub { double Operation(double a, double b) { return a - b; } };
struct OperationPolicyMul { double Operation(double a, double b) { return a * b; } };
struct OperationPolicyDiv { double Operation(double a, double b) { return a / b; } };

// polymorphic version
struct DoOperation { virtual double Run(double a, double b) = 0; virtual ~DoOperation() {} };

struct OperationAdd : DoOperation { double Run(double a, double b) { return a + b; } };
struct OperationSub : DoOperation { double Run(double a, double b) { return a - b; } };
struct OperationMul : DoOperation { double Run(double a, double b) { return a * b; } };
struct OperationDiv : DoOperation { double Run(double a, double b) { return a / b; } };

int main()
{
    boost::timer timer;

    const unsigned int numberOfIterations = 1e7;

    DoOperationPolicy<OperationPolicyAdd> policy_operation_add;
    DoOperationPolicy<OperationPolicySub> policy_operation_sub;
    DoOperationPolicy<OperationPolicyMul> policy_operation_mul;
    DoOperationPolicy<OperationPolicyDiv> policy_operation_div;

    timer.restart();

    for(unsigned int i = 0; i < numberOfIterations; ++i)
    {
        switch(rand()%4)
        {
            case 0 : policy_operation_add.Run(rand()%9+1, rand()%9+1); break;
            case 1 : policy_operation_sub.Run(rand()%9+1, rand()%9+1); break;
            case 2 : policy_operation_mul.Run(rand()%9+1, rand()%9+1); break;
            case 3 : policy_operation_div.Run(rand()%9+1, rand()%9+1); break;
            default : ;
        }
    }

    std::cout << timer.elapsed() << " seconds." << std::endl;

    DoOperation * polymorphic_operations[] =
    {
        new OperationAdd, new OperationSub,
        new OperationMul, new OperationDiv
    };

    timer.restart();

    for(unsigned int i = 0; i < numberOfIterations; ++i)
        polymorphic_operations[rand()%4]->Run(rand()%9+1, rand()%9+1);

    std::cout << timer.elapsed() << " seconds." << std::endl;

    for(unsigned int i = 0; i < 4; ++i)
        delete polymorphic_operations[i];

    return 0;
}

Without optimizations, I get:

0.984 seconds.
0.891 seconds.

With -O3, I get:

0.406 seconds.
0.750 seconds.

@m4ster_r0shi:

That is definitely a better test (static analysis of the second case is impossible, or at least very hard, for most compilers). And the results are as expected. Without optimizations, the compiler inlines nothing, so the small performance penalty falls on the policy case because of the extra function call (it jumps to Run() and then to Operation(), while the dynamic polymorphism case reads the virtual table and jumps straight to the useful function). With optimizations, the policy functions probably get inlined into the switch-case, so all you pay for is evaluating the random numbers, the switch-case (with poor branch-prediction), and the arithmetic operation, while the dynamic polymorphism case adds the overhead of a virtual function call on top of that.
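Roughly speaking, the optimized policy loop can collapse into something like this (a sketch of the idea only, not actual compiler output):

#include <cstdlib>

int main()
{
    const unsigned int numberOfIterations = 10000000;

    // Hypothetical shape of the policy loop after inlining: the
    // Run()/Operation() calls disappear and only the rand() calls,
    // the switch, and the arithmetic remain.
    for(unsigned int i = 0; i < numberOfIterations; ++i)
    {
        double r = 0;
        switch(rand() % 4)
        {
            case 0 : r = (rand()%9+1) + (rand()%9+1); break;
            case 1 : r = (rand()%9+1) - (rand()%9+1); break;
            case 2 : r = (rand()%9+1) * (rand()%9+1); break;
            case 3 : r = double(rand()%9+1) / (rand()%9+1); break;
        }
        (void)r; // result unused, just as in the test above
    }
    return 0;
}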

However, in reality, in larger projects, the performance difference becomes much harder to evaluate, and it is expected to be a lot more significant because of issues such as cache misses, branch-prediction, and memory locality. A good example is typical object-oriented matrix libraries (built on virtual functions), which are much, much slower than equivalent template-based matrix libraries; I mean orders of magnitude slower.
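To give the gist of the matrix example, here is a hypothetical sketch (all names made up): per-element virtual dispatch forces an indirect call in the innermost loop, while the template version lets the whole loop be inlined and vectorized.

#include <cstddef>
#include <iostream>
#include <vector>

// Dynamic flavour: every element access goes through a virtual call.
struct ExprBase
{
    virtual double at(std::size_t i) const = 0;
    virtual ~ExprBase() {}
};

struct DenseVec : ExprBase
{
    std::vector<double> data;
    explicit DenseVec(std::size_t n) : data(n, 1.0) {}
    double at(std::size_t i) const { return data[i]; }
};

// The base-class reference hides the concrete type, so every one of
// the n calls below is an indirect call that cannot be inlined.
double sum_dynamic(const ExprBase& e, std::size_t n)
{
    double s = 0;
    for(std::size_t i = 0; i < n; ++i) s += e.at(i);
    return s;
}

// The template is instantiated with the concrete type, so at() is a
// direct call the compiler can inline (and then vectorize the loop).
template <class Expr>
double sum_static(const Expr& e, std::size_t n)
{
    double s = 0;
    for(std::size_t i = 0; i < n; ++i) s += e.at(i);
    return s;
}

int main()
{
    DenseVec v(1000000);
    std::cout << sum_dynamic(v, v.data.size()) << std::endl;
    std::cout << sum_static(v, v.data.size()) << std::endl;
}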
