C++11 Compile-time String Concatenation with constexpr

mike_2000_17 6 Tallied Votes 11K Views Share

Hi all,

I was just playing around with some ideas about optimizing a core piece of my library: the run-time type identification system (RTTI). What does that have to do with compile-time string concatenation? Well, the whole point of the RTTI is to provide information on the type of objects, and a string representation of that name is a pretty central part of that (as well as a hash value). Type names are made of identifiers, which are names that are known at compile-time, and could thus be compile-time string literals (that means things like "Hello world!"). The problem is, for complex types, like templates, many compile-time strings need to be assembled or concatenated to form the complete type names. At that point, I had to switch over to run-time code (e.g., using std::string or C-style functions) to form those compound strings. Why? Because there is no way to concatenate strings at compile-time, or so I thought...

What makes this difficult is that normally, when you concatenate strings, you just create a new chunk of memory big enough for the two original strings and then copy the two strings one after the other. Easy right? Well, there are two problems. First, you cannot "create a chunk of memory" at compile-time because there is no concept of memory at this point. Second, you cannot copy data because that would imply changing things, and you cannot change compile-time constants. So, clearly, the traditional method won't work.

Welcome to the world of pure functional programming!

Pure functional programming works on two fundamental principles: there are only values, and values can never change. "Only values" means no variables. You don't assign values to variables, like you do in imperative programming (i.e., the "normal" way), you simply have to keep generating new values that generate others and so on until you get to the final value (or desired result).

Pure functional programming is a big part of C++, because there are two additional programming languages hidden in the C++ language, and they are pure functional languages. The first is very well known to most seasoned C++ programmers, it's called template meta-programming (TMP) and it can be used to compute anything you want at compile-time, but it is mainly used to generate secondary or derived types when doing generic programming. The other language is that of constexpr expressions, a relatively new feature added to the standard in 2011 (C++11). The rules of the game with constexpr is pretty much the same as for TMP, because they are both compile-time meta-programming languages, but const-expressions are made to create compile-time values, as opposed to types (in TMP). Although TMP can be used to compute values (i.e., as static members of types), it's very inefficient to do so because of the overhead in compilation times and code-bloat (e.g., symbols, useless code, etc.), and dealing with string values in TMP is even worse, which is why the lesser-evil solution to my problem (before C++11) was to use run-time string values. But now, it looks like constexpr is the right tool for the job!

However, I could not find any practical, lean-and-mean implementation of string concatenation using constexpr. So, I wrote my own, and here it is. I thought I would post it here, for its educational value. Mind you, this is still just a draft.

(WARNING: If you're not ready for it, reading what follows could blow your mind!)

If you are not familiar with constexpr, here is a quick overview. Const-expressions allow you to write some "normal" functions that will be evaluated at compile-time (by the compiler, as it is parsing the code) if certain conditions are met; specifically, all its inputs must be const-expressions (or compile-time constants) and it cannot have intermediate variables (i.e. local variables). For example, you could have this:

constexpr int add(int a, int b) { return a + b; };

which means that if the inputs a and b happen to be compile-time constants, the compile will evaluate the function (addition) at compile-time to produce a result which is itself a compile-time constant (e.g., could be used as the size value for a static array, or as a template argument). This means that you could write something like this:

std::array<double, add(5, 10)> arr;

because both 5 and 10 are compile-time constants, and therefore, the result of add(5,10) is too.

So, back to the problem at hand, how does one go about putting together two strings if one cannot copy or change anything? Well, even though this sounds like a difficult restriction, it does grant a side benefit: if nothing can change, things are persistent, i.e., they will always be there, unmodified. The reason you want to copy or assign data in general is because the source of that data is not persistent, i.e., it could disappear or change some time in the future. So, we don't need to make a copy if we can rely on the persistence and immutability of the source.

This principle is well explained by this simple class that "stores" a compile-time string:

class literal_str {
  const char* const text_ptr;
  unsigned int text_size;
public:
  template <unsigned int N>
  constexpr literal_str(const char(&aStr)[N]) : 
                        text_ptr(aStr), text_size(N-1) {
    static_assert(N >= 1, "Invalid string literal! Length is zero!");
  };
};

This is what is called a literal class because it fulfills a number of basic requirements, and it has a constructor marked with constexpr, meaning that if its inputs are compile-time constants, then the object created will be itself a compile-time constant (yes, you heard that right, the compiler will create the object at compile-time!).

The point here is that a string literal, such as "Hello world!", is a static array, which is what the constructor of literal_str takes in. Because a string literal is a compile-time constant (managed by the compiler, and usually stored in read-only data section of the final executable, if needed at run-time), it will always be there and remain unchanged, and therefore, to "store" it, we can simply store a pointer to it (yes, a compile-time constant pointer!).

So, now we can look at the problem of concatenation. If we want to "store" two strings onto a single one, then we simply need to "store" one string, and "store" another. Here we go, down the rabbit hole:

class literal_str_list {
  const char* const text_ptr;
  unsigned int text_size;
  const literal_str_list* const head;
public:
  template <unsigned int N>
  constexpr literal_str_list(const char(&aStr)[N],
                             const literal_str_list* aHead = nullptr) : 
                             text_ptr(aStr), text_size(N-1), head(aHead) {
    static_assert(N >= 1, "Invalid string literal! Length is zero!");
  };
};

What happened here is that one string is "stored" as before, but there is also a pointer to another compile-time object that "stores" a string. It is not a coincidence that the class is now called "list" and has a pointer called "head". This is indeed a single linked-list, that is, a compile-time single linked-list. And as you probably know, this can, in principle, store a long list of strings all concatenated together.

So, we can do a few things with this literal string class, such as getting its size and retrieving an individual character from it, all at compile-time:

class literal_str_list {
  //... as above ..

  constexpr unsigned int size() const {
    return text_size + (head != nullptr ? head->size() : 0);
  };

  constexpr char get_char_from_head(unsigned int i, unsigned int hd_size) const {
    return (i < hd_size ? 
      (*head)[i] : 
      (i < (hd_size + text_size) ? text_ptr[i - hd_size] : '\0'));
  };

  constexpr char operator[](unsigned int i) const {
    return (head != nullptr ? 
      get_char_from_head(i, head->size()) :
      (i < text_size ? text_ptr[i] : '\0'));
  };

};

As you can notice, everything is done recursively. If you've been following my posts here on Daniweb, you might find this odd, because I generally discourage recursion vehemently. However, in pure functional programming, recursion is the only option. Any form of iteration will require that you modify some kind of a variable (e.g., loop counter, termination condition, etc.), which is completely impossible in pure functional programming. So, everything has to be turned into a recursion (which is always possible, if you didn't know that already). So, here, the size() function simply does a recursive traversal of the linked-list to accumulate the sizes of the individual strings. The operator[] function is a bit more tricky because it has to locate the individual string into which the index i falls.

One might point out that things would be a lot simpler if I had used forward links (with "next" pointers). However, there is a reason why it had to be backwards links, and it will become clear as we look at a concatenation operator:

class literal_str_list {
  //... as above ..

  template <unsigned int N>
  constexpr literal_str_list operator+(const char(&aHead)[N]) const {
    return literal_str_list(aHead, this);
  };

};

The problem here is that when we construct the new literal_str_list object, we need two things: a string literal and an existing literal_str_list object. But we have an additional quirky problem to deal with, which is the fact that the "this" object in the above operator can be recognized as a const-expression, but not if it was a parameter to the function (it is really hard to explain why). This means that we must define the operator+ as a member function if we want the "this" object to be recognized as a const-expression (again, breaking another one of my rules, prefer defining operators as free-functions or friends). This implies that the literal_str_list object involved in the concatenation (passed to the constructor) is always going to be the left operand of the concatenation, and that is why the linked-list needs to be using backward pointers.

That is pretty much it. There are a couple more fancy little things in the code (like a compile-time FNV-1a hashing function), but that was the important part. Here is a test program:

#include "literal_string_list.hpp"

#include <iostream>
#include <iomanip>
#include <string>
#include <type_traits>

std::string convert_to_string(const literal_str_list& lit) {
  std::string result(lit.size(), ' ');
  lit.copy_to(result.begin(), result.end());
  return result;
};

// Note, the following are all compile-time constants:

constexpr literal_str_list hello = "Hello ";
constexpr literal_str_list world = "World!";
constexpr literal_str_list lit_str = hello + "World!";
constexpr literal_str_list lit_str2 = hello + world;

constexpr char c1 = lit_str[5];
constexpr std::size_t h1 = lit_str.hash();


int main() {

  std::cout << "Got string: " << convert_to_string(lit_str) << std::endl;
  std::cout << "Got string: " << convert_to_string(lit_str2) << std::endl;

  std::cout << "Sixth character is '" << c1 << "'." << std::endl;
  std::cout << "Hash value is " << std::hex << h1 << std::endl;

  if(std::is_literal_type< literal_str_list >::value)
    std::cout << "The literal_str_list class is considered as a literal type!" << std::endl;
  else
    std::cout << "The literal_str_list class is NOT considered as a literal type!" << std::endl;

};

The resulting code, after compilation, is simply a linked-list being formed in the read-only data section of the executable, as seen in these assembly listings:

.LC5:                             // LC5 is the 'pointer' to the "World!" string
    .string "World!"              // this is the "World!" string entry
    .section    .rodata           // in read-only data section
    .align 16
    .type   _ZL7lit_str, @object
    .size   _ZL7lit_str, 24
_ZL7lit_str:                      // this is the 'pointer' to the 'lit_str' object
    .quad   .LC5                  // points to the "World!" string
    .long   6
    .zero   4
    .quad   _ZL5hello             // has a 'head' pointer to the 'hello' object
    .section    .rodata.str1.1    // again, in read-only data section

.LC6:                             // LC6 is the 'pointer' to the "Hello " string
    .string "Hello "              // this is the "Hello " string entry
    .section    .rodata           // in read-only data section
    .align 16
    .type   _ZL5hello, @object
    .size   _ZL5hello, 24
_ZL5hello:                        // this is the 'pointer' to the 'hello' object
    .quad   .LC6                  // points to the "Hello " string
    .long   6
    .zero   4
    .quad   0                     // has a null 'head' pointer (end of list)

There is definitely some overhead to this scheme (as with any use of a linked-list), but the important part here is that there is no initialization code needed, the strings can simply be read off from read-only memory. If one were to use "normal" string concatenation code to create some global string objects (instead of the constexpr constants in the test code above), there would be a much larger amount of overhead to account for the initialization code that needs to run in order to construct those global objects. And when my intended application (RTTI) is to construct such a string for every type (or template instantiation) in a library, this initialization code becomes a very measurable burden on the overall code.

ddanbe commented: ++ ! +15
Hiroshe commented: Excellent! +8
#ifndef LITERAL_STRING_LIST_HPP
#define LITERAL_STRING_LIST_HPP

#include <cstddef>
#include <cstring>

class literal_str_list {
private:
  
  const char* const text_ptr;
  unsigned int text_size;
  const literal_str_list* const head;
  
  constexpr char get_char_from_head(unsigned int i, unsigned int hd_size) const {
    return (i < hd_size ? (*head)[i] : (i < (hd_size + text_size) ? text_ptr[i - hd_size] : '\0'));
  };
  
  static constexpr std::size_t fnv_prime  = (sizeof(std::size_t) == 8 ? 1099511628211u : 16777619u);
  static constexpr std::size_t fnv_offset = (sizeof(std::size_t) == 8 ? 14695981039346656037u : 2166136261u);
  
  constexpr std::size_t fnv_1a_hash(unsigned int i) const {
    return (i == 0 ? 
      (head != nullptr ? 
        ((head->fnv_1a_hash(head->text_size-1) ^ text_ptr[0]) * fnv_prime) :
        fnv_offset) :
      ((fnv_1a_hash(i-1) ^ text_ptr[i]) * fnv_prime));
  };
  
  template <typename FwdIter>
  void copy_to_recurse(FwdIter& beg, FwdIter end) const {
    if( head != nullptr )
      head->copy_to_recurse(beg, end);
    for(unsigned int i = 0; (i < text_size) && (beg != end); ++i, ++beg)
      *beg = text_ptr[i];
  };
  
  void copy_to_recurse(char*& beg, char* end) const {
    if( head != nullptr )
      head->copy_to_recurse(beg, end);
    std::size_t sz_to_cpy = (end - beg < text_size ? end - beg : text_size);
    std::memcpy(beg, text_ptr, sz_to_cpy);
    beg += sz_to_cpy;
  };
  
  constexpr literal_str_list(const char* aStr, unsigned int N,
                             const literal_str_list* aHead = nullptr) : 
                             text_ptr(aStr), text_size(N), head(aHead) { };
  
public:
  
  template <unsigned int N>
  constexpr literal_str_list(const char(&aStr)[N],
                             const literal_str_list* aHead = nullptr) : 
                             text_ptr(aStr), text_size(N-1), head(aHead) {
    static_assert(N >= 1, "Invalid string literal! Length is zero!");
  };
  
  constexpr unsigned int size() const {
    return text_size + (head != nullptr ? head->size() : 0);
  };
  
  constexpr char operator[](unsigned int i) const {
    return (head != nullptr ? 
      get_char_from_head(i, head->size()) :
      (i < text_size ? text_ptr[i] : '\0'));
  };
  
  template <unsigned int N>
  constexpr literal_str_list operator+(const char(&aHead)[N]) const {
    return literal_str_list(aHead, this);
  };
  
  constexpr literal_str_list operator+(const literal_str_list& aHead) const {
    return literal_str_list(aHead.text_ptr, aHead.text_size, this);
  };
  
  constexpr std::size_t hash() const {
    return fnv_1a_hash(text_size-1);
  };
  
  template <typename FwdIter>
  void copy_to(FwdIter beg, FwdIter end) const {
    copy_to_recurse(beg, end);
  };
  
  void copy_to(char* beg, char* end) const {
    copy_to_recurse(beg, end);
  };
  
};


#endif
mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster

As an added little bonus, you can also use this literal string list class to perform some compile-time integer to string conversions:

static const char str_digits[] = "0123456789";

template <unsigned int N>
struct ct_itoa;

template <>
struct ct_itoa<0> {
  static constexpr literal_str_list text =
    literal_str_list("");
};

// Define static member outside the template:
constexpr literal_str_list ct_itoa<0>::text;

template <unsigned int N>
struct ct_itoa {
  static constexpr literal_str_list this_digit =
    literal_str_list(str_digits + (N % 10), 1);
  static constexpr literal_str_list text = 
    ct_itoa< N / 10 >::text + this_digit;
};

// Define static members outside the template:
template <unsigned int N>
constexpr literal_str_list ct_itoa<N>::this_digit;
template <unsigned int N>
constexpr literal_str_list ct_itoa<N>::text;

The method is simple. It's a classic template meta-programming recursion to compute a static constexpr string list object that contains the result.

The only tricky part here, and it's an important one, is that the static constexpr members of the class templates (ct_itoa) must be defined outside the classes (and that is true, even for non-templates). This is due to another tricky rule about C++ linkage. Static const/constexpr data members that are defined inside a class declaration have no linkage, i.e., they disappear after compilation. This means that our list of string literals will be broken as they refer to non-existent constexpr objects. Because of this, compilation will succeed, but linking will fail!

If that makes you feel completely lost, don't worry, we are very deep inside the rabbit hole right now.

All you really need to know is that this is what we can do with the above code:

constexpr literal_str_list cv_num = ct_itoa<4238>::text;

which doesn't sound that impressive given that we could just write:

const char cv_num[] = "4238";

However, where it really starts to pay off is when we realize we can do something like this:

template <unsigned int N>
struct foo {

  static constexpr literal_str_list N_str =
    ct_itoa<N>::text;

};

which you obviously could never do with normal code, or MACROs, or even do too easily with template meta-programming.

vijayan121 1,152 Posting Virtuoso

sprout::string from the Sprout C++ Libraries may be of interest.
https://github.com/bolero-MURAKAMI/Sprout

#include <iostream>
#include <sprout/string.hpp>

int main()
{
    constexpr auto hello = sprout::to_string( "hello" ) ;
    constexpr auto world = sprout::to_string( "world" ) ;
    constexpr auto hello_world = hello + " " + world ;

    static_assert( hello_world[5] == ' ', "" ) ;
    static_assert( hello_world.find( "llo worl" ) == 2, "" ) ;
    static_assert( hello_world.rfind( 'l' ) == hello_world.size() - 2, "" ) ;
    static_assert( hello_world.find_first_of( "ord" ) == 4, "" ) ;
    static_assert( hello_world.find_last_not_of( "old" ) == hello_world.size() - 3, "" ) ;
    static_assert( hello_world.back() == *hello_world.rbegin(), "" ) ;


    std::cout << hello_world.substr( 3, 5 ) << '\n' ; // lo wo

    constexpr auto s137 = sprout::to_string( 137 ) ;
    constexpr auto pi = sprout::to_string( double(22) / 7 ) ;
    constexpr auto eqn = s137 + " * " + pi + " = " + sprout::to_string( 137 * double(22) / 7 ) ;

    std::cout << eqn << '\n' ; // 137 * 3.142856 = 430.571429

    static_assert( eqn <= hello_world, "" ) ;
    constexpr auto hash = sprout::to_hash(eqn) ;
    std::cout << hash << '\n' ; // 13698126077260970431
}
mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster

Yes, I was aware of that implementation. It's very impressive, but at the same time, I feel that it has too much stuff in it, and that it is too heavy-duty.

In that library, concatenation is implemented by turning the string literal into a variadic template argument pack of chars and then unpacking it as a sequence of chars that brace-initialize an array of chars of the same size. It does have the advantage that is generates an actual char array (as opposed to linked-list of literals), which makes other operations like substring and iterators a lot easier (not to mention that it relieves some of the linking constraints on the objects being concatenated, which is very nice), but it is also a very heavy compile-time mechanism to put in play.

Doing fancy stuff like that with variadic templates has not been a joyful experience for me. I've found that the amount of code-bloat and compilation times that it causes is overwhelming. It's also very hard to control the proliferation of template instantiations with variadic templates.

My aim with this string concatenation stuff was mostly to avoid going into template meta-programming techniques too much (except that it is needed, afaik, for number-to-string conversion). For instance, there are plenty of ways to concatenate strings (and do other things) once you move onto things like boost::mpl::sequence and similar approaches like in sprout. Sprout is certainly not as bad as using Boost.MPL for this purpose, but it is still pretty heavy. And also, I don't aim, and don't want to implement a full compile-time equivalent to std::string, as sprout did. If I wanted to do that, I would probably use a similar approach. But like I said, I want a lean and mean approach that relies only on constexpr.

Hiroshe 499 Posting Whiz in Training

Haven't really looked into C++11 too seriously yet. Wow! Apperently lambdas also support closures! Now if we can implement continuations (combine that with multi-threading), then we'll get somewhere!

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster

UPDATE

So, I found a major problem with the implementation that I posted originally. The problem is that the concatenation operator assumes that the second literal string list is not an actual list but only a single node. Therefore, the concatenation only really works for adding one simple string literal at a time. To solve this issue is actually non-trivial because of the linkage requirements on the literals (every string literal or string list must have external linkage, which is difficult to generate for intermediary values). One possible solution is to involve construct a compile-time tree of strings, instead of a list, which is also non-trivial. Another solution, which I present below, is to build a compile-time array of string literals instead. I have made a number of trials in that vein, and I thought it was interesting to explain some of the issues that occurred.

First, on the linkage issue. This is really an immensely annoying issue. String literals, as well as contant-expressions, do not have external linkage by default. In simple words, constants don't get "exported". Therefore, building a linked-list of string literals requires that every intermediate value (node in the linked-list) be given external linkage via a syntax similar to the one I used above in the number-to-string conversion:

template <>
struct ct_itoa<0> {
  static constexpr literal_str_list text =
    literal_str_list("");
};
// Define static member outside the template:
constexpr literal_str_list ct_itoa<0>::text;

where the definition outside the class template is what gives the constexpr text external linkage. Now, this syntax may not look that bad in this example, but remember that every single intermediate string literal constexpr must have the same syntax applied to it. Therefore, for a long sequence of concatenations, you end up with a long sequence of static constexpr definitions. This is why the linkage problem is a big wrench thrown in the cogs of this literal string linked-list scheme. So, that motivates the use of compile-time arrays instead.

To make a compile-time array of string literals, we need a way to create and copy, at compile-time, entire strings. Ordinary string literals are static arrays of chars (const char[N]) which are not directly copyable. But thanks again to C++11, the standard library provides a std::array (docs) class template to represent static arrays and it can be constructed and copied at compile-time, via constexpr constructors. Furthermore, we can construct such an array from a string literal using aggregate initialization.

Then, because each array has the form std::array<char,N> where N is the length (including null-termination) of the array, which is different for each string literal, it means that we cannot simply store an array of std::array objects, because they each have a different type. So, we have to resort to another convenient standard class template of C++11, that is, the std::tuple<T...> (docs) variadic class template which allows one to store an arbitrary long sequence of heterogeneous objects. And obviously, a tuple can also be constructed and copied as a constexpr, given that all constituent types also have that property.

So, the basic class template we can rely on for this solution is as follows:

template <unsigned int... Ns>
struct literal_str_array {

  static constexpr unsigned int count = sizeof...(Ns);

  std::tuple< std::array<char, Ns>... > data;

  constexpr literal_str_array(const std::array<char, Ns>&... aStr) : data(aStr...) { };

};

#define LSA(LITERAL_STR) literal_str_array<sizeof(LITERAL_STR)>{std::array<char,sizeof(LITERAL_STR)>{LITERAL_STR}}

constexpr auto example_str = LSA("Hello World!");

which is a variadic class template that takes a list of integer sizes for the different sub-strings of the overall string. Each sub-string is stored as a static array of chars. Most string operations like getting the total size, getting an individual char (indexing), computing a FNV-1a hash, or converting to a run-time string (std::string) is rather straight-forward to implement (but again, everything must be done recursively, because compile-time calculations imply recursion as the only option).

Where things get a little interesting is when we need to concatenate strings. The following code does exactly that:

namespace detail { namespace {

template <unsigned int...>
struct index_seq { };

template <unsigned int N, unsigned int... S>
struct gen_index_seq : gen_index_seq<N-1, N-1, S...> { };

template <unsigned int... S>
struct gen_index_seq<0, S...> {
  typedef index_seq<S...> type;
};

}; };

template <unsigned int... Ns, unsigned int... NIds, 
          unsigned int... Ms, unsigned int... MIds>
constexpr literal_str_array<Ns..., Ms...> add_literal_arrays_impl(
  const literal_str_array<Ns...>& lhs, detail::index_seq<NIds...>,
  const literal_str_array<Ms...>& rhs, detail::index_seq<MIds...>) {
  return literal_str_array<Ns..., Ms...>(
    std::get<NIds>(lhs.data)..., std::get<MIds>(rhs.data)...);
};

template <unsigned int... Ns, unsigned int... Ms>
constexpr literal_str_array<Ns..., Ms...> operator+(
    const literal_str_array<Ns...>& lhs, 
    const literal_str_array<Ms...>& rhs) {
  typedef typename detail::gen_index_seq<sizeof...(Ns)>::type LIndices;
  typedef typename detail::gen_index_seq<sizeof...(Ms)>::type RIndices;
  return add_literal_arrays_impl(lhs, LIndices(), rhs, RIndices());
};

constexpr auto example_str_concat = LSA("Hello ") + LSA("World!");

Which might be a bit overwhelming, but let me explain. This is a hybrid of constexpr syntax and template meta-programming. The role of the gen_index_seq<N> class template is to recursively construct a sequence of indices from 0 to N-1 in the form of a variadic argument pack (the S... in index_seq<S...>). So, the addition operator takes care of generating two sequences of indices (one for each string array) to address each element of the tuple of char arrays within each string array. Then, those indices are fed to the helper add_literal_arrays_impl function, which deduces those indices and unpacks the elements of the two tuples into the destination string array. The end effect of this is that the final string array contains a copy of both original string arrays, one set after the other in a single tuple.

By the way, the namespace detail { namespace { is simply there to make sure that the class templates inside it are not going to conflict by name with other parts of the library (this is what the "detail" is for), and that they will not be unecessarily exported from the compiled library (which is what the namespace { is for, it's an anonymous namespace that hides everything from the linker). In other words, the "detail" hides things, by convention, during compilation, and the un-named namespace hides things during linking. This is useful to remove as much trace as possible of the existence of something that you are forced to write in a header file (e.g., templates, inline functions, helper classes etc.).

So, the result of using the above code for the following string:

constexpr auto hello_world = LSA("Hello ") + LSA("World!") + LSA(" Nice to see you!");

is the following entry in the read-only data section:

    .section    .rodata
    .type   _ZL11hello_world, @object
    .size   _ZL11hello_world, 32
_ZL11hello_world:
    .string " Nice to see you!"
    .string "World!"
    .string "Hello "

which clearly shows that all overhead is wiped out and all that persists is the final static array of static arrays of strings, with a total size of 32 bytes in the resulting binary.

Did that work out?

No. Why? Because executing code during the compilation of your actual code is extremely inefficient. This is the nature of a pure functional language where every operation creates new values instead of overwriting existing variables and thus, where everything is done with recursions. So, even though the final result is just one final value and everything else is washed away by the optimizer, the path to get there can be very strenuous.

So, I tried to apply the above string array implementation to my aforementioned RTTI system in my library code. It did result in some savings in terms of binary size of the libraries because of the elimination of run-time string concatenation code in favor of static strings in read-only memory. However, those savings are rather small (1-5% of total size), but only because I had optimized things already before this.

However, the compilation times nearly doubled. The memory consumed by the compiler when compiling the code was 2 to 3 times higher than before. The compiler reached the maximum template instantiation depth once, mostly due to the fact that to allow an std::array object to be created as a constexpr, the compiler must verify that the element type is also a literal type, which involves several type-traits invocations. After increasing the maximum template instantiation depth, the compilation process crashed my computer twice, because my computer only has 12GB of RAM and 20GB of swap space, which were both exhausted during compilation, leading to a system-wide crash.

So, all in all. Clever trick, yes. A success, no.

What next?

The recursive template instantiations are a result of the fact that the tuple class must recursively go through its element types to verify the conditions necessary for constructing a constexpr. So, we need to avoid the use of such a sequence of arrays, which implies merging all the arrays into one array. This is essentially the approach used by Sprout (that vijayan121 refers to above). Here are the essential mechanisms for doing this merger of the static arrays:

template <std::size_t N>
struct cnst_string {

  std::array<char, N> data;

  constexpr cnst_string(const std::array<char, N>& aStr) : data(aStr) { };

  constexpr std::size_t size() const { return N-1; };
  constexpr char operator[](std::size_t i) const {
    return (i < N-1 ? data[i] : '\0');
  };
  std::string to_string() const {
    return std::string(data.data());
  };
};

#define RK_CA(LITERAL_STR) cnst_string<sizeof(LITERAL_STR)>{std::array<char,sizeof(LITERAL_STR)>{LITERAL_STR}}

namespace detail { namespace {

template <std::size_t...>
struct index_seq { };

template <std::size_t N, std::size_t... S>
struct gen_index_seq : gen_index_seq<N-1, N-1, S...> { };

template <std::size_t... S>
struct gen_index_seq<0, S...> {
  typedef index_seq<S...> type;
};

}; };

template <std::size_t N, std::size_t... NIds, 
          std::size_t M, std::size_t... MIds>
constexpr cnst_string<N+M-1> concat_char_arrays_impl(
  const std::array<char, N>& lhs, detail::index_seq<NIds...>,
  const std::array<char, M>& rhs, detail::index_seq<MIds...>) {
  return cnst_string<N+M-1>{std::array<char,N+M-1>{{lhs[NIds]..., rhs[MIds]..., '\0'}}};
};

template <std::size_t N, std::size_t M>
constexpr cnst_string<N+M-1> operator+(const cnst_string<N>& lhs, const cnst_string<M>& rhs) {
  typedef typename detail::gen_index_seq<N-1>::type LIndices;
  typedef typename detail::gen_index_seq<M-1>::type RIndices;
  return concat_char_arrays_impl(lhs.data, LIndices(), rhs.data, RIndices());
};

which is similar to the previous approach, but instead of unrolling the two arrays of char-arrays into a larger array of char-arrays, we unroll the two char-arrays into one larger char-array. The advantage of this approach is that there are only different instantiations of the std::array<char,N> for different N (for each unique string length in your program), which should dramatically reduce the template instantiation depth required compared to the tuple-array version. However, the disadvantage of it is that for every unique length N for a string involved in a concatenation, there will be O(N^2) instantiations of the class template gen_index_seq (if anyone knows of a better trick to generate indices, I'm all ears), but gen_index_seq is a much lighter-weight template compared to std::tuple or std::array, and is not involved in type-traits checking (very important). Let's cross our fingers that this will be a favorable tradeoff.

Stay tuned for the results of putting this into my RTTI library. Hopefully I won't get a system-wide crash this time!

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster

OK, so I have tried to use the latest option (the cnst_string option above, similar to the Sprout library). Still got a system-wide crash due to an out-of-memory condition when compiling. However, judging by what did compile, the compilation seems to take roughly half the amount of memory as it does with the "tuple-array" solution (literal_str_array), and also significantly less time. Nevertheless, it is still at least twice as much as the plain version with run-time strings (std::string) concatenation. And, it overwhelms my system again.

The resulting binaries are, all in all, 2.3% smaller with this scheme. but before this attempt, I got roughly a 10% reduction before just by optimizing my use of std::string in the plain version, and that before that, I got over 50% reduction by simply hiding the things that did not absolutely need to be exposed (i.e. through un-named namespaces, as I showed in the last post) and by removing the use of virtual functions in the core RTTI class (something I might show in an upcoming tutorial, i.e., how to implement a polymorphic class with no virtual functions of any kind). So, this 2.3% reduction, at the cost of significant headaches with compilation times and memory consumption, is, to me, not a good tradeoff.

Conclusion

Compile-time string concatenation, under the current language limitations (C++11 and C++14), does not scale well due to the overwhelming template instantiation depths (O(N^2) or type-traits invocations) that it entails. I'm quite sad about that, because I had pretty high hopes that this could be a good potential optimization for my library, but it is not so. This is why I wanted to report this here, as a caution to others who might have the same idea.

The only practical and scalable way that strings could be concatenated at compile-time is if the C++ language could allow for native operations for string or array concatenations. String concatenation is a trivial operation for a compiler to do, and it must already do it for many other C++ features (e.g., the C++ RTTI), so, if programmers could simply tap into that power natively, it would incur negligible overhead.

Nevertheless, for light-weight situations where compile-time string concatenation is required, these schemes could come in handy!

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.