1. Introduction

Metaprogramming is usually defined as the creation of programs which generate other programs. Parser generators such as YACC [Joh79] are examples of one kind of program-generating program. The input language to YACC is a context-free grammar in Extended Backus-Naur Form [EBNF], and its output is a program which parses that grammar. Note that in this case the metaprogram (YACC) is written in a language (C) which does not directly support the description of generated programs. These specifications, which we'll call metadata, are not written in C, but in a meta-language. Because the the rest of the user's program typically requires a general-purpose programming system and must interact with the generated parser, the metadata is translated into C, which is then compiled and linked together with the rest of the system. The metadata thus undergoes two translation steps, and the user is always very conscious of the boundary between her metadata and the rest of her program.

1.1. Native language metaprogramming

A more interesting form of metaprogramming is available in languages such as Scheme [SS75], where the generated program specification is given in the same language as the metaprogram itself. The metaprogrammer defines her meta-language as a subset of the expressible forms of the underlying language, and program generation can take place in the same translation step used to process the rest of the user's program. This allows users to switch transparently between ordinary programming, generated program specification, and metaprogramming, often without being aware of the transition.

1.2. Metaprogramming in C++

In C++, it was discovered almost by accident [Unr], [Vel95a] that the template mechanism provides a rich facility for computation at compile-time. In this section, we'll explore the basic mechanisms and some common idioms used for metaprogramming in C++.

1.2.1. Numeric computations

The availability of non-type template parameters makes it possible to perform integer computations at compile-time. For example, the following template computes the factorial of its argument:

template< unsigned n >
struct factorial
{
    static const unsigned value = n * factorial<n-1>::value;
};

template<>
struct factorial<0>
{
    static const unsigned value = 1;
};

The program fragment above is called a metafunction, and it is easy to see its relationship to a function designed to be evaluated at runtime: the ‘metafunction argument’ is passed as a template parameter, and its ‘return value’ is defined as a nested static constant. Because of the hard line between the expression of compile-time and runtime computation in C++, metaprograms look different from their runtime counterparts. Thus, although as in Scheme the C++ metaprogrammer writes her code in the same language as the ordinary program, only a subset of the full C++ language is available to her: those expressions which can be evaluated at compile-time. Compare the above with a straightforward runtime definition of the factorial function:

unsigned factorial(unsigned N)
{
    return N == 0 ? 1 : N * factorial(N - 1);
}

While it is easy to see the analogy between the two recursive definitions, recursion is in general more important to C++ metaprograms than it is to runtime C++. In contrast to languages such as Lisp where recursion is idiomatic, C++ programmers will typically avoid recursion when possible. This is done not only for efficiency reasons, but also because of ‘cultural momentum’: recursive programs are simply harder (for C++ programmers) to think about. Like pure Lisp, though, the C++ template mechanism is a functional programming language: as such it rules out the use of data mutation required to maintain loop variables.

A key difference between the runtime and compile-time factorial functions is the expression of the termination condition: our meta-factorial uses template specialization as a kind of pattern-matching mechanism to describe the behavior when N is zero. The syntactic analogue in the runtime world would require two separate definitions of the same function. In this case the impact of the second definition is minimal, but in large metaprograms the cost of maintaining and understanding the terminating definitions can become significant.

Note also that a C++ metafunction's return value must be named. The name chosen here, value, is the same one used for all numeric returns in the MPL. As we'll see, establishing a consistent naming convention for metafunction returns is crucial to the power of the library.

1.2.2. Type computations

How could we apply our factorial metafunction? We might, for example, produce an array type of an appropriate size to hold all permutations of instances of another type:

// permutation_holder<T>::type is an array type which can contain 
// all permutations of a given T.

// unspecialized template for scalars
template< typename T >
struct permutation_holder
{
    typedef T type[1][1];
};

// specialization for array types
template< typename T, unsigned N >
struct permutation_holder<T[N]>
{
    typedef T type[factorial<N>::value][N];
};

Here we have introduced the notion of a type computation. Like factorial above, permutation_holder template is a metafunction. However, where factorial manipulates unsigned integer values, permutation_holder accepts and ‘returns’ a type (as the nested typedef type). Because the C++ type system provides a much richer set of expressions than anything we can use as a nontype template argument (e.g. the integers), C++ metaprograms tend to be composed mostly of type computations.

1.2.3. Type sequences

The ability to programmatically manipulate collections of types is a central tool of most interesting C++ metaprograms. Because this capability is so well-supported by the MPL, we'll provide just a brief introduction to the basics here. Later on, we'll revisit the example below to show how it can be implemented using MPL.

First, we'd need a way to represent the collection. One idea might be to store the types in a structure:

struct types
{
    int t1;
    long t2;
    std::vector<double> t3;
};

Unfortunately, this arrangement is not susceptible to the compile-time type introspection power that C++ gives us: there's no way to find out what the names of the members are, and even if we assume that they're named according to some convention as above, there's no way to know how many members there are. The key to solving this problem is to increase the uniformity of the representation. If we have a consistent way to get the first type of any sequence and the rest of the sequence, we can easily access all members:

template< typename First, typename Rest >
struct cons
{
    typedef First first;
    typedef Rest rest;
};

struct nil {};

typedef
      cons<int
    , cons<long
    , cons<std::vector<double>
    , nil
    > > > my_types;

The structure described by types above is the compile-time analogue of a singly-linked list; it has been first introduced by Czarnecki and Eisenecker in [CE98]. Now that we've adjusted the structure so that the C++ template machinery can ‘peel it apart’, let's examine a simple metafunction which does so. Suppose a user wished to find the largest of an arbitrary collection of types. We can apply the recursive metafunction formula which should by now be familiar:

Example 1. 'largest' metafunction

// choose the larger of two types
template<
      typename T1
    , typename T2
    , bool choose1 = (sizeof(T1) > sizeof(T2)) // hands off!
    >
struct choose_larger
{
    typedef T1 type;
};

// specialization for the case where sizeof(T2) >= sizeof(T1)
template< typename T1, typename T2 >
struct choose_larger< T1,T2,false >
{
    typedef T2 type;
};

// get the largest of a cons-list
template< typename T > struct largest;

// specialization to peel apart the cons list
template< typename First, typename Rest >
struct largest< cons<First,Rest> >
    : choose_larger< First, typename largest<Rest>::type >
{
    // type inherited from base
};

// specialization for loop termination
template< typename First >
struct largest< cons<First,nil> >
{
    typedef First type;
};

int main()
{
    // print the name of the largest of my_types
    std::cout
        << typeid(largest<my_types>::type).name()
        << std::endl
        ;
}

There are several things worth noticing about this code:

  • It uses a few ad-hoc, esoteric techniques, or ‘hacks’. The default template argument choose1 (labeled ‘hands off!’) is one example. Without it, we would have needed yet another template to provide the implementation of choose_larger, or we would have had to provide the computation explicitly as a parameter to the template - perhaps not bad for this example, but it would make choose_larger much less useful and more error-prone. The other hack is the derivation of a specialization of largest from choose_larger. This is a code-saving device which allows the programmer to avoid writing ‘typedef typename ...::type type’ in the template body.

  • Even this simple metaprogram uses three separate partial specializations. The largest metafunction uses two specializations. One might expect that this indicates there are two termination conditions, but there are not: one specialization is needed simply to deal with access to the sequence elements. These specializations make the code difficult to read by spreading the definition of a single metafunction over several C++ template definitions. Also, because they are partial specializations, they make the code unusable for a large community of C++ programmers whose compilers don't support that feature.

While these techniques are, of course, a valuable part of the arsenal of any good C++ metaprogrammer, their use tends to make programs written in what is already an unusual style harder-to-read and harder-to-write. By encapsulating commonly-used structures and dealing with loop terminations internally, the MPL reduces the need for both tricky hacks and for template specializations.

1.3. Why metaprogramming?

It's worth asking why anyone would want to do this. After all, even a simple toy example like the factorial metafunction is somewhat esoteric. To show how the type computation can be put to work, let's examine a simple example. The following code produces an array containing all possible permutations of another array:

// can't return an array in C++, so we need this wrapper
template< typename T >
struct wrapper
{
    T x;
};

// return an array of the N! permutations of 'in'
template< typename T >
wrapper< typename permutation_holder<T>::type >
all_permutations(T const& in)
{
    wrapper<typename permutation_holder<T>::type> result;

    // copy the unpermutated array to the first result element
    unsigned const N = sizeof(T) / sizeof(**result.x);
    std::copy(&*in, &*in + N, result.x[0]);

    // enumerate the permutations
    unsigned const result_size = sizeof(result.x) / sizeof(T);
    for (T* dst = result.x + 1; dst != result.x + result_size; ++dst)
    {
        T* src = dst - 1;
        std::copy(*src, *src + N, *dst);
        std::next_permutation(*dst, *dst + N);
    }
    return result;
}

The runtime definition of factorial would be useless in all_permutations above, since in C++ the sizes of array members must be computed at compile-time. However, there are alternative approaches; how could we avoid metaprogramming, and what would the consequences be?

  1. We could write programs to interpret the metadata directly. In our factorial example, the array size could have been a runtime quantity; then we'd have been able to use the straightforward factorial function. However, that would imply the use of dynamic allocation, which is often expensive.

    To carry this further, YACC might be rewritten to accept a pointer-to-function returning tokens from the stream to be parsed, and a string containing the grammar description. This approach, however, would impose unacceptable runtime costs for most applications: either the parser would have to treat the grammar nondeterministically, exploring the grammar for each parse, or it would have to begin by replicating at runtime the substantial table-generation and optimization work of the existing YACC for each input grammar.

  2. We could replace the compile-time computation with our own analysis. After all, the size of arrays passed to all_permutations are always known at compile-time, and thus can be known to its user. We could ask the user to supply the result type explicitly:

    template< typename Result, typename T >
    Result all_permutations(T const& input);
    

    The costs to this approach are obvious: we give up expressivity (by requiring the user to explicitly specify implementation details), and correctness (by allowing the user to specify them incorrectly). Anyone who has had to write parser tables by hand will tell you that the impracticality of this approach is the very reason of YACC's existence.

    In a language such as C++, where the metadata can be expressed in the same language as the rest of the user's program, expressivity is further enhanced: the user can invoke metaprograms directly, without learning a foreign syntax or interrupting the flow of her code.

So, the motivation for metaprogramming comes down to the combination of three factors: efficiency, expressivity, and correctness. While in classical programming there is always a tension between expressivity and correctness on one hand and efficiency on the other, in the metaprogramming world we wield new power: we can move the computation required for expressivity from runtime to compile-time.

1.4. Why a metaprogramming library?

One might just as well ask why we need any generic library:

  • Quality. Code that is appropriate for a general-purpose library is usually incidental to the purpose of its users. To a library developer, it is the central mission. On average, the containers and algorithms provided by any given C++ standard library implementation are more-flexible and better-implemented than the project-specific implementations which abound, because library development was treated as an end in itself rather than a task incidental to the development of some other application. With a centralized implementation for any given function, optimizations and improvements are more likely to have been applied.

  • Re-use. More important even than the re-use of code which all libraries provide, a well-designed generic library establishes a framework of concepts and idioms which establishes a reusable mental model for approaching problems. Just as the C++ Standard Template Library gave us iterator concepts and a function object protocol, the Boost Metaprogramming Library provides type-iterators and metafunction class protocol. A well-considered framework of idioms saves the metaprogrammer from considering irrelevant implementation details and allows her to concentrate on the problem at hand.

  • Portability. A good library can smooth over the ugly realities of platform differences. While in theory a metaprogramming library is fully generic and shouldn't be concerned with these issues, in practice support for templates remains inconsistent even four years after standardization. This should perhaps not be surprising: C++ templates are the language's furthest-reaching and most complicated feature, which largely accounts for the power of metaprogramming in C++.

  • Fun. Repeating the same idioms over and over is tedious. It makes programmers tired and reduces productivity. Furthermore, when programmers get bored they get sloppy, and buggy code is even more costly than slowly-written code. Often the most useful libraries are simply patterns that have been “plucked” by an astute programmer from a sea of repetition. The MPL helps to reduce boredom by eliminating the need for the most commonly-repeated boilerplate coding patterns.

As one can see, the MPL's development is motivated primarily by the same practical, real-world considerations that justify the development of any other library. Perhaps this is an indication that template metaprogramming is finally ready to leave the realm of the esoteric and enter the lingua franca of every day programmers.