Ciaran McCreesh’s Blag
Now with 17% more caffeine
Tag Archives: c++
Runtime Type Checking in C++ without RTTI
A technique I always seem to forget is how to map C++ types to an integer without relying upon RTTI. A variation on this is used in <locale> in the standard library, for std::use_facet<>. But let’s take a much simpler, and highly contrived, example.
Let’s say we’ve got some values of different types, and we want to give those types to a library to store somewhere, and then we later want to get them back again. Crucially, the library itself doesn’t know anything about the types in question. So, for a very simple case:
#include <vector>
#include <iostream>
#include <string>
int main(int, char *[])
{
std::vector<Something> things = { std::string("foo"), 123 };
/* ... */
std::cout << things[0].as<std::string>() << " " << things[1].as<int>() << std::endl;
}
Note the gratuitous use of C++0x initialiser lists, just because we can.
Those familiar with Boost might think that Something is like boost::any. However, boost::any uses RTTI, which is slow and completely unnecessary.
A first implementation of Something might look like this:
#include <memory>
class Something
{
private:
struct SomethingValueBase
{
};
template <typename T_>
struct SomethingValue :
SomethingValueBase
{
T_ value;
SomethingValue(const T_ & v) :
value(v)
{
}
};
std::shared_ptr<SomethingValueBase> _value;
public:
template <typename T_>
Something(const T_ & t) :
_value(new SomethingValue<T_>(t))
{
}
template <typename T_>
const T_ & as() const
{
return static_cast<const SomethingValue<T_> &>(*_value).value;
}
};
This works, but has a major flaw: if you get the types wrong when calling Something::as<>, the static_cast is undefined behaviour, and you’ll get a segfault or something similarly horrible. We’d like to replace that with something safer.
One way to do it is to use runtime type information. The simplest variation on this is to replace the static_cast with a dynamic_cast. However, we can only do this if SomethingValueBase is a polymorphic type, which it isn’t. We can make it so by adding in a virtual destructor:
#include <memory>
class Something
{
private:
struct SomethingValueBase
{
virtual ~SomethingValueBase()
{
}
};
template <typename T_>
struct SomethingValue :
SomethingValueBase
{
T_ value;
SomethingValue(const T_ & v) :
value(v)
{
}
};
std::shared_ptr<SomethingValueBase> _value;
public:
template <typename T_>
Something(const T_ & t) :
_value(new SomethingValue<T_>(t))
{
}
template <typename T_>
const T_ & as() const
{
return dynamic_cast<const SomethingValue<T_> &>(*_value).value;
}
};
Now, if we get the types wrong, a std::bad_cast will be thrown. Alternatively, we can use our own exception type:
class SomethingIsSomethingElse
{
};
class Something
{
/* snip */
public:
template <typename T_>
const T_ & as() const
{
auto value_casted(dynamic_cast<const SomethingValue<T_> *>(_value.get()));
if (! value_casted)
throw SomethingIsSomethingElse();
return value_casted->value;
}
};
We can also make use of std::dynamic_pointer_cast, which is possibly slightly less ugly syntactically:
class Something
{
/* snip */
public:
template <typename T_>
const T_ & as() const
{
auto value_casted(std::dynamic_pointer_cast<const SomethingValue<T_> >(_value));
if (! value_casted)
throw SomethingIsSomethingElse();
return value_casted->value;
}
};
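To see the checked version in action, here is a minimal self-contained sketch that puts the pieces above together. The helper wrong_type_is_caught() is a hypothetical name invented for this illustration, and the code assumes RTTI is enabled, since it relies upon dynamic_cast.

```cpp
#include <cassert>
#include <memory>
#include <string>

class SomethingIsSomethingElse
{
};

class Something
{
    private:
        struct SomethingValueBase
        {
            // Makes the base polymorphic, which dynamic_cast requires.
            virtual ~SomethingValueBase()
            {
            }
        };

        template <typename T_>
        struct SomethingValue :
            SomethingValueBase
        {
            T_ value;

            SomethingValue(const T_ & v) :
                value(v)
            {
            }
        };

        std::shared_ptr<SomethingValueBase> _value;

    public:
        template <typename T_>
        Something(const T_ & t) :
            _value(new SomethingValue<T_>(t))
        {
        }

        template <typename T_>
        const T_ & as() const
        {
            // A dynamic_cast on a pointer yields null on a type mismatch.
            auto value_casted(dynamic_cast<const SomethingValue<T_> *>(_value.get()));
            if (! value_casted)
                throw SomethingIsSomethingElse();
            return value_casted->value;
        }
};

// Hypothetical helper: true if asking for the wrong type is reported
// through our own exception rather than through a crash.
bool wrong_type_is_caught()
{
    Something s(std::string("foo"));
    try
    {
        s.as<int>();
    }
    catch (const SomethingIsSomethingElse &)
    {
        return true;
    }
    return false;
}

void checked_access_example()
{
    assert(wrong_type_is_caught());
}
```

Asking for the right type still works as before; only the mismatch path changes, from undefined behaviour to an exception.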
All of this is using RTTI, though, and RTTI is a huge amount of overkill for what we need. Before eliminating the RTTI, though, we’ll switch to using it in a different way:
#include <memory>
#include <string>
#include <typeinfo>
class Something
{
private:
template <typename T_>
struct SomethingValueType
{
};
struct SomethingValueBase
{
std::string type_info_name;
SomethingValueBase(const std::string & t) :
type_info_name(t)
{
}
virtual ~SomethingValueBase()
{
}
};
template <typename T_>
struct SomethingValue :
SomethingValueBase
{
T_ value;
SomethingValue(const T_ & v) :
SomethingValueBase(typeid(SomethingValueType<T_>()).name()),
value(v)
{
}
};
std::shared_ptr<SomethingValueBase> _value;
public:
template <typename T_>
Something(const T_ & t) :
_value(new SomethingValue<T_>(t))
{
}
template <typename T_>
const T_ & as() const
{
if (typeid(SomethingValueType<T_>()).name() != _value->type_info_name)
throw SomethingIsSomethingElse();
return std::static_pointer_cast<const SomethingValue<T_> >(_value)->value;
}
};
Here we make use of typeid explicitly, which is widely considered to be about on par with use of goto. However, it paves the way for our next step. Can we replace typeid(SomethingValueType<T_>()).name() with a different, non-evil expression? Let’s think about what properties the result of that expression must have:
- We must be able to store it, so it needs to be a regular type.
- We must be able to compare values of it, and be guaranteed true if and only if the two types used to create the value are the same, and false if and only if they are different. (Note that RTTI doesn’t even provide this guarantee.)
Let’s try this:
#include <memory>
#include <string>
class SomethingIsSomethingElse
{
};
template <typename T_>
struct SomethingTypeTraits;
class Something
{
private:
struct SomethingValueBase
{
int magic_number;
SomethingValueBase(const int m) :
magic_number(m)
{
}
virtual ~SomethingValueBase()
{
}
};
template <typename T_>
struct SomethingValue :
SomethingValueBase
{
T_ value;
SomethingValue(const T_ & v) :
SomethingValueBase(SomethingTypeTraits<T_>::magic_number),
value(v)
{
}
};
std::shared_ptr<SomethingValueBase> _value;
public:
template <typename T_>
Something(const T_ & t) :
_value(new SomethingValue<T_>(t))
{
}
template <typename T_>
const T_ & as() const
{
if (SomethingTypeTraits<T_>::magic_number != _value->magic_number)
throw SomethingIsSomethingElse();
return std::static_pointer_cast<const SomethingValue<T_> >(_value)->value;
}
};
Now, our library user has to provide specialisations of SomethingTypeTraits for every type they wish to use:
#include <string>
#include <iostream>
#include <vector>
template <>
struct SomethingTypeTraits<int>
{
enum { magic_number = 1 };
};
template <>
struct SomethingTypeTraits<std::string>
{
enum { magic_number = 2 };
};
int main(int, char *[])
{
std::vector<Something> things = { std::string("foo"), 123 };
std::cout << things[0].as<std::string>() << " " << things[1].as<int>() << std::endl;
}
No RTTI at all there, and it is type safe, but it relies upon a lot of boilerplate from the library user, and that boilerplate is very easy to screw up. So, we’ll allocate magic numbers automatically instead:
#include <memory>
class Something
{
private:
static int next_magic_number()
{
static int magic(0);
return magic++;
}
template <typename T_>
static int magic_number_for()
{
static int result(next_magic_number());
return result;
}
struct SomethingValueBase
{
int magic_number;
SomethingValueBase(const int m) :
magic_number(m)
{
}
virtual ~SomethingValueBase()
{
}
};
template <typename T_>
struct SomethingValue :
SomethingValueBase
{
T_ value;
SomethingValue(const T_ & v) :
SomethingValueBase(magic_number_for<T_>()),
value(v)
{
}
};
std::shared_ptr<SomethingValueBase> _value;
public:
template <typename T_>
Something(const T_ & t) :
_value(new SomethingValue<T_>(t))
{
}
template <typename T_>
const T_ & as() const
{
if (magic_number_for<T_>() != _value->magic_number)
throw SomethingIsSomethingElse();
return std::static_pointer_cast<const SomethingValue<T_> >(_value)->value;
}
};
How does this work? Each instantiation of the magic_number_for<T_> function needs to return the same magic number every time it is called. The first time any particular instantiation is called, its static int result requests the next magic number. On subsequent calls, the allocated number is remembered. (Note that static values inside a template are not shared between different instantiations of that template.) Finally, next_magic_number just returns a new magic number every time it is called.
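Stripped of the surrounding class, the allocation trick can be sketched on its own. This is the same pair of functions as above, made free functions purely for illustration; magic_number_example() is a hypothetical name. Note that the numbers depend on the order in which types are first used, so they only mean anything within a single run of a single program, and the sketch assumes that first use does not happen from several threads at once.

```cpp
#include <cassert>
#include <string>

// Hands out 0, 1, 2, ... in call order.
inline int next_magic_number()
{
    static int magic(0);
    return magic++;
}

// There is one function-local static per instantiation, so each type gets
// its own number, fixed the first time that instantiation is called.
template <typename T_>
int magic_number_for()
{
    static int result(next_magic_number());
    return result;
}

// The numbers are stable per type and distinct across types.
void magic_number_example()
{
    const int for_int(magic_number_for<int>());
    const int for_string(magic_number_for<std::string>());

    assert(for_int == magic_number_for<int>());
    assert(for_string == magic_number_for<std::string>());
    assert(for_int != for_string);
}
```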
And there we have it: fast runtime type checking with no boilerplate and no RTTI. What we’ve done here is more or less useless, but the techniques do have other applications. For the curious, std::use_facet<> is probably the most common, and anyone brave enough to delve into its design will eventually see why this isn’t either pointless wankery or reinventing the wheel. For the rest, if you think that using RTTI can solve your problem adequately, then it probably can, and you don’t need to go into the kind of devious trickery the standard library uses internally.
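For a feel of what the standard library version looks like from the outside, here is a small sketch of std::use_facet<> fetching a facet from a locale by type alone. Every standard facet carries a static member of type std::locale::id, which plays roughly the role of our magic number; how an implementation allocates those ids is its own business and is not shown here. The function names are hypothetical.

```cpp
#include <cassert>
#include <locale>

// Looks up the std::numpunct<char> facet of the classic "C" locale. The
// only key we supply is the facet type itself.
char classic_decimal_point()
{
    const std::numpunct<char> & punct(
            std::use_facet<std::numpunct<char> >(std::locale::classic()));
    return punct.decimal_point();
}

void use_facet_example()
{
    // The classic "C" locale uses '.' as its decimal point.
    assert(classic_decimal_point() == '.');
}
```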
C++ Named Function Parameters
C++ doesn’t have named function parameters. In some ways this isn’t a huge deal, since the compiler will usually catch when you screw up the ordering of arguments to a function. But if you’ve got a function accepting multiple arguments of the same type, the compiler isn’t going to save you. So we want to allow something like the following:
shop.populate(
param::number_of_cheeses() = 0,
param::number_of_parrots() = 1,
param::parrot_variety() = "Norwegian Blue"
);
We also want:
- As little boilerplate as possible from the programmer.
- Type safety. It shouldn’t compile if the arguments are wrong.
- Zero overhead.
It would be nice to allow arguments to be specified in any order, and there is a way of doing that using C++0x, but it’s rather convoluted, so we’ll stick with the requirement that arguments be in the right order for now.
First, we want to work out the type of those param::foo() things. Since we’re using operator=, they need to be structs or classes of some kind (since operator= can only be overloaded as a member function). Since we want lots of them of different types, and since we don’t want to have to worry about declaring the same name multiple times (which means we’d start hitting the ODR), a typedef of a template seems in order. Thus, we’d like to do:
namespace param
{
typedef Name</* something */> number_of_cheeses;
typedef Name</* something */> number_of_parrots;
typedef Name</* something */> parrot_variety;
}
As for the something, the best I’ve been able to come up with is an inline forward declaration of a meaningless struct:
namespace param
{
typedef Name<struct N_number_of_cheeses> number_of_cheeses;
typedef Name<struct N_number_of_parrots> number_of_parrots;
typedef Name<struct N_parrot_variety> parrot_variety;
}
What about the function parameters?
void Shop::populate(
const NamedValue<param::number_of_cheeses, int> & number_of_cheeses,
const NamedValue<param::number_of_parrots, int> & number_of_parrots,
const NamedValue<param::parrot_variety, std::string> & parrot_variety)
{
/* ... */
}
There’s a small amount of duplication there, but that’s a necessity: it’s considered a useful feature of C and C++ that declarations and implementations of functions can use different names for parameters.
As for using the parameters, we’ve got two options. We could add a super magic cast operator to NamedValue, or we could make it explicit. Since super magic casts have a nasty habit of doing really weird things, we’ll make it explicit using operator():
void Shop::populate(
const NamedValue<param::number_of_cheeses, int> & number_of_cheeses,
const NamedValue<param::number_of_parrots, int> & number_of_parrots,
const NamedValue<param::parrot_variety, std::string> & parrot_variety)
{
cheeses.resize(number_of_cheeses());
cage.insert(number_of_parrots(), parrot_variety());
}
Now we just have to make it work. First, NamedValue, remembering to provide const and non-const versions of our operator:
template <typename T_, typename V_>
class NamedValue
{
private:
V_ _value;
public:
explicit NamedValue(const V_ & v) :
_value(v)
{
}
V_ & operator() ()
{
return _value;
}
const V_ & operator() () const
{
return _value;
}
};
Then Name. Our first attempt might look like this:
template <typename T_>
struct Name
{
template <typename V_>
NamedValue<Name<T_>, V_> operator= (const V_ & v) const
{
return NamedValue<Name<T_>, V_>(v);
}
};
But there’s a problem: whilst this works for int and most classes, it does something immensely stupid when fed a string literal. We could require users to write out parameters like:
param::parrot_variety() = std::string("Norwegian Blue")
but that’s rather silly. So instead we’ll add in a way of overriding types for NamedValue, keeping it nice and generic in case any similar situations crop up elsewhere:
template <typename T_>
struct NamedValueType
{
typedef T_ Type;
};
template <int n>
struct NamedValueType<char [n]>
{
typedef std::string Type;
};
template <typename T_>
struct Name
{
template <typename V_>
NamedValue<Name<T_>, typename NamedValueType<V_>::Type> operator= (const V_ & v) const
{
return NamedValue<Name<T_>, typename NamedValueType<V_>::Type>(v);
}
};
Fortunately, g++ is smart enough to compile all of this into exactly the same code as it would if named parameters weren’t used.
And there we have it: very low boilerplate type safe named parameters with no icky macros.
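Putting the pieces together, a complete sketch might look like the following. The free function describe_cage() is a hypothetical stand-in for Shop::populate, invented so that the example has something to check, and the array bound is written as std::size_t here where the listing above uses int.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

template <typename T_, typename V_>
class NamedValue
{
    private:
        V_ _value;

    public:
        explicit NamedValue(const V_ & v) :
            _value(v)
        {
        }

        const V_ & operator() () const
        {
            return _value;
        }
};

template <typename T_>
struct NamedValueType
{
    typedef T_ Type;
};

// String literals are deduced as char [n], so store them as std::string.
template <std::size_t n>
struct NamedValueType<char [n]>
{
    typedef std::string Type;
};

template <typename T_>
struct Name
{
    template <typename V_>
    NamedValue<Name<T_>, typename NamedValueType<V_>::Type> operator= (const V_ & v) const
    {
        return NamedValue<Name<T_>, typename NamedValueType<V_>::Type>(v);
    }
};

namespace param
{
    typedef Name<struct N_number_of_parrots> number_of_parrots;
    typedef Name<struct N_parrot_variety> parrot_variety;
}

// Hypothetical stand-in for Shop::populate.
std::string describe_cage(
        const NamedValue<param::number_of_parrots, int> & number_of_parrots,
        const NamedValue<param::parrot_variety, std::string> & parrot_variety)
{
    return std::to_string(number_of_parrots()) + " x " + parrot_variety();
}

void named_parameters_example()
{
    assert(describe_cage(
                param::number_of_parrots() = 1,
                param::parrot_variety() = "Norwegian Blue") == "1 x Norwegian Blue");
}
```

Swapping the two arguments at the call site fails to compile, because a NamedValue tagged with one name has no conversion to a NamedValue tagged with another.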
C++ Explicit Template Instantiation Hate Redux
Today’s hatred of C++ is brought to you by the section [temp.explicit]:
A definition of a class template or class member template shall be in scope at the point of the explicit instantiation of the class template or class member template.
Unfortunately, it doesn’t say “explicit instantiation definition” there, so you can’t do an explicit instantiation declaration when you only have a class declaration available. I can’t figure out what changing this would break, and whether it’s just an omission (explicit instantiation declarations are new in C++0x, but explicit instantiations are not) or a deliberate restriction.
Whilst we’re on the subject, not being able to use typedef names when explicitly instantiating is still a pain in the arse too, although the implications of allowing that are almost certainly moderately icky.
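For reference, the two forms being discussed look like this. It is only a sketch: Box is a hypothetical template invented for the illustration, and both instantiations are shown in one translation unit, whereas in practice the declaration would usually live in a header and the definition in exactly one source file.

```cpp
#include <cassert>

// The class template definition is visible here, which [temp.explicit]
// requires for both of the explicit instantiations below.
template <typename T_>
struct Box
{
    T_ value;

    T_ get() const
    {
        return value;
    }
};

// Explicit instantiation declaration (new in C++0x): promises that
// Box<int> is instantiated somewhere, so other code need not do it.
extern template struct Box<int>;

// Explicit instantiation definition. If both appear in one translation
// unit, the definition has to follow the declaration.
template struct Box<int>;

// typedef Box<long> LongBox;
// template struct LongBox;    // ill-formed: a typedef name cannot be used here

void explicit_instantiation_example()
{
    Box<int> b = { 42 };
    assert(b.get() == 42);
}
```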
C++ Template Specialisation Hate
Today’s annoying C++ feature is that partial specialisations of a nested type of a template class don’t work:
template <typename T_>
struct S;
template <typename T_>
struct T
{
struct U;
};
template <typename T_>
struct S<typename T<T_>::U>
{
};
Depending upon your compiler, the specialisation will either be rejected with a highly cryptic error message, or accepted but ignored. I don’t seem to be able to find the part of the standard that bans doing this, either, but that doesn’t necessarily mean it’s legal…
The solution, in any case, is to hoist the nested class out of the template, and use a typedef instead:
template <typename T_>
struct S;
template <typename T_>
struct T_U;
template <typename T_>
struct T
{
typedef T_U<T_> U;
};
template <typename T_>
struct S<T_U<T_> >
{
};
I’ve been of the opinion that nested classes are generally far more pain than they’re worth for a while now (they also can’t be forward-declared); I’m highly tempted to just stop using them anywhere at all, and switch exclusively to using typedefs.
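To check that the hoisted version really does select the specialisation, here is a small sketch. The specialised enumerators and the definition of the primary S template are assumptions added so that there is something to test; they are not part of the code above.

```cpp
#include <cassert>

// Primary template, given a body here so that the fallback is observable.
template <typename T_>
struct S
{
    enum { specialised = 0 };
};

template <typename T_>
struct T_U
{
};

template <typename T_>
struct T
{
    typedef T_U<T_> U;
};

// T_ can be deduced here, because T_U<T_> names the template directly
// rather than going through the nested name T<T_>::U.
template <typename T_>
struct S<T_U<T_> >
{
    enum { specialised = 1 };
};

void hoisted_typedef_example()
{
    assert(S<T<int>::U>::specialised == 1);
    assert(S<int>::specialised == 0);
}
```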
Assorted C++ Linkage
- A talk by Bjarne Stroustrup on the design of C++0x.
- Range based for loops in C++0x without concepts, good to know this hasn’t been dropped.
- Making std::list::size() O(1) [PDF], a shame to see this hasn’t been dropped.
- And on a less technical note, filtering compiler optimisation flags is not a solution from Diego. Unfortunately, he leaves out some of the worst offenders for flags that users think they can use that do change the meaning of programs: -fvisibility-inlines-hidden and -Wl,--as-needed.
C++ Const Curiosity
GCC will accept the following (look closely at S::f’s signature in both places):
struct T
{
void foo()
{
}
};
struct S
{
void f(const T);
};
void S::f(T t)
{
t.foo();
}
int main(int, char *[])
{
T t;
S s;
s.f(t);
}
The question is, should it?
Update: and the answer is, yes, it should, according to [dcl.fct] in the standard. This is both useful and annoying.
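The rule can be checked directly with a couple of static assertions. This is a sketch using C++0x’s <type_traits>; twice() is a hypothetical function invented for the illustration.

```cpp
#include <cassert>
#include <type_traits>

// Top-level const on a by-value parameter is not part of the function type.
static_assert(std::is_same<void (const int), void (int)>::value,
        "top-level const is dropped from the function type");

// const that is not top-level (here, on the pointee) does matter.
static_assert(! std::is_same<void (const int *), void (int *)>::value,
        "const on the pointee is part of the function type");

int twice(const int);   // declaration: the const makes no difference to callers

int twice(int n)        // definition of the very same function
{
    n *= 2;             // allowed, since this n is not const
    return n;
}

void const_parameter_example()
{
    assert(twice(21) == 42);
}
```

The useful half is that a definition can mark its parameters const without touching the header; the annoying half is that the declaration and definition can silently disagree, as in the example above.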