Data objects with boost::tuple
The data object. Every application's object model has one or more of these things acting as a container for some stateful data, often in the form of record-level data sourced from outside the application. When approached top-down, implementing them usually takes gobs of boring code that is redundant, a pain to maintain, and a downright chore to test.
Over time, as new data members are added, we usually see new getter/setter pairs, tweaks to print functions, and maybe manual edits to some critical serialization logic. Aside from being risky if there's not adequate testing infrastructure in place, this is not the best use of your time: you could be writing far more important business and application logic. All in all, maintaining data objects is a lot of work for something that should be braindead simple and done auto-magically.
Here’s an approach for providing flexible data objects built on boost::tuple that are geared toward rapid development, ease of use, and extensibility.
Most data objects are used as containers for record-level data [1], so at a minimum they usually require the following:
1. Getters
2. Setters
3. A print function
In some cases, like applications with critical recoverable state (e.g. a trading engine), it may also be necessary to expose hooks for things like:
1. Serialization
2. Deserialization
3. Versioning
For the purposes of this example, let's ignore how to implement these advanced features and recognize only that we must provide an easily accessible way to implement features such as these.
Requirements for a Flexible Data Container Object
Now that we’ve got a good idea of the data-struct requirements, let’s scope our problem with some objective technical requirements. Whatever we decide for our C++ data structs must:
1. Work with all C++ types (even ones with angle-braces in their names)
2. Be fast (specifically, they should not be materially slower than a hand-crafted struct at runtime)
3. Offer const-correct [2] and volatile-correct data access
4. Expose semantics that are typed, and absent of any clumsy casting
5. Provide named access to members similar to that of a custom struct [3]
6. Not require shotgun surgery [4] for changes (specifically, there should be a simple, unambiguous model for adding and removing fields that requires minimal changes to code)
7. Permit data hiding and other non-simple data operations
Down to Business
I've seen many varying approaches to this problem. Because everyone has their own style, the application domains people work in vary widely, and not everyone even thinks that a data object is ever a good idea, it's hard to objectively navigate this space. I'm just going to skip this discussion entirely and go straight to the approach.
Don’t Even Think Of Using Macros
One oft-suggested approach is to use macro metaprograms that expand into custom structs during preprocessing. This can work, but it is very fragile given C++'s rich type system and our requirement to support any C++ type, including template types (written with angle braces). Recall that macro processing is purely textual and lacks the C++ context: the preprocessor treats the comma in a type such as std::map<int, std::string> as a separator between macro arguments. That inability to play nice with angle braces rules macros out from the start. (Yes, I am aware you can work around this with typedefs, but I don't think you should have to.)
Even if you don't think this is a big deal, consider that if your macro goes wrong in a way you didn't expect, you're likely to see a trail of compiler errors a few hundred deep, all referencing a single macro-expanded line of source. Since ease of use and genericity are among our design goals, it's easiest to just avoid macros. They're simply not suitable for use here.
Enter Boost Tuple
But boost::tuple [5] is a good fit (and if you're not using Boost and you're writing modern C++, you have bigger problems to deal with than unmanageable data objects). It allows us to statically declare a sequence of heterogeneous types to be bound in a single object:
typedef boost::tuple
<
std::string,
long,
int
> FieldTuple;
It also provides templated getters and setters (n.b. the setters are actually non-const getters) for arbitrary and generic data access:
FieldTuple myFieldTuple;
myFieldTuple.get<2>() = 42;
int someInt = myFieldTuple.get<2>();
And boost::tuple even has customizable streaming support for printing to generic streams:
std::cout << boost::tuples::set_open('[')
<< boost::tuples::set_delimiter(',')
<< boost::tuples::set_close(']')
<< myFieldTuple
<< std::endl;
This is great. At this point you should be convinced that boost::tuple is awesome and just what we need. Template magic ensures that the returned object is typed and checked at compile time, we get tons of flexibility in how we print these objects, and we've satisfied requirements 1-4 without writing any code. Note that we have some work ahead of us if we're going to adequately satisfy requirement 7, permitting data hiding and other non-simple operations, since tuples are an open book as far as access goes.
For now, let’s first tackle requirements 5, named access, and 6, no shotgun-surgery.
Named Member Access
Named access is pivotal for the maintainability and usefulness of your data objects. If you had to reference each member by index you’d never be able to change the underlying tuple without doing all sorts of nasty shotgun-surgery to update indices in code using it. And don’t forget, the compiler may not even catch the index switch if the types to which the indices refer don’t change.
So, for named access to members, we get clever. We know that the template get() function requires an integer template parameter to specify which element (and type) of the tuple to return. Since these are 0-based indices into a boost::tuple, they start at 0 and grow by 1 at each step, so the natural choice is an enum:
enum FieldName
{
    ConnectionName = 0,
    ConnectionTimeout,
    ConnectionRetries
};
So, we’d access the fields by name as:
FieldTuple myFieldTuple;
myFieldTuple.get<ConnectionTimeout>() = 42;
long timeout = myFieldTuple.get<ConnectionTimeout>();
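If you want to try the enum trick without Boost, here is a minimal sketch using std::tuple, the C++11 standard-library counterpart of boost::tuple, where the free function std::get<N>(t) plays the role of t.get<N>(). The function namedAccessExample is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Field indices: 0-based and increasing by one, in the same order as the tuple.
enum FieldName
{
    ConnectionName = 0,
    ConnectionTimeout,
    ConnectionRetries
};

// std::tuple stands in for boost::tuple; std::get<N>(t) replaces t.get<N>().
typedef std::tuple<std::string, long, int> FieldTuple;

// Hypothetical usage: write and read fields by name rather than by raw index.
void namedAccessExample()
{
    FieldTuple myFieldTuple("primary", 0L, 3);
    std::get<ConnectionTimeout>(myFieldTuple) = 42;  // the non-const "getter" acts as the setter
    long timeout = std::get<ConnectionTimeout>(myFieldTuple);
    assert(timeout == 42);
    assert(std::get<ConnectionName>(myFieldTuple) == "primary");
}
```

If the tuple's field order ever changes, only the enum has to change with it; call sites keep using the names.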
The subtle part about this approach, and a requirement that isn’t immediately apparent, is that we’re allowed to alias the same data fields with multiple names (just like we could in a struct) if ever needed:
enum FieldName
{
    ConnectionName = 0,
    ConnectionTimeout,
    ConnectionRetries,
    ConnectionMaxRetries = ConnectionRetries
};
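A small sketch (again with std::tuple standing in for boost::tuple, and a hypothetical aliasExample function) showing that an aliased enumerator is not a copy: both names resolve to the same index and therefore to the same stored object.

```cpp
#include <cassert>
#include <string>
#include <tuple>

enum FieldName
{
    ConnectionName = 0,
    ConnectionTimeout,
    ConnectionRetries,
    ConnectionMaxRetries = ConnectionRetries  // alias: same index, hence the same field
};

// std::tuple stands in for boost::tuple here.
typedef std::tuple<std::string, long, int> FieldTuple;

static_assert(ConnectionMaxRetries == ConnectionRetries, "an alias shares its index");

void aliasExample()
{
    FieldTuple fields;
    std::get<ConnectionRetries>(fields) = 5;
    // Reading through the alias sees the value written through the original name...
    assert(std::get<ConnectionMaxRetries>(fields) == 5);
    // ...because both names return a reference to the very same object.
    assert(&std::get<ConnectionRetries>(fields) == &std::get<ConnectionMaxRetries>(fields));
}
```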
So that solves requirement 5. On to requirement 6.
Avoiding Shotgun Surgery
Satisfying requirement 6 boils down to packaging. At this point it should be pretty evident that we could declare the tuple and the enum inline in business-logic code and avoid a separate data-object class altogether. This is bad for two reasons:
1. When declared inline this object definition may not be reused elsewhere
2. The FieldName enum pollutes the namespace of the enclosing scope
The inability to reuse the data object might not be an immediate concern if reusability isn't a key design factor or the data object isn't in an exposed header. For object-model cleanliness and to avoid future work, the data object should probably be wrapped up into its own class (nested in the business-logic class if you like). This way it's possible to cleanly expose the object in your API layer without refactoring all the gobs of code that currently reference it.
The namespace pollution issue is far more troubling. We got around it above by prefixing all the field names with a descriptive qualifier (ConnectionName, ConnectionTimeout, and so on). This is clumsy, requires a lot of typing, and isn't very modern C++: we have namespaces and scopes, and we should use them. As it stands, the scoping of the field names doesn't reflect the object to which these attributes are bound.
So let’s fix these issues and wrap everything up into a class so all changes can be made in one place:
class Connection
{
public:
    enum FieldName
    {
        Name = 0,
        Timeout,
        MaxRetries
    };

    typedef boost::tuple
    <
        std::string,
        long,
        int
    > FieldTuple;

private:
    FieldTuple _impl;
};
An astute reader will have noticed that while this is clean, it introduces what seems to be a very big problem: since the tuple is a private member, code outside the class can't call into or otherwise access the underlying boost::tuple.
This is not a problem, in fact it’s a feature. It’s where the magic starts to happen.
Flexibility Through Composition
Requirement 7 stipulated that we have a simple way to provide support for non-simple operations like printing, serialization, and other arbitrarily complex behavior. Composition is our answer.
So far we have boost::tuple tooling at our disposal for the simple stuff like printing and typed member access. We also have a class with its own scope where we're free to implement any other functionality we need. So our job at this point boils down to deciding which members and functionality to expose and what bolt-ons to add (e.g. persistence).
Boost::Tuple Revisited
The first thing we need to do is restore, in this new scope, the boost::tuple functionality we were so excited about earlier. This is very simple to achieve with a template forwarding function:
template <FieldName fnv>
const typename boost::tuples::element<fnv, FieldTuple>::type& get() const
{
    return _impl.template get<fnv>();
}
This function forwards calls to get() down to the underlying boost::tuple and returns a const reference to the correctly typed tuple element, keeping any const or volatile qualification of the element type intact. The non-const variant is just as easy to implement: just remove the const keywords. (The template keyword before get tells the compiler that get names a member template. It is strictly required only when the type of _impl depends on a template parameter, for example if Connection were itself a class template; here it is harmless.)
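For reference, here is the wrapper with both forwarding overloads, sketched with std::tuple and std::tuple_element in place of boost::tuple and boost::tuples::element so it compiles without Boost. Because std::get is a free function rather than a member template, the template disambiguator does not come up in this version. forwardingExample is a hypothetical usage function.

```cpp
#include <cassert>
#include <string>
#include <tuple>

class Connection
{
public:
    enum FieldName
    {
        Name = 0,
        Timeout,
        MaxRetries
    };

    // std::tuple stands in for boost::tuple; std::tuple_element replaces
    // boost::tuples::element.
    typedef std::tuple<std::string, long, int> FieldTuple;

    // Read-only access: a const reference to the element's exact type.
    template <FieldName fnv>
    const typename std::tuple_element<fnv, FieldTuple>::type& get() const
    {
        return std::get<fnv>(_impl);
    }

    // Mutable access: the same signature minus the const keywords.
    template <FieldName fnv>
    typename std::tuple_element<fnv, FieldTuple>::type& get()
    {
        return std::get<fnv>(_impl);
    }

private:
    FieldTuple _impl;  // callers can only reach the data through get<>()
};

void forwardingExample()
{
    Connection c;
    c.get<Connection::Name>() = "primary";
    c.get<Connection::Timeout>() = 30;
    c.get<Connection::MaxRetries>() = 3;

    const Connection& view = c;  // const object: only the const overload applies
    assert(view.get<Connection::Timeout>() == 30);
    // view.get<Connection::Timeout>() = 60;  // would not compile: const reference
}
```

Overload resolution picks the const overload for const objects and the mutable one otherwise, which gives every field a getter/setter pair without writing one per field.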
Other Functionality, Like Persistence
Implementing any other functionality can be done exactly as it would be done in a hand-crafted struct through member (or possibly utility) functions. In all cases we retain complete control over the visibility of any implemented functionality since we’re free to make the visibility of these member functions (including the forwarding template ones) private, protected, or public.
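As an illustration of a bolt-on, here is a sketch of a print() member, a hypothetical addition rather than part of the original design, that reads fields through the forwarding get<>() and produces output in the same bracketed style as the earlier set_open/set_delimiter/set_close example. std::tuple stands in for boost::tuple, and the constructor and printExample are hypothetical helpers; a persistence member such as serialize() could be written the same way.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>

class Connection
{
public:
    enum FieldName { Name = 0, Timeout, MaxRetries };

    // std::tuple stands in for boost::tuple here.
    typedef std::tuple<std::string, long, int> FieldTuple;

    Connection(const std::string& name, long timeout, int maxRetries)
        : _impl(name, timeout, maxRetries) {}

    template <FieldName fnv>
    const typename std::tuple_element<fnv, FieldTuple>::type& get() const
    {
        return std::get<fnv>(_impl);
    }

    // Hypothetical bolt-on: an ordinary member function, written as it would be
    // for a hand-crafted struct but reading fields through get<>().
    void print(std::ostream& os) const
    {
        os << '[' << get<Name>() << ',' << get<Timeout>() << ',' << get<MaxRetries>() << ']';
    }

private:
    FieldTuple _impl;
};

void printExample()
{
    std::ostringstream out;
    Connection("primary", 30, 3).print(out);
    assert(out.str() == "[primary,30,3]");
}
```

Making print() private or protected instead of public is just a matter of moving it, which is the visibility control the paragraph above describes.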
At last, nirvana has been achieved. We've abstracted out the data access semantics so that we can focus on higher-level application data functionality.
Advanced Design Notes
Our data class serves as an interception point for all access into the boost::tuple and is a control point where we can decorate or otherwise replace the boost::tuple behavior altogether. There are some very nifty things we can do here, so let’s explore a few. Since we’re working with templates we can use specialization and the usual bag of meta-programming tricks.
Hiding a Member
Consider the case where we want to stop someone from accessing a data member. In our connection example, let's say that we don't want anyone reading the timeout. Because a member function template is only instantiated when it's used, we can use a Boost static assertion [6] to achieve this easily:
template <FieldName fnv>
const typename boost::tuples::element<fnv, FieldTuple>::type& get() const
{
    BOOST_STATIC_ASSERT(fnv != Timeout);
    return _impl.template get<fnv>();
}
Any code trying to read this data member will fail with an immediate compile error. It doesn't get much better than this.
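The same idea in standard C++: this sketch replaces BOOST_STATIC_ASSERT with the C++11 static_assert declaration and boost::tuple with std::tuple. Because the condition depends on the template parameter fnv, it is only checked when a particular get<fnv>() is instantiated, so reading Name or MaxRetries compiles while reading Timeout does not. The constructor and hiddenMemberExample are hypothetical additions for the demonstration.

```cpp
#include <cassert>
#include <string>
#include <tuple>

class Connection
{
public:
    enum FieldName { Name = 0, Timeout, MaxRetries };

    // std::tuple stands in for boost::tuple here.
    typedef std::tuple<std::string, long, int> FieldTuple;

    // Hypothetical constructor, added only so the example has data to read.
    Connection(const std::string& name, long timeout, int maxRetries)
        : _impl(name, timeout, maxRetries) {}

    // Read access to every field except Timeout. The assertion depends on fnv,
    // so it is evaluated only when a particular get<fnv>() is instantiated.
    template <FieldName fnv>
    const typename std::tuple_element<fnv, FieldTuple>::type& get() const
    {
        static_assert(fnv != Timeout, "Connection::Timeout is not readable");
        return std::get<fnv>(_impl);
    }

private:
    FieldTuple _impl;
};

void hiddenMemberExample()
{
    const Connection c("primary", 30, 3);
    assert(c.get<Connection::Name>() == "primary");
    assert(c.get<Connection::MaxRetries>() == 3);
    // c.get<Connection::Timeout>();  // uncommenting this fails to compile
}
```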
Custom Member
Or does it? To achieve even more granular, per-field behavior we can use template specialization. Since member function templates may not be partially specialized, and may not be (completely) specialized inside a class template that is not itself specialized, we must call through an invoker class to get partial specialization semantics [7].
I’ll leave thinking through this as an exercise for the reader.
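For readers who want a starting point for that exercise, here is one possible shape for it, a sketch rather than the author's solution. Function templates cannot be partially specialized but class templates can, so get<>() delegates to a hypothetical FieldGetInvoker class template; partially specializing the invoker for one field changes that field's behavior without touching the class. The "timeout stored in seconds, reported in milliseconds" rule is an invented example of a custom member, and std::tuple again stands in for boost::tuple.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>

// Hypothetical invoker. Primary template: plain forwarding to the tuple element.
template <std::size_t N, typename Tuple>
struct FieldGetInvoker
{
    typedef const typename std::tuple_element<N, Tuple>::type& result_type;
    static result_type get(const Tuple& t) { return std::get<N>(t); }
};

class Connection
{
public:
    enum FieldName { Name = 0, Timeout, MaxRetries };

    // std::tuple stands in for boost::tuple here.
    typedef std::tuple<std::string, long, int> FieldTuple;

    Connection(const std::string& name, long timeout, int maxRetries)
        : _impl(name, timeout, maxRetries) {}

    // Every read goes through the invoker, so specializing the invoker changes
    // one field's behavior without touching this function.
    template <FieldName fnv>
    typename FieldGetInvoker<fnv, FieldTuple>::result_type get() const
    {
        return FieldGetInvoker<fnv, FieldTuple>::get(_impl);
    }

private:
    FieldTuple _impl;
};

// Partial specialization (N fixed, Tuple left open). Invented rule for the
// example: Timeout is stored in seconds but reported in milliseconds.
template <typename Tuple>
struct FieldGetInvoker<Connection::Timeout, Tuple>
{
    typedef long result_type;
    static result_type get(const Tuple& t)
    {
        return std::get<Connection::Timeout>(t) * 1000L;
    }
};

void invokerExample()
{
    const Connection c("primary", 30, 3);
    assert(c.get<Connection::Timeout>() == 30000L);  // specialized invoker
    assert(c.get<Connection::MaxRetries>() == 3);    // primary template
    assert(c.get<Connection::Name>() == "primary");  // primary template
}
```

The partial specialization has to be declared before the first use that instantiates it, which is why it sits directly after the class and before any code that reads Timeout.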
References
[1] "Replace Record with Data Class", Martin Fowler, Refactoring
[2] "Const-Correctness", Guru of the Week (GotW) #6
[3] "Introduce Explaining Variable", Martin Fowler, Refactoring
[4] "Shotgun Surgery" anti-pattern, Wikipedia
[5] Boost.Tuple library documentation (boost::tuple)
[6] Boost.StaticAssert library documentation
[7] Guru of the Week (GotW) #17 (read through for the template specialization rules)