Thursday, May 16, 2013

My take on C++ serialization (Part I: get_size)

Long time, no see!

Lately I have been busy writing... and writing... and we know that: "All writing and no coding makes Simone a dull boy" :) so I needed to go back to C++ writing... which means new blog entries!

Lately, in a project (libwater, soon to be released), I have been using boost::serialization to transfer C++ objects from one node to another over the network. Boost serialization is pretty neat, especially when you are under a deadline, but releasing the code with a dependency on Boost is overkill in most cases, especially when the Boost library you are using is not a "header-only" library, which means users need it installed on their system. Since the project targets large clusters, this is always a big problem: in the majority of cases these advanced libraries are missing (or only a very old version is installed), so the user has to do a lot of installation work just to try out our library.

Besides that, boost::serialization was doing more than we actually needed. I am in the particular case where I know exactly the type of the object I am receiving on the remote node, so I don't need to store type information in the serialization stream like Boost does. This is typical when, for example, you are implementing a network protocol where the structure of the messages is known. In such a case we can assign a tag to each message type and easily map it to a C++ type. Since the list of messages is known statically, we can use some meta-programming just to have some fun.
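To make the tag idea concrete, here is a minimal sketch of how a tag could be mapped to a C++ type at compile time. The names msg_tag and message_type, as well as the two example messages, are hypothetical and only serve as illustration; they are not taken from libwater.

```cpp
#include <cstdint>
#include <string>
#include <tuple>
#include <type_traits>

// Hypothetical message tags; a real protocol would define its own set.
enum class msg_tag : std::uint8_t { ping = 0, text = 1 };

// The primary template is left undefined, so using an unknown tag
// fails at compile time instead of at run time.
template <msg_tag Tag>
struct message_type;

// Each specialization maps one tag to the C++ type carrying its payload.
template <>
struct message_type<msg_tag::ping> {
    typedef std::tuple<std::uint32_t> type;               // sequence number
};

template <>
struct message_type<msg_tag::text> {
    typedef std::tuple<std::uint32_t, std::string> type;  // sequence number, body
};

// The receiver recovers the payload type from the tag alone, so no type
// information has to travel in the serialized stream.
static_assert(std::is_same<message_type<msg_tag::text>::type,
                           std::tuple<std::uint32_t, std::string>>::value,
              "tag 'text' maps to (uint32_t, string)");
```

With a mapping like this, only the one-byte tag needs to precede the payload on the wire.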

Actually, I could literally halve the size of each transferred message by only storing the object content. So I started to work on a "header-only" serialization interface... of course it had to involve template meta-programming! :) I am going to spread the various pieces of my take on object serialization over multiple posts (otherwise it may get too heavy to read).

The first thing (which I will introduce today) was defining a functor which, given an object, returns the size (in bytes) of its serialized representation. I called it size_t get_size(const T& obj). This is some easy code which needs to be written, so let's get to it.

Given that the types I deal with are (for now) restricted to std::string, integral types (uint8_t, uint16_t, uint32_t, uint64_t), std::vector<T> and std::tuple<T...>, this code should do the trick. While for tuples and integral types we directly store the value, for vectors and strings we prepend the number of elements which will be stored (since this information is not explicit in the type).
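A minimal C++11 sketch of get_size along these lines could look as follows. It is written as a set of overloads with a recursive helper for tuples, which is one possible layout rather than necessarily the original one. The width of the length prefix is not stated above, so a 4-byte length_prefix_t is assumed here; that typedef and the detail::tuple_walker helper are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <tuple>
#include <type_traits>
#include <vector>

// Assumption: element counts are stored as a 4-byte prefix.
typedef std::uint32_t length_prefix_t;

// Declare every overload first so they can call each other recursively
// (for example a vector of tuples of strings).
template <class T>
typename std::enable_if<std::is_integral<T>::value, std::size_t>::type
get_size(const T& obj);
inline std::size_t get_size(const std::string& str);
template <class T>
std::size_t get_size(const std::vector<T>& vec);
template <class... Ts>
std::size_t get_size(const std::tuple<Ts...>& tup);

namespace detail {
// Adds the size of element I to the size of all elements after it.
template <std::size_t I, class Tuple,
          bool Done = (I == std::tuple_size<Tuple>::value)>
struct tuple_walker {
    static std::size_t apply(const Tuple& tup) {
        return get_size(std::get<I>(tup)) +
               tuple_walker<I + 1, Tuple>::apply(tup);
    }
};

// End of recursion: no elements left.
template <std::size_t I, class Tuple>
struct tuple_walker<I, Tuple, true> {
    static std::size_t apply(const Tuple&) { return 0; }
};
}  // namespace detail

// Integral types are stored as-is.
template <class T>
typename std::enable_if<std::is_integral<T>::value, std::size_t>::type
get_size(const T&) {
    return sizeof(T);
}

// Strings: length prefix followed by the characters.
inline std::size_t get_size(const std::string& str) {
    assert(str.size() <= std::numeric_limits<length_prefix_t>::max());
    return sizeof(length_prefix_t) + str.size();
}

// Vectors: length prefix followed by each serialized element.
template <class T>
std::size_t get_size(const std::vector<T>& vec) {
    assert(vec.size() <= std::numeric_limits<length_prefix_t>::max());
    std::size_t total = sizeof(length_prefix_t);
    for (const auto& elem : vec) total += get_size(elem);
    return total;
}

// Tuples: the elements back to back, no prefix.
template <class... Ts>
std::size_t get_size(const std::tuple<Ts...>& tup) {
    return detail::tuple_walker<0, std::tuple<Ts...>>::apply(tup);
}
```

Under the 4-byte prefix assumption, the string "hello" takes 4 + 5 = 9 bytes, and a vector of three uint16_t values takes 4 + 3 * 2 = 10 bytes.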

For tuples, I wish I had a better solution using template expansion. This would be possible if, for example, get_size were more like sizeof, meaning it could be computed from the object type alone. That would allow me to write something like this (with a hypothetical std::sum): std::sum(sizeof(T)...).

However, since we may have strings and vectors as elements of the tuple, this is not possible because we would also need to unpack the elements of the tuple (not only their types). Actually, I would appreciate it if anyone could suggest a better solution for that.
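One possible direction, sketched here under C++11, is to expand a pack of element indices instead of a pack of types, so that each expansion step has access to the element's value. The helpers index_list, make_index_list and tuple_size_sum are hypothetical names, the 4-byte length_prefix_t is an assumption, and only integral types, strings and tuples are covered to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <tuple>
#include <type_traits>

typedef std::uint32_t length_prefix_t;  // assumed prefix width

template <class T>
typename std::enable_if<std::is_integral<T>::value, std::size_t>::type
get_size(const T&) {
    return sizeof(T);
}

inline std::size_t get_size(const std::string& str) {
    assert(str.size() <= std::numeric_limits<length_prefix_t>::max());
    return sizeof(length_prefix_t) + str.size();
}

// Declared early so that nested tuples resolve to this overload.
template <class... Ts>
std::size_t get_size(const std::tuple<Ts...>& tup);

// Minimal C++11 stand-in for C++14's std::index_sequence.
template <std::size_t... Is>
struct index_list {};

template <std::size_t N, std::size_t... Is>
struct make_index_list : make_index_list<N - 1, N - 1, Is...> {};

template <std::size_t... Is>
struct make_index_list<0, Is...> {
    typedef index_list<Is...> type;
};

template <class Tuple, std::size_t... Is>
std::size_t tuple_size_sum(const Tuple& tup, index_list<Is...>) {
    std::size_t total = 0;
    // The pack expands into one "total += ..." per element; a braced
    // initializer list guarantees left-to-right evaluation.
    int expand[] = {0, (total += get_size(std::get<Is>(tup)), 0)...};
    (void)expand;
    return total;
}

template <class... Ts>
std::size_t get_size(const std::tuple<Ts...>& tup) {
    return tuple_size_sum(tup,
                          typename make_index_list<sizeof...(Ts)>::type());
}
```

The sum is still computed at run time, since string lengths are only known then, but the explicit recursion over the tuple is replaced by a single pack expansion.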

A simple test:
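A minimal sketch of such a test, assuming a 4-byte length prefix for strings and vectors (the prefix width is an assumption). The overloads are repeated in condensed form so that the snippet compiles on its own; simple_test and tuple_walker are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <tuple>
#include <type_traits>
#include <vector>

typedef std::uint32_t length_prefix_t;  // assumed prefix width

// Condensed get_size overloads for integral types, strings, vectors, tuples.
template <class T>
typename std::enable_if<std::is_integral<T>::value, std::size_t>::type
get_size(const T&) { return sizeof(T); }

inline std::size_t get_size(const std::string& s) {
    return sizeof(length_prefix_t) + s.size();
}

// Declared before use so that nested containers resolve correctly.
template <class T> std::size_t get_size(const std::vector<T>& v);
template <class... Ts> std::size_t get_size(const std::tuple<Ts...>& t);

template <std::size_t I, class Tuple,
          bool Done = (I == std::tuple_size<Tuple>::value)>
struct tuple_walker {
    static std::size_t apply(const Tuple& t) {
        return get_size(std::get<I>(t)) + tuple_walker<I + 1, Tuple>::apply(t);
    }
};

template <std::size_t I, class Tuple>
struct tuple_walker<I, Tuple, true> {
    static std::size_t apply(const Tuple&) { return 0; }
};

template <class T>
std::size_t get_size(const std::vector<T>& v) {
    std::size_t total = sizeof(length_prefix_t);
    for (const auto& e : v) total += get_size(e);
    return total;
}

template <class... Ts>
std::size_t get_size(const std::tuple<Ts...>& t) {
    return tuple_walker<0, std::tuple<Ts...>>::apply(t);
}

// Hypothetical test driver; call it from main().
inline void simple_test() {
    // Integral types: just their size.
    assert(get_size(std::uint8_t(1)) == 1);
    assert(get_size(std::uint32_t(1)) == 4);

    // String: 4-byte prefix + 5 characters.
    assert(get_size(std::string("hello")) == 9);

    // A message made of (uint16_t, string, vector<uint32_t>):
    // 2 + (4 + 5) + (4 + 3 * 4) = 27 bytes.
    std::vector<std::uint32_t> values;
    values.push_back(1);
    values.push_back(2);
    values.push_back(3);
    auto msg = std::make_tuple(std::uint16_t(42), std::string("hello"), values);
    assert(get_size(msg) == 27);
}
```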

C++ <3