Saturday, May 18, 2013

My take on serialization (Part III: deserialize)

With serialization covered, one fundamental piece is still missing: deserialization!

There is not much to say here: this is the inverse of the operation we performed during serialization, so similar patterns apply.
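To make the "inverse operation" idea concrete, here is a minimal sketch. It is not the library's actual code: the names Buffer, write and read are hypothetical, and it assumes a plain byte-copy layout that is written and read on the same machine (no endianness handling). Every read simply undoes the matching write and advances the iterator past the bytes it consumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

using Buffer = std::vector<std::uint8_t>;  // hypothetical byte stream

// Serialization side, shown only so that the round trip is complete.
template <class T>
typename std::enable_if<std::is_arithmetic<T>::value>::type
write(Buffer& out, const T& value) {
    const auto* p = reinterpret_cast<const std::uint8_t*>(&value);
    out.insert(out.end(), p, p + sizeof(T));
}

inline void write(Buffer& out, const std::string& s) {
    write(out, static_cast<std::uint64_t>(s.size()));  // length prefix
    out.insert(out.end(), s.begin(), s.end());
}

// Deserialization of arithmetic types: copy sizeof(T) bytes back.
template <class T>
T read(Buffer::const_iterator& it, Buffer::const_iterator end,
       typename std::enable_if<std::is_arithmetic<T>::value>::type* = nullptr) {
    assert(static_cast<std::size_t>(end - it) >= sizeof(T));
    T value;
    const auto n = static_cast<std::ptrdiff_t>(sizeof(T));
    std::copy(it, it + n, reinterpret_cast<std::uint8_t*>(&value));
    it += n;  // consume the bytes we just read
    return value;
}

// Deserialization of strings: read the length prefix, then the characters.
template <class T>
T read(Buffer::const_iterator& it, Buffer::const_iterator end,
       typename std::enable_if<std::is_same<T, std::string>::value>::type* = nullptr) {
    const auto len = read<std::uint64_t>(it, end);
    assert(static_cast<std::uint64_t>(end - it) >= len);
    const auto n = static_cast<std::ptrdiff_t>(len);
    std::string s(it, it + n);
    it += n;
    return s;
}
```

Values have to be read back in the same order they were written, each call picking up where the previous one stopped.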

And we are done! 

In the std::tuple<T...> deserializer I would have liked to use the commented line. With it I could remove the 20+ lines of code used by the deserialize_tuple method. However, the object then gets deserialized in reverse order. The type is correct, but since my compiler appears to evaluate the arguments of the make_tuple function right-to-left, the elements of the resulting tuple are inverted: a serialized tuple (1,2,3) is deserialized back as (3,2,1) :(. The cause is that the apply function has side effects, and C++ does not specify the order in which the arguments of a function call are evaluated. That code is therefore not safe, even though it might work with some C++ compilers, and it is better to take the safe solution.
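For what it's worth, there is one place where C++11 does fix the order: the elements of a braced initializer list are evaluated left to right, even when they come from a pack expansion and end up as constructor arguments. The sketch below uses that rule instead of a recursive helper. It is a different technique from the deserialize_tuple method discussed above; Buffer, write, read and read_tuple are hypothetical names, and only arithmetic element types are handled. Some older compilers reportedly did not honor this ordering, so test it on your toolchain before relying on it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <type_traits>
#include <vector>

using Buffer = std::vector<std::uint8_t>;  // hypothetical byte stream

template <class T>
void write(Buffer& out, const T& value) {
    static_assert(std::is_arithmetic<T>::value, "sketch: arithmetic types only");
    const auto* p = reinterpret_cast<const std::uint8_t*>(&value);
    out.insert(out.end(), p, p + sizeof(T));
}

// Reads one value and advances the iterator: this is the side effect
// that makes the evaluation order matter.
template <class T>
T read(Buffer::const_iterator& it, Buffer::const_iterator end) {
    static_assert(std::is_arithmetic<T>::value, "sketch: arithmetic types only");
    assert(static_cast<std::size_t>(end - it) >= sizeof(T));
    T value;
    const auto n = static_cast<std::ptrdiff_t>(sizeof(T));
    std::copy(it, it + n, reinterpret_cast<std::uint8_t*>(&value));
    it += n;
    return value;
}

template <class... Ts>
std::tuple<Ts...> read_tuple(Buffer::const_iterator& it,
                             Buffer::const_iterator end) {
    // Braces, not parentheses: the initializer clauses are evaluated in
    // the order they appear, so elements are read first to last.
    return std::tuple<Ts...>{ read<Ts>(it, end)... };
}
```

Writing the same expansion with parentheses, or passing it to make_tuple, would bring back the unspecified order.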

Just to see if everything is working fine we write our usual test cases using google-test: 

The last thing to do is to run a complete benchmark in which we serialize and deserialize an object and compare the result with boost::serialization. This time I compiled everything with optimizations enabled (-O3), and I am using gcc 4.8.0 20130502 (pre-release).

The code is similar to the one we saw in the previous post; this time I add a call to deserialize and a stupid if to be sure the compiler is not doing any dead-code elimination. The code for boost::serialization is similar, just trust me (I know I am Italian... it might be difficult... but come on, as my supervisor says... "give me a break").
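The shape of such a loop can be sketched as follows. This is not the benchmark I actually ran: round_trip is a hypothetical stand-in that pushes a single int through a byte buffer, and BenchResult and run_benchmark are made-up names. The point is only the if: comparing the deserialized value with the input and counting mismatches makes the result observable, so the compiler has less room to throw the work away. An aggressive optimizer may still simplify a toy round trip like this one.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

using Buffer = std::vector<std::uint8_t>;  // hypothetical byte stream

// Placeholder for serialize + deserialize of a real object.
inline int round_trip(int value) {
    Buffer buf;
    const auto* p = reinterpret_cast<const std::uint8_t*>(&value);
    buf.insert(buf.end(), p, p + sizeof(value));
    int out = 0;
    std::copy(buf.begin(), buf.end(), reinterpret_cast<std::uint8_t*>(&out));
    return out;
}

struct BenchResult {
    std::size_t mismatches;  // should stay 0 if the round trip is correct
    double seconds;          // wall-clock time for the whole loop
};

inline BenchResult run_benchmark(std::size_t iterations) {
    std::size_t mismatches = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        const int input = static_cast<int>(i);
        // The "stupid if": use the result so the call cannot be dropped.
        if (round_trip(input) != input) ++mismatches;
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {mismatches, elapsed.count()};
}
```

Printing or returning the mismatch count at the end matters too; a counter nobody reads can be eliminated just like the loop body.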

The result is, well... quite impressive. I didn't do this exercise with performance in mind; rather, my goal was to eliminate a dependency on the boost libraries. Now I realize that boost is definitely doing something really wrong in the serialization library. The added storing of typing info does not justify the huge performance penalty. My solution is 20x faster! Since the messages I produce are half the size (thanks to the missing typing info), I would have expected boost to be only about twice as slow.

I am frankly quite pleased with the performance improvements I saw within the libWater project after replacing boost::serialization with this solution. We got a 10% performance improvement, which in HPC is quite welcome.

The full code, plus the test cases, is available on GitHub (under the BSD license):
(contributions are welcome)

Read: PART I: get_size(...)

C++ <3