Saturday, May 18, 2013

My take on serialization (Part III: deserialize)

After covering serialization, one fundamental piece is still missing: deserialization!

There is not much to say here: this is the inverse of the operation we performed during serialization... therefore similar patterns apply.
// StreamType (std::vector<uint8_t>) and the int_<N> tag type come from the
// previous two posts of this series.
namespace detail {

template <class T>
struct deserialize_helper;

template <class T>
struct deserialize_helper {
    /**
     * Deserialization for integral types and POD data types.
     *
     * This is done by simply relying on the sizeof() operator of the object.
     * It is important that the datatype we are deserializing has a default
     * constructor, otherwise you need to provide a specialization for that type.
     */
    static T apply(StreamType::const_iterator& begin, StreamType::const_iterator end) {
        assert(begin + sizeof(T) <= end && "Error: not enough bytes to deserialize type");
        T val;
        uint8_t* ptr = reinterpret_cast<uint8_t*>(&val);
        std::copy(begin, begin + sizeof(T), ptr);
        begin += sizeof(T);
        return val;
    }
};

template <class T>
struct deserialize_helper<std::vector<T>> {
    /**
     * Deserialization for vector types.
     *
     * We read the first size_t value, which contains the number of elements,
     * and then recursively read each of them. An optimization can be done
     * for which we avoid recursion in the case we have a vector of PODs or
     * an integral type (soon to come).
     */
    static std::vector<T> apply(StreamType::const_iterator& begin,
                                StreamType::const_iterator end)
    {
        // retrieve the number of elements
        size_t size = deserialize_helper<size_t>::apply(begin, end);
        std::vector<T> vect(size);
        for (size_t i = 0; i < size; ++i) {
            // move assignment avoids an additional copy
            vect[i] = std::move(deserialize_helper<T>::apply(begin, end));
        }
        return vect;
    }
};

template <>
struct deserialize_helper<std::string> {
    /**
     * Deserialization for strings.
     *
     * Similar to vectors, but we can avoid recursion since the elements
     * of a string are always bytes.
     */
    static std::string apply(StreamType::const_iterator& begin,
                             StreamType::const_iterator end)
    {
        // retrieve the number of elements
        size_t size = deserialize_helper<size_t>::apply(begin, end);
        // we need to consider the case of empty strings separately
        if (size == 0u) return std::string();
        std::string str(size, '\0');
        for (size_t i = 0; i < size; ++i) {
            str.at(i) = deserialize_helper<uint8_t>::apply(begin, end);
        }
        return str;
    }
};

template <class tuple_type>
inline void deserialize_tuple(tuple_type& obj, StreamType::const_iterator& begin,
                              StreamType::const_iterator end, int_<0>) {
    constexpr size_t idx = std::tuple_size<tuple_type>::value - 1;
    typedef typename std::tuple_element<idx, tuple_type>::type T;
    // move assignment avoids an additional copy
    std::get<idx>(obj) = std::move(deserialize_helper<T>::apply(begin, end));
}

template <class tuple_type, size_t pos>
inline void deserialize_tuple(tuple_type& obj, StreamType::const_iterator& begin,
                              StreamType::const_iterator end, int_<pos>) {
    constexpr size_t idx = std::tuple_size<tuple_type>::value - pos - 1;
    typedef typename std::tuple_element<idx, tuple_type>::type T;
    // move assignment avoids an additional copy
    std::get<idx>(obj) = std::move(deserialize_helper<T>::apply(begin, end));
    // meta-recur on the remaining elements
    deserialize_tuple(obj, begin, end, int_<pos - 1>());
}

template <class... T>
struct deserialize_helper<std::tuple<T...>> {
    /**
     * Deserialization for tuples.
     *
     * Same pattern as for serialization.
     */
    static std::tuple<T...> apply(StreamType::const_iterator& begin,
                                  StreamType::const_iterator end)
    {
        // if only this worked... sigh!
        // return std::make_tuple(deserialize_helper<T>::apply(begin,end)...);
        std::tuple<T...> ret;
        deserialize_tuple(ret, begin, end, int_<sizeof...(T) - 1>());
        return ret;
    }
};

} // end detail namespace

template <class T>
inline T deserialize(StreamType::const_iterator& begin, const StreamType::const_iterator& end) {
    return detail::deserialize_helper<T>::apply(begin, end);
}

template <class T>
inline T deserialize(const StreamType& res) {
    StreamType::const_iterator it = res.cbegin();
    return deserialize(it, res.cend());
}

And we are done! 
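To put the two halves together, here is a minimal round-trip sketch of my own (it assumes, as the tests below do, that StreamType is std::vector<uint8_t>):

// Minimal round trip (my own illustration; serialize() comes from the previous
// post, deserialize() from this one).
std::tuple<int, std::string> original(42, "hello");
StreamType buffer;
serialize(original, buffer);
auto copy = deserialize<decltype(original)>(buffer);
assert(original == copy);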

In the std::tuple<T...> deserializer I would have liked to use the commented-out line, which would save the 20+ lines of code taken up by deserialize_tuple. Unfortunately, with that line the object comes back in the wrong order: the type is correct, but on my compiler the arguments of make_tuple are evaluated right-to-left, so the elements of the tuple end up inverted and a serialized tuple (1,2,3) is deserialized back as (3,2,1) :(. The root cause is that apply() has side-effects (it advances the iterator) and C++ does not specify the evaluation order of function arguments, so that one-liner is not portable even if it happens to work on some compilers. Better to take the safe route.
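For completeness, there is a possible workaround (a sketch of the idea, not the code shipped in the library): the initializer-clauses of a braced-init-list are required to be evaluated left to right in C++11, so list-initializing the tuple keeps the stream order on a conforming compiler.

// Alternative specialization (a sketch, not the code used above): the elements
// of a braced-init-list are evaluated left to right, so on a conforming
// compiler the pack expansion reads the stream in the right order.
template <class... T>
struct deserialize_helper<std::tuple<T...>> {
    static std::tuple<T...> apply(StreamType::const_iterator& begin,
                                  StreamType::const_iterator end)
    {
        return std::tuple<T...>{ deserialize_helper<T>::apply(begin, end)... };
    }
};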

Just to see if everything is working fine, we write our usual test cases using google-test:

TEST(Deserialize, Ints) {
    int v1 = deserialize<uint32_t>({0xA, 0, 0, 0});
    EXPECT_EQ(v1, 10);
    auto v2 = deserialize<uint64_t>({0x40, 0, 0, 0, 0, 0, 0, 0});
    EXPECT_EQ(v2, 64u);
    auto v3 = deserialize<int>({0xFF, 0xFF, 0xFF, 0xFF});
    EXPECT_EQ(v3, -1);
}

TEST(Deserialize, Vector) {
    auto v1 = deserialize<std::vector<int>>({2,0,0,0,0,0,0,0, 1,0,0,0, 2,0,0,0});
    EXPECT_EQ(v1, std::vector<int>({1,2}));
    auto v2 = deserialize<std::vector<std::vector<uint8_t>>>(
        {2, 0, 0, 0, 0, 0, 0, 0, /* size */
         2, 0, 0, 0, 0, 0, 0, 0, /* size */ 1, 2,
         2, 0, 0, 0, 0, 0, 0, 0, /* size */ 3, 4
        });
    EXPECT_EQ(v2, std::vector<std::vector<uint8_t>>({{1,2},{3,4}}));
}

TEST(Deserialize, IntTuple) {
    auto t1 = deserialize<std::tuple<int,int>>({1, 0, 0, 0, 2, 0, 0, 0});
    EXPECT_EQ(t1, std::make_tuple(1,2));
    auto t2 = deserialize<std::tuple<int,int,char>>({1, 0, 0, 0, 2, 0, 0, 0, 3});
    EXPECT_EQ(t2, std::make_tuple(1,2,3));
}

TEST(Deserialize, TupleVec) {
    auto t = deserialize<std::tuple<int,int,std::vector<uint8_t>>>(
        {
            10, 0, 0, 0,            /* get<0> */
            20, 0, 0, 0,            /* get<1> */
            2, 0, 0, 0, 0, 0, 0, 0, /* size */ 1, 2 /* get<2> */
        });
    EXPECT_EQ(t, std::make_tuple(10, 20, std::vector<uint8_t>({1,2})));
}
The last thing to do is to run a complete benchmark in which we serialize and deserialize an object and compare against boost::serialization. This time I compiled everything with optimizations enabled (-O3), using gcc 4.8.0 20130502 (pre-release).

TEST(Performance, Water) {
    auto t1 = std::tuple<int,uint64_t,std::vector<uint8_t>,std::string>(
        10, 20, std::vector<uint8_t>{0,1,2,3,4,5,6,7,8,9}, "hello cpp-love!");
    auto start = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < 500000; ++i) {
        StreamType res;
        serialize(t1, res);
        auto t2 = deserialize<decltype(t1)>(res);
        if (t1 != t2) { std::cerr << "BIG PROBLEMS!!!.. call MUM!" << std::endl; }
    }
    // use the same clock for both time points
    auto tot_time = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::high_resolution_clock::now() - start).count();
    std::cout << "time: " << tot_time << std::endl;
}

The code is similar to the one we saw in the previous post; this time I add a call to deserialize and a stupid if to make sure the compiler is not doing any dead-code elimination. The code for boost::serialization is similar, just trust me (I know I am Italian... it might be difficult... but come on, as my supervisor says... "give me a break").
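If you dislike the if, the loop body could instead funnel one byte of every result into a volatile sink (a sketch of the same idea, not what I actually ran), which equally prevents the optimizer from treating the work as dead code:

// Alternative sink (illustration only): the volatile write forces the
// serialize/deserialize results to be materialized on every iteration.
volatile uint8_t sink = 0;
for (size_t i = 0; i < 500000; ++i) {
    StreamType res;
    serialize(t1, res);
    auto t2 = deserialize<decltype(t1)>(res);
    sink = sink ^ res[0] ^ static_cast<uint8_t>(std::get<0>(t2));
}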

The result is, well... quite impressive. I didn't do this exercise with performance in mind; rather, my goal was to eliminate a dependency on the boost libraries. Now I realize that boost is definitely doing something really wrong in its serialization library: the extra type information it stores does not justify such a huge performance penalty. My solution is 20x faster! Since the messages I produce are half the size (thanks to the missing type info), I would have expected boost to be only about twice as slow.

I am frankly quite pleased by what this change did for the libWater project: after replacing boost::serialization with this solution we got a 10% overall performance improvement, which in HPC is quite welcome.

The full code, plus the test cases are available on github (under the BSD license): https://github.com/motonacciu/meta-serialization
(contributions are welcome)

Read: PART I: get_size(...)


C++ <3

Friday, May 17, 2013

My take on serialization (Part II: serialize)

After discussing the get_size(...) function, which given an object returns its serialized size in bytes, we can go on and write the serialize function.

We can follow the same pattern as get_size, but this time we have to write the content into a stream.
// StreamType is the byte stream used throughout this series; the tests below
// imply it is simply: typedef std::vector<uint8_t> StreamType;
// The int_<N> tag type was introduced together with get_size() in the previous post.
namespace detail {

template <class T>
struct serialize_helper;

template <class T>
void serializer(const T& obj, StreamType::iterator&);

template <class tuple_type>
inline void serialize_tuple(const tuple_type& obj, StreamType::iterator& res, int_<0>) {
    constexpr size_t idx = std::tuple_size<tuple_type>::value - 1;
    serializer(std::get<idx>(obj), res);
}

template <class tuple_type, size_t pos>
inline void serialize_tuple(const tuple_type& obj, StreamType::iterator& res, int_<pos>) {
    constexpr size_t idx = std::tuple_size<tuple_type>::value - pos - 1;
    serializer(std::get<idx>(obj), res);
    // recur on the remaining elements
    serialize_tuple(obj, res, int_<pos - 1>());
}

template <class... T>
struct serialize_helper<std::tuple<T...>> {
    static void apply(const std::tuple<T...>& obj, StreamType::iterator& res) {
        detail::serialize_tuple(obj, res, detail::int_<sizeof...(T) - 1>());
    }
};

template <>
struct serialize_helper<std::string> {
    static void apply(const std::string& obj, StreamType::iterator& res) {
        // store the number of elements of this string at the beginning
        serializer(obj.length(), res);
        for (const auto& cur : obj) { serializer(cur, res); }
    }
};

template <class T>
struct serialize_helper<std::vector<T>> {
    static void apply(const std::vector<T>& obj, StreamType::iterator& res) {
        // store the number of elements of this vector at the beginning
        serializer(obj.size(), res);
        for (const auto& cur : obj) { serializer(cur, res); }
    }
};

template <class T>
struct serialize_helper {
    static void apply(const T& obj, StreamType::iterator& res) {
        const uint8_t* ptr = reinterpret_cast<const uint8_t*>(&obj);
        std::copy(ptr, ptr + sizeof(T), res);
        res += sizeof(T);
    }
};

template <class T>
inline void serializer(const T& obj, StreamType::iterator& res) {
    serialize_helper<T>::apply(obj, res);
}

} // end detail namespace

template <class T>
inline void serialize(const T& obj, StreamType& res) {
    size_t offset = res.size();
    size_t size = get_size(obj);
    // presize the stream so the iterator-based writes never reallocate
    res.resize(res.size() + size);
    StreamType::iterator it = res.begin() + offset;
    detail::serializer(obj, it);
    assert(res.begin() + offset + size == it);
}

As before, we have specializations of the serialize_helper template for tuples, vectors, strings and POD datatypes. To be efficient, serialize() presizes the output stream (we know exactly how many bytes are needed to store the object thanks to the get_size function) and then writes through an iterator.
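One consequence of this design (just a usage note of mine, not from the original post) is that several objects can share a single stream: serialize() resizes the buffer and writes after whatever it already contains, so the stream simply grows object by object.

// Appending several objects to one stream (illustration only, assuming the
// 64-bit size_t layout used throughout these posts):
StreamType buf;
serialize(uint32_t(7), buf);        // buf.size() == 4
serialize(std::string("hi"), buf);  // buf.size() == 4 + 8 + 2 == 14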

Let's write some unit tests (because they are very important):
TEST(Serialize, Ints) {
    uint32_t v1 = 10;
    StreamType res;
    serialize(v1, res);
    EXPECT_EQ(sizeof(uint32_t), res.size());
    EXPECT_EQ(res, std::vector<uint8_t>({0xA, 0, 0, 0}));
    res.clear();
    uint64_t v2 = 64;
    serialize(v2, res);
    EXPECT_EQ(sizeof(uint64_t), res.size());
    EXPECT_EQ(res, std::vector<uint8_t>({0x40, 0, 0, 0, 0, 0, 0, 0}));
    res.clear();
    int v3 = -1;
    serialize(v3, res);
    EXPECT_EQ(sizeof(int), res.size());
    EXPECT_EQ(res, std::vector<uint8_t>({0xFF, 0xFF, 0xFF, 0xFF}));
}

TEST(Serialize, Vector) {
    auto t1 = std::vector<int>{1, 2};
    StreamType res;
    serialize(t1, res);
    EXPECT_EQ(sizeof(decltype(t1)::value_type) * t1.size() + sizeof(size_t), res.size());
    EXPECT_EQ(res, std::vector<uint8_t>({/*size(*/2, 0, 0, 0, 0, 0, 0, 0,/*)*/ 1, 0, 0, 0, 2, 0, 0, 0}));
    res.clear();
    auto t2 = std::vector<std::vector<uint8_t>>{{1, 2}, {3, 4}};
    serialize(t2, res);
    EXPECT_EQ(get_size(t2), res.size());
    EXPECT_EQ(28u, res.size());
    EXPECT_EQ(res, std::vector<uint8_t>(
        {/*size(*/2, 0, 0, 0, 0, 0, 0, 0, /*) size(*/ 2, 0, 0, 0, 0, 0, 0, 0, /*)*/ 1, 2,
         /*size(*/2, 0, 0, 0, 0, 0, 0, 0, /*)*/ 3, 4 }));
}

TEST(Serialize, IntTuple) {
    auto t1 = std::make_tuple(1, 2);
    StreamType res;
    serialize(t1, res);
    EXPECT_EQ(sizeof(decltype(t1)), res.size());
    EXPECT_EQ(res, std::vector<uint8_t>({1, 0, 0, 0, 2, 0, 0, 0}));
    res.clear();
    auto t2 = std::make_tuple(256, 256*2, 256*3);
    serialize(t2, res);
    EXPECT_EQ(sizeof(decltype(t2)), res.size());
    EXPECT_EQ(res, std::vector<uint8_t>({0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0}));
    res.clear();
    auto t3 = std::tuple<boost::uint32_t, boost::uint64_t>(0, 1);
    serialize(t3, res);
    EXPECT_EQ(get_size(t3), res.size());
    EXPECT_EQ(res, std::vector<uint8_t>({0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0}));
}

TEST(Serialize, TupleVec) {
    auto t1 = std::tuple<int,int,std::vector<uint8_t>>(10, 20, std::vector<uint8_t>{1, 2});
    StreamType res;
    serialize(t1, res);
    EXPECT_EQ(18u, res.size());
    EXPECT_EQ(res, std::vector<uint8_t>({
        /*get<0>*/ 10, 0, 0, 0,
        /*get<1>*/ 20, 0, 0, 0,
        /*size(*/  2, 0, 0, 0, 0, 0, 0, 0,/*)*/ 1, 2}));
}

TEST(Serialize, String) {
    std::string s = "string";
    StreamType res;
    serialize(s, res);
    EXPECT_EQ(14u, get_size(s));
    EXPECT_EQ(14u, res.size());
    EXPECT_EQ(res, std::vector<uint8_t>({/*size(*/6, 0, 0, 0, 0, 0, 0, 0,/*)*/ 's', 't', 'r', 'i', 'n', 'g'}));
}


So it's time to run some benchmarks. YAY! Let's compare this serializer with boost and check whether spending time reimplementing this was worth it. Since I don't have much time, we only run one experiment... if you are interested you can run more :)

std::tuple<int,int,std::vector<uint8_t>> t(10, 20, {0,1,2,3,4,5,6,7,8,9});

TEST(Performance, Water) {
    auto start = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < 500000; ++i) {
        StreamType res;
        serialize(t, res);
    }
    auto tot_time = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::high_resolution_clock::now() - start).count();
    std::cout << "time: " << tot_time << std::endl;
}

/* boost serializer */
template <class T>
inline std::string to_bytes(const T& v) {
    std::ostringstream ss;
    boost::archive::text_oarchive oa(ss);
    oa << v;
    return ss.str();
}

TEST(Performance, Boost) {
    auto start = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < 500000; ++i) { to_bytes(t); }
    auto tot_time = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::high_resolution_clock::now() - start).count();
    std::cout << "time: " << tot_time << std::endl;
}

So, as far as the size of the serialized stream is concerned, boost uses 63 bytes while ours uses only 26, less than half. Performance-wise, boost::serialization needs 4.410 seconds to run the test while our serialization solution takes 0.975 seconds, more than 4 times faster. Of course we are not saying that boost::serialization sucks... as a matter of fact it solves a different problem, since it also stores the type information, which allows anyone (even a Java program) to reload the stream. Our serialization strategy (as I explained in the first post) relies on the fact that the type is known on the receiver side, therefore we can avoid storing it.

Let's show some graphs comparing our serialization strategy to boost::serialization:

Thursday, May 16, 2013

My take on C++ serialization (Part I: get_size)

Long time, no see!

Lately I have been busy writing... and writing... and we know that: "All writing and no coding makes Simone a dull boy" :) so I needed to go back to C++ writing... which means new blog entries!

Lately, in a project (libwater, soon to be released) I have been using boost::serialization in order to transfer C++ objects from one node to another over the network. Boost serialization is pretty neat, especially when you are under a deadline, but releasing code with a dependency on boost is in most cases overkill, especially when the boost library you are using is not a header-only one, which means the user needs it installed on his system. Since the project focuses on large clusters, this is always a big problem: these libraries are missing in the majority of cases (or a very old version is installed), meaning the user has to do a lot of installation work just to try out our library.

Besides that, boost::serialization was doing more than we actually needed: I am in the particular case where I know exactly the type of the object I am receiving on the remote node, therefore I don't need to store the type information in the serialization stream like boost does. This is typical, for example, when you are implementing a network protocol where the structure of the messages is known. In that case we can assign a tag to each message type and easily map a C++ type to it. Since the list of messages is known statically, we can use some meta-programming just to have some fun.
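To give an idea of what I mean (a hypothetical illustration, not code from libwater): each message gets a tag, and a small traits template maps the tag to the C++ type carrying its payload, so the receiver always knows which type to deserialize.

#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical message catalogue: the tag identifies the message on the wire,
// the traits class tells the receiver which C++ type holds the payload.
enum class MsgTag : uint8_t { Hello = 0, Data = 1 };

template <MsgTag tag>
struct msg_type;   // primary template intentionally left undefined

template <>
struct msg_type<MsgTag::Hello> { using type = std::tuple<uint32_t, std::string>; };

template <>
struct msg_type<MsgTag::Data>  { using type = std::tuple<uint64_t, std::vector<uint8_t>>; };

// A receiver that statically knows the tag can then reconstruct the payload with
//   auto msg = deserialize<msg_type<MsgTag::Hello>::type>(stream);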

Actually, I could literally halve the size of each transferred message by storing only the object content. Therefore I started to work on a header-only serialization interface... and of course it had to involve template meta-programming! :) I am going to present the various pieces of my take on object serialization over multiple posts (otherwise it may get too heavy to read).

The first thing (which I will introduce today) is a function which, given an object, returns the size (in bytes) of its serialized representation; I called it size_t get_size(const T& obj). This is easy code to write, so let's get to it.

#include <tuple>
#include <numeric>
#include <vector>
#include <string>
#include <cstdint>

// forward declaration of the get_size() function
template <class T>
size_t get_size(const T& obj);

namespace detail {

template <class T>
struct get_size_helper;

/**
 * Specialization for a generic std::vector<T>: a vector is represented
 * in the stream by storing the number of elements (v.size()) followed
 * by the serialized elements (notice that we can have nested structures).
 */
template <class T>
struct get_size_helper<std::vector<T>> {
    static size_t value(const std::vector<T>& obj) {
        return std::accumulate(obj.begin(), obj.end(), sizeof(size_t),
            [](const size_t& acc, const T& cur) { return acc + get_size(cur); });
    }
};

/**
 * Specialization for std::string. Similarly to the vector, we store
 * the number of elements of the string (str.length()), followed by the
 * chars. In this case we are sure the elements of the string are bytes,
 * therefore we can compute the size without recurring.
 */
template <>
struct get_size_helper<std::string> {
    static size_t value(const std::string& obj) {
        return sizeof(size_t) + obj.length() * sizeof(uint8_t);
    }
};

// Tag type used to index the tuple recursion (the int_<0> overload terminates
// it); a minimal empty struct like this is all that is needed.
template <size_t N>
struct int_ {};

template <class tuple_type>
inline size_t get_tuple_size(const tuple_type& obj, int_<0>) {
    constexpr size_t idx = std::tuple_size<tuple_type>::value - 1;
    return get_size(std::get<idx>(obj));
}

template <class tuple_type, size_t pos>
inline size_t get_tuple_size(const tuple_type& obj, int_<pos>) {
    constexpr size_t idx = std::tuple_size<tuple_type>::value - pos - 1;
    size_t acc = get_size(std::get<idx>(obj));
    return acc + get_tuple_size(obj, int_<pos - 1>());
}

/**
 * Specialization for std::tuple<T...>. In this case we don't need to
 * store the number of elements since this information is explicit in
 * the type itself. Therefore we simply serialize the elements
 * of the tuple.
 */
template <class ...T>
struct get_size_helper<std::tuple<T...>> {
    static size_t value(const std::tuple<T...>& obj) {
        return get_tuple_size(obj, int_<sizeof...(T) - 1>());
    }
};

/**
 * Specialization for any remaining type: simply use the value of
 * sizeof().
 */
template <class T>
struct get_size_helper {
    static size_t value(const T& obj) { return sizeof(T); }
};

} // end detail namespace

template <class T>
inline size_t get_size(const T& obj) {
    return detail::get_size_helper<T>::value(obj);
}
Given that the types I deal with are (for now) restricted to std::string, integral types (uint8_t, uint16_t, uint32_t, uint64_t), std::vector<T> and std::tuple<T...>, this code should do the trick. While for tuples and integral types we directly store the value, for vectors and strings we prepend the number of elements to be stored (since this information is not explicit in the type).

For tuples I hoped to have a better solution using template pack expansion. This would be possible if, for example, get_size behaved more like sizeof..., i.e. if it could be computed from the type alone. That would allow me to write something like std::sum(sizeof(T)...).

However, since we may have strings and vectors as elements of the tuple, this is not possible: we would need to unpack the values of the tuple elements, not only their types. I would appreciate it if anyone could suggest a better solution for this.
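For what it's worth, when every element of the tuple does have a fixed size, the wished-for type-only computation can be sketched like this (an illustration of the idea, not part of the library, and it breaks down as soon as a std::string or std::vector appears):

#include <cstddef>
#include <cstdint>

// Compile-time sum of sizeof(T)...: works only for tuples whose elements all
// have a fixed serialized size (no vectors or strings).
template <class... T>
struct static_size;

template <>
struct static_size<> {
    static constexpr size_t value = 0;
};

template <class Head, class... Tail>
struct static_size<Head, Tail...> {
    static constexpr size_t value = sizeof(Head) + static_size<Tail...>::value;
};

static_assert(static_size<uint32_t, uint64_t>::value == 12,
              "computed from the types alone, no object needed");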

A simple test:
#include <gtest/gtest.h>
#include "utils/serialize.h"

TEST(get_size, Ints) {
    uint16_t v1 = 2u;
    EXPECT_EQ(sizeof(decltype(v1)), get_size(v1));
    EXPECT_EQ(2u, get_size(v1));
    uint32_t v2 = 10u;
    EXPECT_EQ(sizeof(decltype(v2)), get_size(v2));
    EXPECT_EQ(4u, get_size(v2));
}

TEST(get_size, Vector) {
    std::vector<uint8_t> v1{ 0u, 1u, 2u };
    EXPECT_EQ(sizeof(size_t) + v1.size(), get_size(v1));
    std::vector<uint64_t> v2{ 0u, 8u };
    EXPECT_EQ(sizeof(size_t) + v2.size() * sizeof(uint64_t), get_size(v2));
}

TEST(get_size, String) {
    std::string s1 = "hello";
    EXPECT_EQ(sizeof(size_t) + s1.length(), get_size(s1));
    std::string s2;
    EXPECT_EQ(sizeof(size_t), get_size(s2));
}

TEST(get_size, Tuple) {
    auto t1 = std::make_tuple(0ul, std::string("hi"));
    EXPECT_EQ(sizeof(unsigned long) + sizeof(size_t) + 2u, get_size(t1));
    auto t2 = std::make_tuple(std::vector<uint16_t>{'a','b','c'}, 10, 20);
    EXPECT_EQ(sizeof(size_t) + 3*2 + sizeof(int)*2, get_size(t2));
}



C++ <3