After serialization, one fundamental piece is still missing: deserialization!
There is not much to say here: it is the inverse of the operation we performed during serialization, so the same patterns apply.
```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// StreamType (the byte buffer) and int_<N> (the compile-time index tag)
// are assumed to be defined as in the previous parts of this series.

namespace detail {

template <class T>
struct deserialize_helper;

template <class T>
struct deserialize_helper {
    /**
     * Deserialization for integral types and POD data types.
     *
     * This is done by simply relying on the sizeof() operator of the object.
     * The type must be trivially copyable and must have a default
     * constructor; otherwise you need to provide a specialization for it.
     */
    static T apply(StreamType::const_iterator& begin, StreamType::const_iterator end) {
        assert(size_t(end-begin) >= sizeof(T) && "Error: not enough bytes to deserialize type");
        T val;
        uint8_t* ptr = reinterpret_cast<uint8_t*>(&val);
        std::copy(begin, begin+sizeof(T), ptr);
        begin+=sizeof(T);
        return val;
    }
};

template <class T>
struct deserialize_helper<std::vector<T>> {
    /**
     * Deserialization for vector types.
     *
     * We read the first size_t value, which contains the number of elements,
     * and then recursively read each of them. An optimization can be done
     * to avoid recursion for vectors of PODs or integral types (soon to come).
     */
    static std::vector<T> apply(StreamType::const_iterator& begin,
                                StreamType::const_iterator end)
    {
        // retrieve the number of elements
        size_t size = deserialize_helper<size_t>::apply(begin,end);
        std::vector<T> vect(size);
        for(size_t i=0; i<size; ++i) {
            // apply() returns a temporary, so this is already a move
            // assignment; no explicit std::move() is needed
            vect[i] = deserialize_helper<T>::apply(begin,end);
        }
        return vect;
    }
};

template <>
struct deserialize_helper<std::string> {
    /**
     * Deserialization for strings.
     *
     * Similar to vectors, but we can avoid recursion since the elements
     * of a string are always bytes.
     */
    static std::string apply(StreamType::const_iterator& begin,
                             StreamType::const_iterator end)
    {
        // retrieve the number of elements
        size_t size = deserialize_helper<size_t>::apply(begin,end);
        // early exit for empty strings (the loop below would handle them too)
        if (size == 0u) return std::string();
        std::string str(size,'\0');
        for(size_t i=0; i<size; ++i) {
            str.at(i) = deserialize_helper<uint8_t>::apply(begin,end);
        }
        return str;
    }
};

// base case: deserialize the last element of the tuple
template <class tuple_type>
inline void deserialize_tuple(tuple_type& obj, StreamType::const_iterator& begin,
                              StreamType::const_iterator end, int_<0>) {
    constexpr size_t idx = std::tuple_size<tuple_type>::value-1;
    typedef typename std::tuple_element<idx,tuple_type>::type T;
    // apply() returns a temporary, so the element is move-assigned
    std::get<idx>(obj) = deserialize_helper<T>::apply(begin, end);
}

// deserialize element (size-pos-1), then recurse on the following ones
template <class tuple_type, size_t pos>
inline void deserialize_tuple(tuple_type& obj, StreamType::const_iterator& begin,
                              StreamType::const_iterator end, int_<pos>) {
    constexpr size_t idx = std::tuple_size<tuple_type>::value-pos-1;
    typedef typename std::tuple_element<idx,tuple_type>::type T;
    // apply() returns a temporary, so the element is move-assigned
    std::get<idx>(obj) = deserialize_helper<T>::apply(begin, end);
    // meta-recur
    deserialize_tuple(obj, begin, end, int_<pos-1>());
}

template <class... T>
struct deserialize_helper<std::tuple<T...>> {
    /**
     * Deserialization for tuples
     *
     * same pattern as for serialization
     */
    static std::tuple<T...> apply(StreamType::const_iterator& begin, StreamType::const_iterator end)
    {
        // if only this worked... sigh!
        // return std::make_tuple(deserialize_helper<T>::apply(begin,end)...);
        std::tuple<T...> ret;
        deserialize_tuple(ret, begin, end, int_<sizeof...(T)-1>());
        return ret;
    }
};

} // end detail namespace

template <class T>
inline T deserialize(StreamType::const_iterator& begin, const StreamType::const_iterator& end) {
    return detail::deserialize_helper<T>::apply(begin, end);
}

template <class T>
inline T deserialize(const StreamType& res) {
    StreamType::const_iterator it = res.cbegin();
    // T cannot be deduced from the arguments, so it must be passed explicitly
    return deserialize<T>(it, res.cend());
}
```
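The comments above mention two refinements: the POD path is only valid for trivially copyable types, and vectors of PODs could be read without one call per element (marked as "soon to come"). The following is a minimal sketch of both ideas, not the code from the repository; `read_pod` and `read_pod_vector` are hypothetical names, and `StreamType` is assumed to be a `std::vector<uint8_t>` as in the earlier parts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Assumption: StreamType is a plain byte buffer, as in the earlier parts.
using StreamType = std::vector<std::uint8_t>;

// Hypothetical helper: read one trivially copyable value and advance `begin`.
template <class T>
T read_pod(StreamType::const_iterator& begin, StreamType::const_iterator end) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "a raw byte copy is only valid for trivially copyable types");
    assert(static_cast<std::size_t>(end - begin) >= sizeof(T) && "not enough bytes");
    T val;  // still requires T to be default constructible
    std::memcpy(&val, &*begin, sizeof(T));
    begin += sizeof(T);
    return val;
}

// Hypothetical helper: read a size_t length prefix followed by `size`
// elements using a single bulk copy instead of one call per element.
template <class T>
std::vector<T> read_pod_vector(StreamType::const_iterator& begin,
                               StreamType::const_iterator end) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "bulk copy is only valid for trivially copyable elements");
    const std::size_t size = read_pod<std::size_t>(begin, end);
    const std::size_t bytes = size * sizeof(T);
    assert(static_cast<std::size_t>(end - begin) >= bytes && "not enough bytes");
    std::vector<T> vect(size);
    if (bytes != 0) {  // never dereference `begin` when it may equal `end`
        std::memcpy(vect.data(), &*begin, bytes);
    }
    begin += static_cast<std::ptrdiff_t>(bytes);
    return vect;
}
```

Inside `deserialize_helper<std::vector<T>>::apply`, one could dispatch to something like `read_pod_vector<T>` when `std::is_arithmetic<T>::value` holds and keep the element-by-element loop otherwise. The `static_assert` turns the "POD only" requirement from a comment into a compile-time check.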
And we are done!
In the std::tuple<T...> deserializer I would have liked to use the commented-out line, which would have saved the 20+ lines of the deserialize_tuple helpers. However, with that line the elements come back in the wrong order. The type is correct, but C++ leaves the evaluation order of function arguments (here, the arguments of make_tuple) unspecified, and my compiler evaluated them right-to-left, so a serialized tuple (1,2,3) was deserialized as (3,2,1) :(. Since every call to apply has a side effect (it advances begin), the result depends on that order. The code might happen to work with some C++ compilers, but it is not portable, so it is better to take the safe solution.
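There is a standard way to keep a one-liner: the ordering problem only affects function-call arguments. Inside a braced-init-list, C++11 guarantees that the initializer clauses, including those produced by a pack expansion, are evaluated left to right. The sketch below illustrates this with a hypothetical `read_one<T>` standing in for `deserialize_helper<T>::apply`; with the real helper the body would become `return std::tuple<T...>{deserialize_helper<T>::apply(begin, end)...};`. This is an alternative, not the approach taken in the code above.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Assumption: StreamType is a byte buffer, as in the earlier parts.
using StreamType = std::vector<std::uint8_t>;

// Hypothetical stand-in for deserialize_helper<T>::apply: consumes one byte.
// Like the real helper it has a side effect (advancing `begin`), which is
// exactly what makes evaluation order matter.
template <class T>
T read_one(StreamType::const_iterator& begin, StreamType::const_iterator end) {
    assert(begin != end && "not enough bytes");
    return static_cast<T>(*begin++);
}

template <class... T>
std::tuple<T...> read_tuple(StreamType::const_iterator& begin,
                            StreamType::const_iterator end) {
    // Braced initialization: the pack expansion is evaluated left to right,
    // so element 0 always consumes the first bytes of the stream.
    return std::tuple<T...>{read_one<T>(begin, end)...};
}
```

It is still worth keeping the IntTuple test below, since it would catch a compiler that does not honor this ordering rule.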
To check that everything works, we write our usual test cases using Google Test:
```cpp
#include <gtest/gtest.h>

// The byte vectors below assume a little-endian machine
// with a 4-byte int and an 8-byte size_t.

TEST(Deserialize, Ints) {
    auto v1 = deserialize<uint32_t>({0xA, 0, 0, 0});
    EXPECT_EQ(v1, 10u);
    auto v2 = deserialize<uint64_t>({0x40, 0, 0, 0, 0, 0, 0, 0});
    EXPECT_EQ(v2, 64u);
    auto v3 = deserialize<int>({0xFF, 0xFF, 0xFF, 0xFF});
    EXPECT_EQ(v3, -1);
}

TEST(Deserialize, Vector) {
    auto v1 = deserialize<std::vector<int>>({2,0,0,0,0,0,0,0, 1,0,0,0, 2,0,0,0});
    EXPECT_EQ(v1, std::vector<int>({1,2}));
    auto v2 = deserialize<std::vector<std::vector<uint8_t>>>(
        {2, 0, 0, 0, 0, 0, 0, 0, /* size */
         2, 0, 0, 0, 0, 0, 0, 0, /* size */ 1, 2,
         2, 0, 0, 0, 0, 0, 0, 0, /* size */ 3, 4
        });
    EXPECT_EQ(v2, std::vector<std::vector<uint8_t>>({{1,2},{3,4}}));
}

TEST(Deserialize, IntTuple) {
    auto t1 = deserialize<std::tuple<int,int>>({1, 0, 0, 0, 2, 0, 0, 0});
    EXPECT_EQ(t1, std::make_tuple(1,2));
    auto t2 = deserialize<std::tuple<int,int,char>>({1, 0, 0, 0, 2, 0, 0, 0, 3});
    EXPECT_EQ(t2, std::make_tuple(1,2,3));
}

TEST(Deserialize, TupleVec) {
    auto t = deserialize<std::tuple<int,int,std::vector<uint8_t>>>(
        {
            10, 0, 0, 0,             /* get<0> */
            20, 0, 0, 0,             /* get<1> */
            2, 0, 0, 0, 0, 0, 0, 0,  /* get<2>: size */
            1, 2                     /* get<2>: elements */
        });
    EXPECT_EQ(t, std::make_tuple(10,20,std::vector<uint8_t>({1,2})));
}
```
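The byte vectors in these tests are written by hand, so they only hold on a little-endian machine with a 4-byte `int` and an 8-byte `size_t` (the common x86-64 setup). A minimal sketch of two hypothetical helpers, `host_is_little_endian` and `append_raw`, that either check that assumption or build the expected bytes in host order instead of hard-coding them:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumption: StreamType is a byte buffer, as in the earlier parts.
using StreamType = std::vector<std::uint8_t>;

// Hypothetical helper: true when the host stores the least significant
// byte first, which the hard-coded test vectors rely on.
inline bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    std::uint8_t first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical helper: append the raw bytes of `v` exactly as the POD
// (de)serializer lays them out, i.e. in host byte order.
template <class T>
void append_raw(StreamType& s, const T& v) {
    const auto* p = reinterpret_cast<const std::uint8_t*>(&v);
    s.insert(s.end(), p, p + sizeof(T));
}
```

For example, the input of the Vector test could be built with `append_raw(s, size_t(2)); append_raw(s, 1); append_raw(s, 2);`, which yields the same 16 bytes on such a machine and stays correct elsewhere.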
```cpp
#include <chrono>
#include <iostream>

TEST(Performance, Water) {
    auto t1 = std::tuple<int,uint64_t,std::vector<uint8_t>,std::string>(
        10, 20, std::vector<uint8_t>{0,1,2,3,4,5,6,7,8,9}, "hello cpp-love!");
    // take both time points from the same (monotonic) clock
    auto start = std::chrono::steady_clock::now();
    for (size_t i=0; i<500000; ++i) {
        StreamType res;
        serialize(t1,res);
        auto t2 = deserialize<decltype(t1)>(res);
        if (t1!=t2) { std::cerr << "BIG PROBLEMS!!!.. call MUM!" << std::endl; }
    }
    auto tot_time = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now()-start).count();
    std::cout << "time: " << tot_time << " us" << std::endl;
}
```
The code is similar to the benchmark in the previous post; this time I add a call to deserialize, plus a stupid if to make sure the compiler is not doing any dead-code elimination. The code for boost::serialization is similar, just trust me (I know, I am Italian... it might be difficult... but come on, as my supervisor says... "give me a break").
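Two details matter in a benchmark like this: both time points must come from the same clock, and the measured work must not be optimized away. A minimal sketch of a reusable timing helper (`time_us` is a hypothetical name) that uses `std::chrono::steady_clock`, which is monotonic, and feeds each iteration's result into a checksum that is finally written to a `volatile` variable, one common way to keep the loop from being treated as dead code:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: run `body` `iterations` times and return the
// elapsed time in microseconds. `body` must return a value convertible
// to std::uint64_t that depends on the work it did.
template <class F>
long long time_us(std::size_t iterations, F&& body) {
    std::uint64_t checksum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        checksum += body();
    }
    const auto stop = std::chrono::steady_clock::now();
    // Publishing the checksum through a volatile write forces the
    // compiler to actually compute it, and therefore to run every body().
    static volatile std::uint64_t sink;
    sink = checksum;
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

With the serialize function from Part II, the Water benchmark body could then read `time_us(500000, [&] { StreamType res; serialize(t1, res); auto t2 = deserialize<decltype(t1)>(res); return std::uint64_t(t1 == t2); });`.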
The result is... well, quite impressive. I didn't do this exercise with performance in mind; my goal was rather to eliminate a dependency on the boost libraries. Now I realize that boost must be doing something very inefficient in its serialization library: storing type information alone does not justify such a huge performance penalty. My solution is 20x faster! Since the messages I produce are half the size of boost's (thanks to the missing type information), I would have expected boost to be about twice as slow, not 20x.
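To put a number on the message size: this format stores no type information, only the raw bytes of fixed-size fields and a `size_t` length prefix for every vector and string. A small sketch computing the size of one message for the benchmark tuple; `encoded_size` is a hypothetical helper, and it assumes the serializer from Part II writes exactly what the deserializer above reads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: bytes used for one
// std::tuple<int, uint64_t, std::vector<uint8_t>, std::string>
// under the format read by deserialize_helper above.
inline std::size_t encoded_size(const std::vector<std::uint8_t>& v,
                                const std::string& s) {
    return sizeof(int)                       // get<0>: raw int
         + sizeof(std::uint64_t)             // get<1>: raw uint64_t
         + sizeof(std::size_t) + v.size()    // get<2>: length prefix + 1 byte per element
         + sizeof(std::size_t) + s.size();   // get<3>: length prefix + 1 byte per char
}
```

On a platform with a 4-byte `int` and an 8-byte `size_t`, the benchmark tuple takes 4 + 8 + (8 + 10) + (8 + 15) = 53 bytes.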
I am frankly quite pleased with the performance improvement I saw in the libWater project after replacing boost::serialization with this solution: we got a 10% performance improvement, which in HPC is quite welcome.
The full code and the test cases are available on GitHub (under the BSD license): https://github.com/motonacciu/meta-serialization
(contributions are welcome)
Read: PART I: get_size(...)
Read: PART II: serialize(...)
C++ <3