I want to share something that I initially thought wouldn't work... but it does. There is no particular reason behind it, other than to prove (once again) that C++11 is indeed fantastic and can handle (almost) whatever you throw at it.
The OpenCL 2.0 standard was ratified about a year ago. The feature I find most exciting is device-side enqueue, which allows a kernel to submit new work directly on the device without host intervention.
However, there is something fishy about the way the function is defined, and I am going to explain why. The new enqueue_kernel function (defined in the OpenCL 2.0 language specification) has several overloads:
int enqueue_kernel (
    queue_t queue,
    kernel_enqueue_flags_t flags,
    const ndrange_t ndrange,
    void (^block)(void))

int enqueue_kernel (
    queue_t queue,
    kernel_enqueue_flags_t flags,
    const ndrange_t ndrange,
    uint num_events_in_wait_list,
    const clk_event_t *event_wait_list,
    clk_event_t *event_ret,
    void (^block)(void))
int enqueue_kernel (
    queue_t queue,
    kernel_enqueue_flags_t flags,
    const ndrange_t ndrange,
    void (^block)(local void *, ...),
    uint size0, ...)

int enqueue_kernel (
    queue_t queue,
    kernel_enqueue_flags_t flags,
    const ndrange_t ndrange,
    uint num_events_in_wait_list,
    const clk_event_t *event_wait_list,
    clk_event_t *event_ret,
    void (^block)(local void *, ...),
    uint size0, ...)
The issue here is the two sets of "...". How many arguments should we accept? Funnily enough, the spec does not say much about these additional arguments. The only reading that makes sense (in my understanding) is the following:
"If a closure function (or block) is passed which accepts N OpenCL "local" pointers, then their sizes are defined by an equal number of unsigned values (i.e., size0,..., sizeN-1). It is the responsibility of the runtime support to allocate the memory before the nested kernel is executed."
All of this is to say that the lengths of the two variadic argument lists (the block's and the one belonging to enqueue_kernel itself) must match. This means that it is the compiler's responsibility to perform this additional check.
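To see why an ordinary "..." list cannot enforce this by itself, here is a small host-side C++ sketch (not OpenCL C). The helper `sum_sizes` is a hypothetical name invented for illustration: a C-style variadic list carries neither a count nor types, so the callee has to trust an explicit count, and surplus arguments are silently ignored.

```cpp
#include <cstdarg>

// Hypothetical helper (not part of OpenCL): sums `count` unsigned sizes
// taken from a C-style variadic argument list.
unsigned sum_sizes(unsigned count, ...)
{
    va_list args;
    va_start(args, count);
    unsigned total = 0;
    for (unsigned i = 0; i < count; ++i)
    {
        total += va_arg(args, unsigned);
    }
    va_end(args);
    return total;
}

// The call below passes three sizes but announces only two.
// It compiles without complaint and the third size is never read.
unsigned mismatched_call()
{
    return sum_sizes(2, 10u, 20u, 30u);
}
```

Nothing in the signature ties the announced count to the number of values actually passed, which is exactly the kind of gap the type system could close.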
I can see many people being happy with this... but couldn't we use the type system to enforce it? Can our beloved meta-programming fix this? Let's assume we were in C++. Would the API designer be able to express this concept (the number of arguments in the closure equals the number of sizes passed) using only the type system? You will be glad to hear that with C++11, YES WE CAN! ...and I am going to show you how to do that.
#include <functional>
#include <initializer_list>
#include <iostream>

// Given any type, return unsigned int instead
template <typename T>
struct to_uint_traits { typedef unsigned int type; };

// One unsigned size is required for each parameter of the block
template <typename... Args>
int enqueue_kernel( std::function<void (Args... )> block,
                    typename to_uint_traits<Args>::type... sizes )
{
    // print sizes
    for (auto size : {sizes...})
    {
        std::cout << size << std::endl;
    }
    std::cout << "Calling closure" << std::endl;
    block(sizes...);
    return 0; // the function is declared to return int
}

int main(int argc, char* argv[])
{
    int ret = 0;
    // Args is deduced from the std::function type, so the lambda is wrapped explicitly
    std::function<void (int, int, int)> block =
        [&ret](int a, int b, int c) -> void { ret = a + b + c; };
    enqueue_kernel( block, 10, 20, 30 );
    std::cout << "Computed value: " << ret << std::endl;
}
> 10
> 20
> 30
> Calling closure
> Computed value: 60
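If you would rather hand the lambda over directly, without naming a std::function type at the call site, one possible approach is to read the parameter list off the closure's operator(). The following is only a sketch under a few assumptions: `as_uint`, `block_traits`, `enqueue_lambda` and `sum_via_lambda` are hypothetical names, and the block must be a non-generic, non-mutable lambda returning void.

```cpp
#include <type_traits>
#include <utility>

// Same trick as before: map any type to unsigned int.
template <typename T>
struct as_uint { typedef unsigned int type; };

// Primary template: inspect the closure's call operator.
template <typename F>
struct block_traits : block_traits<decltype(&F::operator())> {};

// Specialisation matching the const call operator of a plain lambda.
template <typename C, typename... Args>
struct block_traits<void (C::*)(Args...) const>
{
    // Exactly one unsigned int per block parameter is accepted here.
    template <typename F>
    static int invoke(F&& block, typename as_uint<Args>::type... sizes)
    {
        block(sizes...);
        return 0;
    }
};

// Hypothetical front end that takes the lambda itself.
template <typename F, typename... Sizes>
int enqueue_lambda(F&& block, Sizes... sizes)
{
    typedef block_traits<typename std::decay<F>::type> traits;
    return traits::invoke(std::forward<F>(block), sizes...);
}

// Usage: three block parameters, three sizes.
int sum_via_lambda()
{
    int ret = 0;
    enqueue_lambda([&ret](int a, int b, int c) { ret = a + b + c; }, 10, 20, 30);
    return ret;
}
```

Passing two or four sizes to `enqueue_lambda` in `sum_via_lambda` should be rejected at compile time, because `invoke` has a fixed arity derived from the lambda's own parameter list.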
This highlights the power of the ... pack expansion in C++11 variadic templates. For example, if we try to call enqueue_kernel with the wrong number of sizes, the compiler reports an error (the helper trait appears as to_int in this output):
@ThinkPad-X1-Carbon:~$ g++ -std=c++11 test.cpp
test.cpp: In function ‘int main(int, char**)’:
test.cpp:25:10: error: too few arguments to function ‘int enqueue_kernel(std::function<void(Args ...)>, typename to_int<Args>::type ...) [with Args = {int, int, int}]’
10, 20);
^
test.cpp:10:5: note: declared here
int enqueue_kernel(std::function<void (Args... )> block, typename to_int<Args>::type... sizes)
^
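If that diagnostic feels a bit terse, an alternative worth sketching is to accept any number of sizes and compare the two pack lengths with a static_assert, which lets you pick the error message. The names `sizes_match`, `enqueue_checked` and `sum_two_checked` below are hypothetical and only serve the illustration.

```cpp
#include <functional>
#include <type_traits>

// Hypothetical trait: true when a std::function block takes exactly
// as many parameters as there are sizes.
template <typename Block, typename... Sizes>
struct sizes_match;

template <typename... Args, typename... Sizes>
struct sizes_match<std::function<void (Args...)>, Sizes...>
    : std::integral_constant<bool, (sizeof...(Args) == sizeof...(Sizes))> {};

// Variant of enqueue_kernel that reports a mismatch with a custom message.
template <typename... Args, typename... Sizes>
int enqueue_checked(std::function<void (Args...)> block, Sizes... sizes)
{
    static_assert(sizeof...(Args) == sizeof...(Sizes),
                  "enqueue_checked: pass exactly one size per block parameter");
    block(static_cast<unsigned int>(sizes)...);
    return 0;
}

// Usage: two block parameters, two sizes.
int sum_two_checked()
{
    int ret = 0;
    std::function<void (int, int)> block = [&ret](int a, int b) { ret = a + b; };
    enqueue_checked(block, 10, 20);
    return ret;
}
```

The trade-off is that the constraint now lives in the function body rather than in the signature, so the original formulation remains the more direct expression of the idea.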
And there you have it: a type-safe definition of OpenCL's enqueue_kernel using C++11. Just because in C++11 we can! Hate on that, C lovers! :)
C++ <3