25302, "e-kayrakli", "Can we provide a way to call a CUDA/HIP kernel directly from Chapel?", "2024-06-17T16:47:58Z"
This is a quick thought that occurred to me while responding to "Chapel array and record representation on CUDA device".
These kernels are typically invoked with a special syntax like foo<<<grid_size, block_size>>>(args). There are lower-level ways of launching kernels, but they are very uncommon (we use those interfaces in our runtime).
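To make the two launch styles concrete, here is a minimal sketch using the CUDA runtime API. The kernel addDelta and the helper launchBothWays are hypothetical names made up for illustration, and cudaLaunchKernel is only one of the lower-level interfaces; this is not meant to show what the Chapel runtime actually calls.

```cuda
#include <cuda_runtime.h>

// Hypothetical example kernel: adds `delta` to the first n elements.
__global__ void addDelta(float *data, float delta, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] += delta;
}

// Launches addDelta once per style and returns 0 if every element
// ends up at 0 + 1 + 2 = 3, nonzero otherwise.
int launchBothWays() {
  const int n = 1024;
  const int blockSize = 128;
  const int numBlocks = (n + blockSize - 1) / blockSize;

  float *dev = nullptr;
  if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return 1;
  cudaMemset(dev, 0, n * sizeof(float));

  // Common style: triple-chevron syntax.
  addDelta<<<numBlocks, blockSize>>>(dev, 1.0f, n);

  // Lower-level style: the launch configuration and the kernel
  // arguments are passed as ordinary function arguments.
  float delta = 2.0f;
  int count = n;
  void *args[] = { &dev, &delta, &count };
  cudaError_t err = cudaLaunchKernel((const void *)addDelta,
                                     dim3(numBlocks), dim3(blockSize),
                                     args, 0, 0);

  float host[n];
  if (err == cudaSuccess) {
    // A blocking copy on the default stream also waits for both kernels.
    err = cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
  }
  cudaFree(dev);
  if (err != cudaSuccess) return 1;

  for (int i = 0; i < n; i++) {
    if (host[i] != 3.0f) return 1;
  }
  return 0;
}
```

The second form is the more relevant one for this idea: everything inside <<<>>> becomes a plain argument, which is roughly what a Chapel-side launch function would have to forward.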
So, for a user to invoke a CUDA/HIP kernel from Chapel, they first need to wrap it in a C function and then call that function. Can we remove that need (relatively easily)? I am imagining a standalone function in the GPU module that takes everything that goes inside <<<>>> as additional arguments:
extern proc foo(...); // do we need to decorate this somehow?
GPU.launchInteropKernel(foo, blockSize=128, ...)
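For comparison, the wrapping step that users need today might look like the sketch below. The kernel body, the name fooWrapper, and its signature are assumptions for illustration only. The Chapel-side declaration mentioned in the comment is also an assumption about how such a wrapper could be exposed, not something taken from an existing code base.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel that the Chapel program wants to run:
// doubles the first n elements of data.
__global__ void foo(float *data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= 2.0f;
}

// Plain C entry point that hides the <<<>>> syntax. On the Chapel side
// this could be declared along the lines of (assumed, untested):
//   extern proc fooWrapper(data: c_ptr(real(32)), n: c_int,
//                          blockSize: c_int): c_int;
// `data` must point to memory that the device can access.
extern "C" int fooWrapper(float *data, int n, int blockSize) {
  int numBlocks = (n + blockSize - 1) / blockSize;
  foo<<<numBlocks, blockSize>>>(data, n);

  // Report launch and execution errors as a nonzero return value.
  cudaError_t err = cudaGetLastError();
  if (err == cudaSuccess) err = cudaDeviceSynchronize();
  return err == cudaSuccess ? 0 : 1;
}
```

Every kernel needs its own wrapper of this shape, which is the boilerplate a generic launch function in the GPU module would avoid.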
Could we "fake" that syntax with some tricks:
GPU.launcher(blockSize=128)(foo, ...)
Here, launcher could return a record with a variadic this method or something, for example.
The biggest question in my mind is whether we can pass an extern proc foo like a first-class proc here. An alternative could be to pass a name and then find the function using dlopen under the hood.