New Issue: Can we provide a way to call a CUDA/HIP kernel directly from Chapel?

25302, "e-kayrakli", "Can we provide a way to call a CUDA/HIP kernel directly from Chapel?", "2024-06-17T16:47:58Z"

This is a quick thought that occurred to me while responding to "Chapel array and record representation on CUDA device".

These kernels are typically invoked with a special syntax like foo<<<grid_size, block_size>>>(args). There are lower-level ways of launching kernels, but they are very uncommon (we use those interfaces in our runtime).

So, for a user to invoke a CUDA/HIP kernel from Chapel, they would first need to wrap it in a C function and then invoke that function. Can we remove that need (relatively easily??)? I am imagining a standalone function in the GPU module that takes everything that would go inside <<<>>> as additional arguments:

extern proc foo(...); // do we need to decorate this somehow?
GPU.launchInteropKernel(foo, blockSize=128, ...)

Could we "fake" that syntax with some tricks:

GPU.launcher(blockSize=128)(foo, ...)

launcher here could, for example, return a record with a variadic this method, which would make the returned value callable.

The biggest question in my mind is whether we can pass an extern proc foo like a first-class proc here. An alternative could be to pass a name, and then find the function using dlopen under the hood.
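
To make the name-based alternative a bit more concrete, here is a rough C sketch of the lookup step only. The helper name find_function_by_name is made up for illustration and is not an existing Chapel runtime function. It assumes a POSIX system with dynamic linking and that the symbol is exported under the plain name being searched for. Whether a pointer found this way can then be handed to the GPU launch interfaces is a separate question that this sketch does not answer.

```c
#include <dlfcn.h>   /* dlopen, dlsym (POSIX) */
#include <stddef.h>  /* NULL */

/* Generic function-pointer type; the caller casts the result to the real signature. */
typedef void (*generic_fn)(void);

/*
 * Hypothetical helper (not part of Chapel): look up a function by name
 * among the symbols visible to the running program.
 * Returns NULL if the symbol cannot be found.
 */
static generic_fn find_function_by_name(const char *name) {
  /* dlopen(NULL, ...) yields a handle for the main program; dlsym on that
     handle also searches the shared libraries loaded at program startup. */
  void *handle = dlopen(NULL, RTLD_LAZY);
  if (handle == NULL) {
    return NULL;
  }

  /* POSIX idiom for storing the void* from dlsym into a function pointer. */
  generic_fn fn = NULL;
  *(void **)(&fn) = dlsym(handle, name);

  /* The handle is intentionally left open so the returned pointer stays valid. */
  return fn;
}
```

A caller would cast the result to the expected signature, for example by looking up "strlen" and casting to size_t (*)(const char *). Older glibc versions may also need -ldl at link time.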