New Issue: Can we provide a way to call a CUDA/HIP kernel directly from Chapel?

e-kayrakli1 · June 17, 2024, 4:49pm

25302, "e-kayrakli", "Can we provide a way to call a CUDA/HIP kernel directly from Chapel?", "2024-06-17T16:47:58Z"


opened 04:47PM - 17 Jun 24 UTC

type: Feature Request area: GPU Support


This is a quick thought that occurred to me while responding to the topic "Chapel array and record representation on CUDA device".

These kernels are typically invoked with a special syntax like foo<<<block_size>>>(args). There are lower-level ways of launching kernels, but they are very uncommon (we use those interfaces in our runtime).

So, for a user to invoke a CUDA/HIP kernel from Chapel, they would first need to wrap it in a C function and then invoke that function. Can we remove that need (relatively easily??)? I am imagining a standalone function in the GPU module that takes everything that would go inside <<<>>> as additional arguments:

extern proc foo(...); // do we need to decorate this somehow?
GPU.launchInteropKernel(foo, blockSize=128, ...)

Could we "fake" that syntax with some tricks:

GPU.launcher(blockSize=128)(foo, ...)

launcher here could return a record with a variadic this method or something, for example.

The biggest question in my mind is whether we can pass an extern proc foo like a first-class proc here. An alternative could be to pass a name and then find the function using dlopen under the hood.
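
To make the dlopen alternative a bit more concrete, here is a minimal C sketch of looking a function up by name at run time. It is only an illustration under assumptions: find_by_name and the int_fn signature are hypothetical names made up for this example, nothing here is an existing Chapel runtime interface, and a real version would still have to launch the kernel through the lower-level launch interfaces rather than call the symbol directly.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical signature used only for this sketch; a real kernel
   would need its launch configuration and argument list instead. */
typedef int (*int_fn)(int);

/* Look up `name` among the symbols visible to the running program
   (the executable itself plus the shared libraries it has loaded).
   Returns NULL when no such symbol is found. */
int_fn find_by_name(const char *name) {
    /* Passing NULL asks for a handle to the main program. */
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL) {
        return NULL;
    }

    void *sym = dlsym(self, name);
    /* Closing this handle does not unload the main program. */
    dlclose(self);
    if (sym == NULL) {
        return NULL;
    }

    /* POSIX idiom for turning the void* from dlsym into a function pointer. */
    int_fn fn;
    *(void **)(&fn) = sym;
    return fn;
}
```

For example, find_by_name("abs") should hand back the C library's abs, which can then be called through the returned pointer. On older glibc versions this needs linking with -ldl. The caller still has to know the right signature, which is the same information the extern proc declaration would carry on the Chapel side.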
