Chapel array and record representation on CUDA device

How are Chapel arrays and record instances represented when they are allocated on the CUDA device? Are they flattened arrays or do they have the same representation as data in normal Chapel programs? If I have written a CUDA kernel in C, how would I call it from my Chapel program?

Hi Iain,

Records can be tricky. Our interop documentation states that non-extern Chapel records can't be passed to extern procs. So the only way to make them work would be to have a C struct, which you then declare in Chapel as an extern record. The implication is that you'd have to allocate memory on the C side for such records, presumably with something like cudaMalloc. I can try to put together an example of this, if you're interested.
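As a rough sketch of that pattern (the struct name `Point`, the header `point.h`, and the function `fillPoint` are all hypothetical, not from any real library):

```chapel
// point.h (C side, hypothetical):
//   typedef struct { double x, y; } Point;
//   void fillPoint(Point *p);

require "point.h";
use CTypes;

// Declare the C struct as an extern record in Chapel
extern record Point {
  var x: real;
  var y: real;
}

extern proc fillPoint(p: c_ptr(Point));

var pt: Point;
fillPoint(c_ptrTo(pt));  // the C side populates the record in place
writeln(pt.x, " ", pt.y);
```

For device-resident records, the allocation itself (e.g. via cudaMalloc) would happen on the C side, and Chapel would only see a `c_ptr(Point)`.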

You can pass pointers to Chapel arrays to C functions, which may call CUDA kernels down the road. Note that CUDA kernel launches have an esoteric syntax that's not supported by Chapel. So, there needs to be a C wrapper in between that's called from Chapel and that calls CUDA. For such cases, you can pass Chapel arrays allocated on the GPU memory like:

on here.gpus[0] {
  var Arr: [1..10] int; // this is allocated on the device memory
  externProc(c_ptrTo(Arr));  // this passes the address of the array's data buffer
}
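For completeness, the C/CUDA side of that call could look roughly like this. The wrapper name `externProc` matches the snippet above; the kernel `scaleKernel` and the launch configuration are illustrative assumptions only:

```c
// wrapper.cu (hypothetical), compiled with nvcc and linked into the
// Chapel program.
#include <stdint.h>
#include <cuda_runtime.h>

__global__ void scaleKernel(int64_t *data, int64_t n) {
  int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
  if (i < n) data[i] *= 2;
}

extern "C" void externProc(int64_t *devPtr) {
  // devPtr already points to device memory allocated by Chapel,
  // so the kernel can be launched on it directly.
  scaleKernel<<<1, 32>>>(devPtr, 10);
  cudaDeviceSynchronize();
}
```

The `extern "C"` linkage is what lets Chapel's extern proc resolve the symbol despite the file being compiled as C++/CUDA.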

I created the issue "Can we provide a way to call a CUDA/HIP kernel directly from Chapel?" (chapel-lang/chapel#25302) to ask for a more direct way of launching a CUDA kernel from Chapel.

Hope this helps,
Engin

Hi Engin,

Thank you very much for your response. That's too bad about records, but this is definitely helpful. I would love to see the example you mentioned if that's not too much of a hassle.

As for cuBLAS and friends, would making calls to these necessitate writing extern procs in C/CUDA that call these libraries, then use them in Chapel?

What representation do arrays of higher dimension have on the C side?

When on here.gpus[0], does c_ptrTo(Arr) do any copying of Arr? In other words, is there any overhead to manipulating Chapel arrays via C functions if they are on a GPU? Does the array need to be copied back to host memory, passed into the C function, and then copied to the CUDA device, or does everything remain on the device?

Thanks,
Iain

Hi Iain,

I think performance may be a bit tricky, so I created the issue "Low CUDA API call performance when called from Chapel through interoperability" (chapel-lang/chapel#25311). The example in that issue should answer your question here. Let me know if that's not the case.

> As for cuBLAS and friends, would making calls to these necessitate writing extern procs in C/CUDA that call these libraries, then use them in Chapel?

Probably not. AFAIK, cuBLAS is entirely a host-side library, so you should be able to invoke cuBLAS from Chapel directly. Note that we have a draft library that does exactly that, but it is only tested with CHPL_LOCALE_MODEL=flat. It has been a long-standing wish to ramp that library up so that it can be used in production with GPU support enabled (CHPL_LOCALE_MODEL=gpu). See the library here: chapel/test/gpu/interop/cuBLAS at main · chapel-lang/chapel · GitHub
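As a sketch of what a direct call might look like (assuming -lcublas at link time; the declarations follow the cuBLAS v2 API as documented by NVIDIA, but treat the Chapel-side extern declarations as unverified assumptions, not a tested recipe):

```chapel
use CTypes;

// Hand-written extern declarations for a small slice of cuBLAS.
extern type cublasHandle_t;
extern proc cublasCreate_v2(ref handle: cublasHandle_t): c_int;
extern proc cublasDscal_v2(handle: cublasHandle_t, n: c_int,
                           const ref alpha: real(64),
                           x: c_ptr(real(64)), incx: c_int): c_int;

on here.gpus[0] {
  var X: [0..#10] real = 1.0;  // device-resident buffer
  var handle: cublasHandle_t;
  cublasCreate_v2(handle);
  const alpha = 2.0;
  // Scales the device buffer in place: X *= alpha
  cublasDscal_v2(handle, 10, alpha, c_ptrTo(X), 1);
}
```

Since cuBLAS runs on the host but consumes device pointers, the `c_ptrTo(X)` obtained inside the `on here.gpus[0]` block is exactly what it expects.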

(On a quick look at that library, it looks like we have a C wrapper for it. I bet that has to do with C vs. C++ linkage, but I can't really remember.)

> What representation do arrays of higher dimension have on the C side?

Local, rectangular Chapel arrays have contiguous memory allocation, so their buffer can be used directly in C. By default, such arrays use row-major ordering.
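As a concrete illustration of that layout, here's a small helper computing the flat offset of a 2D Chapel array element as seen from C (the helper itself is hypothetical, not part of any Chapel-provided API):

```c
#include <stddef.h>

/* A local array declared as `var A: [1..3, 1..4] real;` arrives in C as
   one contiguous row-major buffer of 12 doubles.  With Chapel's 1-based
   indices, element A[i, j] sits at this flat offset: */
size_t chapel2dOffset(size_t i, size_t j, size_t ncols) {
  return (i - 1) * ncols + (j - 1);
}
```

So for the `[1..3, 1..4]` example, A[2, 3] maps to offset 6 in the buffer, and the last element A[3, 4] maps to offset 11.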

> When on here.gpus[0], does c_ptrTo(Arr) do any copying of Arr?

No. A pointer is a pointer. If the array was allocated inside on here.gpus[0], the pointer you get will point to GPU memory. You can store that pointer in a void* in C, or in a CUdeviceptr. As long as you keep in mind that it points to GPU memory and handle it accordingly, you should be fine.
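On the C side, such a pointer can be handed straight to CUDA APIs without any staging copies. A hypothetical wrapper showing both the runtime-API and driver-API views of the same address:

```c
// zero_wrapper.c (hypothetical): the pointer Chapel passes in already
// refers to device memory, so CUDA APIs can consume it directly.
#include <stdint.h>
#include <cuda_runtime.h>
#include <cuda.h>

void zeroDeviceArray(void *ptr, int64_t n) {
  // Runtime API: treat it as a plain void* device pointer.
  cudaMemset(ptr, 0, n * sizeof(int64_t));

  // Driver API: the same address can be carried as a CUdeviceptr.
  CUdeviceptr d = (CUdeviceptr)(uintptr_t)ptr;
  cuMemsetD8(d, 0, n * sizeof(int64_t));
}
```

No host/device round trip happens at any point; the buffer stays on the device the whole time.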

Engin
