Link: Fix several issues with the GPU support by e-kayrakli · Pull Request #18762 · chapel-lang/chapel · GitHub
Merge pull request #18762 from e-kayrakli/gpu-dist-array-clean
Fix several issues with the GPU support
This PR has fixes for multiple issues with the GPU support. All of these are
motivated by the desire to be able to use distributed arrays to allocate arrays
on GPUs. Sadly, this PR does not fully enable that. It only fixes some smaller
issues that I bumped into on the way.
The problems that this PR fixes:
- Removes the limitation on the outliner that caused it to look at only user [...]
- [...] chpl-gpu-gen-includes.h header to support runtime calls added at
codegen time in GPU kernels.
- Adjust chplenv scripts to be able to put that file into
- Adds a new environment variable CHPL_GPU_CODEGEN. Currently this could [...]
none. If you set CHPL_LOCALE_MODEL=gpu, then we [...] cuda automatically
without any check.
- [...] PRIM_GET_MEMBER in loop bodies properly.
- Set a GPU sublocale's maximum task parallelism to be 1. Previously, it was
defaulting to 0.
- Fix a bug with 0-sized allocations/frees in the GPU allocators. CUDA Driver
API functions we use require non-zero size, and they'll return an error
otherwise. We now check for the non-zero size before calling those functions.
- Adds new tests.
- Adds a new prediff filter to suppress a ptxas warning. This is almost certainly
not the right thing to do. See https://github.com/Cray/chapel-private/issues/2758
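
The zero-size allocation fix in the list above can be illustrated with a minimal C sketch. This is not the runtime's actual code: `fake_driver_alloc` and `fake_driver_free` are hypothetical stand-ins for the CUDA Driver API calls (they reject zero-size requests, as described above), and `gpu_mem_alloc` / `gpu_mem_free` are illustrative wrapper names. Returning NULL for a zero-size request, and skipping the free for NULL, are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver allocation call: like the functions
   described above, it reports an error (non-zero) for a zero-size request. */
static int fake_driver_alloc(void **ptr, size_t size) {
  if (size == 0) return 1;
  *ptr = malloc(size);
  return *ptr == NULL;
}

/* Hypothetical stand-in for the matching driver free call. */
static int fake_driver_free(void *ptr) {
  if (ptr == NULL) return 1;
  free(ptr);
  return 0;
}

/* Guarded allocation: only reach the driver when size is non-zero.
   Assumption of this sketch: a zero-size request yields NULL. */
static void *gpu_mem_alloc(size_t size) {
  void *ptr = NULL;
  if (size > 0) {
    if (fake_driver_alloc(&ptr, size) != 0) {
      fprintf(stderr, "allocation of %zu bytes failed\n", size);
      abort();
    }
    assert(ptr != NULL);
  }
  return ptr;
}

/* Guarded free: a NULL pointer (what a zero-size allocation returned
   in this sketch) never reaches the driver. */
static void gpu_mem_free(void *ptr) {
  if (ptr != NULL) {
    if (fake_driver_free(ptr) != 0) {
      fprintf(stderr, "free failed\n");
      abort();
    }
  }
}
```

With these guards, `gpu_mem_free(gpu_mem_alloc(0))` is a no-op instead of an error from the driver.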
A nice side-effect of this PR is that now we can run simple promotions on GPU.
I was able to run A = 1 on GPU, but not A = B * alpha + C. (The status of
promotion on GPU is in https://github.com/Cray/chapel-private/issues/2695.)
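
For readers less familiar with promotion, both statements above apply a scalar operation elementwise over whole arrays. The following is a sequential C sketch of those elementwise semantics only. The function names are hypothetical, and this is not what the compiler generates for either the CPU or the GPU.

```c
#include <stddef.h>

/* Elementwise meaning of the promoted assignment `A = 1`
   (the scalar is passed in as `value`). */
static void promote_assign_scalar(double *A, size_t n, double value) {
  for (size_t i = 0; i < n; i++) {
    A[i] = value;
  }
}

/* Elementwise meaning of `A = B * alpha + C`: each element of A is
   computed from the elements of B and C at the same index. */
static void promote_scale_add(double *A, const double *B, const double *C,
                              double alpha, size_t n) {
  for (size_t i = 0; i < n; i++) {
    A[i] = B[i] * alpha + C[i];
  }
}
```

The second form reads from two arrays as well as a scalar, whereas the first only writes a constant.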
[Reviewed by @daviditen]