Branch: refs/heads/main
Revision: e19ef35
Author: e-kayrakli
Link: Fix several issues with the GPU support by e-kayrakli · Pull Request #18762 · chapel-lang/chapel · GitHub
Log Message:
Merge pull request #18762 from e-kayrakli/gpu-dist-array-clean
Fix several issues with the GPU support
This PR has fixes for multiple issues with the GPU support. All of these are
motivated by the desire to be able to use distributed arrays to allocate arrays
on GPUs. Sadly, this PR does not fully enable that. It only fixes some smaller
issues that I bumped into on the way.
The problems that this PR fixes:
- Removes the limitation on the outliner that caused it to look at only user
  modules.
- Adds chpl-gpu-gen-includes.h header to support runtime calls added at
  codegen time in GPU kernels.
- Adjust chplenv scripts to be able to put that file into
  runtime/include/gpu/cuda
- Adds a new environment variable CHPL_GPU_CODEGEN. Currently this could
  be only cuda or none. If you set CHPL_LOCALE_MODEL=gpu, then we
  pick cuda automatically without any check.
- Handle PRIM_GET_MEMBER in loop bodies properly.
- Set a GPU sublocale's maximum task parallelism to be 1. Previously, it was
  defaulting to 0.
- Fix a bug with 0-sized allocations/frees in the GPU allocators. CUDA Driver
  API functions we use require non-zero size, and they'll return an error
  otherwise. We now check for the non-zero size before calling those functions.
- Adds new tests.
- Adds a new prediff filter to suppress a ptxas warning. This is almost certainly
  not the right thing to do. See https://github.com/Cray/chapel-private/issues/2758
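As an illustration of the CHPL_GPU_CODEGEN rule from the list above, here is a minimal Python sketch of how such a value might be resolved. It is not the code in util/chplenv/chpl_gpu.py: the function name resolve_gpu_codegen, the default of none when nothing is set, and the ValueError on unknown values are all assumptions made for this example. Only the two allowed values and the automatic choice of cuda under CHPL_LOCALE_MODEL=gpu come from the description.

```python
import os

# The two values the PR description allows for CHPL_GPU_CODEGEN.
VALID_GPU_CODEGEN = ("cuda", "none")


def resolve_gpu_codegen(env=None):
    """Hypothetical helper; not the actual chplenv implementation."""
    env = os.environ if env is None else env

    # From the PR description: with the gpu locale model, cuda is picked
    # automatically, without any further check.
    if env.get("CHPL_LOCALE_MODEL") == "gpu":
        return "cuda"

    # Assumption: otherwise honor an explicit setting, defaulting to none.
    value = env.get("CHPL_GPU_CODEGEN", "none")
    if value not in VALID_GPU_CODEGEN:
        # Assumption: unknown values are rejected rather than ignored.
        raise ValueError(
            "CHPL_GPU_CODEGEN must be one of %s, got %r"
            % (", ".join(VALID_GPU_CODEGEN), value)
        )
    return value
```

Passing a plain dictionary instead of reading os.environ directly keeps the sketch easy to try out, for example resolve_gpu_codegen({"CHPL_LOCALE_MODEL": "gpu"}).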
A nice side-effect of this PR is that now we can run simple promotions on GPU.
I was able to run A = 1 on GPU, but not A = B * alpha + C. (status of
promotion on GPU is in https://github.com/Cray/chapel-private/issues/2695)
[Reviewed by @daviditen]
Test:
- [x] gpu/native
Modified Files:
A runtime/include/gpu/cuda/chpl-gpu-gen-includes.h
A test/gpu/native/distArray/PREDIFF
A test/gpu/native/distArray/blockInsideOn-verbose.chpl
A test/gpu/native/distArray/blockInsideOn-verbose.execopts
A test/gpu/native/distArray/blockInsideOn-verbose.good
A test/gpu/native/distArray/blockInsideOn-verbose.prediff
A test/gpu/native/distArray/blockInsideOn.chpl
A test/gpu/native/distArray/blockInsideOn.good
A test/gpu/native/distArray/blockOutsideOnWorkaround-verbose.chpl
A test/gpu/native/distArray/blockOutsideOnWorkaround-verbose.execopts
A test/gpu/native/distArray/blockOutsideOnWorkaround-verbose.good
A test/gpu/native/distArray/blockOutsideOnWorkaround-verbose.prediff
A test/gpu/native/distArray/blockOutsideOnWorkaround.chpl
A test/gpu/native/distArray/blockOutsideOnWorkaround.good
A test/gpu/native/jacobi/jacobi-verbose.prediff
A test/gpu/native/launchCounter.prediff
A test/gpu/native/promotion.chpl
A test/gpu/native/promotion.execopts
A test/gpu/native/promotion.good
A test/gpu/native/promotion.prediff
R test/gpu/native/jacobi/jacobi-verbose.prediff
M compiler/optimizations/gpuTransforms.cpp
M modules/internal/LocaleModelHelpSetup.chpl
M runtime/include/stdchpl.h
M runtime/make/Makefile.runtime.include
M runtime/src/chpl-gpu.c
M test/gpu/native/PREDIFF
M util/chplenv/chpl_compiler.py
M util/chplenv/chpl_gpu.py
M util/chplenv/printchplenv.py
Compare: Comparing fe08035e86ab...e19ef3575812 · chapel-lang/chapel · GitHub