Branch: refs/heads/main
Revision: 25d82fd
Author: stonea
Link: Add GPU memory strategies by stonea (chapel-lang/chapel Pull Request #20460)
Log Message:
Merge pull request #20460 from stonea/gpu_mem_strategies
Add GPU memory strategies
This PR is really a clone of Engin's draft PR, #20394.
At a high level, this PR adds an experimental new "memory strategy" under which, from within a GPU sublocale:
- array data is allocated on the device, and
- all other data is allocated in page-locked host memory.
Note that this new experimental mode won't be the default. Users will have to "opt in" by setting CHPL_GPU_MEM_STRATEGY=array_on_device.
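The policy above can be modeled in a few lines. This is a hypothetical sketch, not the runtime's code: the function names are invented for illustration. Only `array_on_device` comes from this PR. The non-opt-in value `unified_memory`, and the idea that it places everything in unified memory, are assumptions, because this log does not name the default.

```python
import os

# Assumption: "unified_memory" is the default strategy; only "array_on_device"
# is named in this PR description.
DEFAULT_STRATEGY = "unified_memory"
KNOWN_STRATEGIES = {"unified_memory", "array_on_device"}


def gpu_mem_strategy(env=None):
    """Hypothetical helper: read CHPL_GPU_MEM_STRATEGY, falling back to the default."""
    env = os.environ if env is None else env
    strategy = env.get("CHPL_GPU_MEM_STRATEGY", DEFAULT_STRATEGY)
    if strategy not in KNOWN_STRATEGIES:
        raise ValueError(f"unknown CHPL_GPU_MEM_STRATEGY: {strategy!r}")
    return strategy


def allocation_target(strategy, is_array_data):
    """Model where an allocation made from a GPU sublocale lands."""
    if strategy == "array_on_device":
        # Opt-in mode: array data on the device, everything else page-locked on the host.
        return "device" if is_array_data else "page-locked host"
    # Assumed behavior of the default (non-opt-in) strategy.
    return "unified"
```

For example, `allocation_target(gpu_mem_strategy({"CHPL_GPU_MEM_STRATEGY": "array_on_device"}), True)` yields `"device"`. Keeping page-locked host memory for non-array data lets the device transfer it efficiently while the large array buffers stay resident on the GPU.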
Before he left, Engin and I talked and agreed that I'd take his draft PR and drive it to completion. I've incorporated the feedback I had posted as review comments on Engin's draft PR. The most visible difference (from a user's perspective) is that I renamed the environment variable Engin was adding, CHPL_CUDA_MEMTYPE, to CHPL_GPU_MEM_STRATEGY. So to enable the feature you'd set CHPL_GPU_MEM_STRATEGY=array_on_device (which is more verbose than what Engin originally proposed but, I think, more accurate).
After this PR, there are a few limitations that I want to resolve before the upcoming release:
- If you access array memory from outside a GPUized loop, the program will crash.
- You need to compile with `chpl -schpl_defaultGpuArrayInitMethod=ArrayInit.gpuInit`. Ideally this should just get implicitly set if `CHPL_GPU_MEM_STRATEGY=array_on_device`.
Nevertheless, I think it makes more sense to merge what we have and try to address these issues in later PRs.
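The second limitation says the array-init flag should ideally be set implicitly under `array_on_device`. Below is a hypothetical sketch of that behavior as a build-script helper. `implicit_chpl_flags` is an invented name and does not reflect how chplenv actually works. The flag string is taken verbatim from the limitation, and `-s` is `chpl`'s flag for setting config values.

```python
# Flag quoted from the PR description; the helper below is hypothetical.
GPU_INIT_FLAG = "-schpl_defaultGpuArrayInitMethod=ArrayInit.gpuInit"
FLAG_PREFIX = "-schpl_defaultGpuArrayInitMethod="


def implicit_chpl_flags(strategy, user_flags):
    """Return user_flags, plus the GPU array-init flag when array_on_device is selected."""
    flags = list(user_flags)
    # Respect an explicit user choice instead of overriding it.
    already_set = any(f.startswith(FLAG_PREFIX) for f in flags)
    if strategy == "array_on_device" and not already_set:
        flags.append(GPU_INIT_FLAG)
    return flags
```

The design choice here is to add the flag only when the user hasn't already set `chpl_defaultGpuArrayInitMethod`. An explicit setting on the command line therefore still wins over the implicit default.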
[Reviewed by @daviditen and @stonea]
Modified Files:
A test/gpu/native/page-locked-mem/COMPOPTS
A test/gpu/native/page-locked-mem/NOTEST
A test/gpu/native/page-locked-mem/README
A test/gpu/native/page-locked-mem/jacobi.chpl
A test/gpu/native/page-locked-mem/jacobi.good
A test/gpu/native/page-locked-mem/stream.chpl
A test/gpu/native/page-locked-mem/stream.compopts
A test/gpu/native/page-locked-mem/stream.execopts
A test/gpu/native/page-locked-mem/stream.good
R test/gpu/native/promotion2.execopts
R test/gpu/native/promotion2.prediff
M modules/internal/ChapelBase.chpl
M runtime/include/chpl-gpu-impl.h
M runtime/include/chpl-gpu.h
M runtime/include/chpl-init.h
M runtime/include/chpl-mem-array.h
M runtime/include/chpl-mem.h
M runtime/src/chpl-gpu.c
M runtime/src/chpl-init.c
M runtime/src/gpu/common/cuda-shared.h
M runtime/src/gpu/cuda/gpu-cuda.c
M test/gpu/native/promotion2.chpl
M test/gpu/native/promotion2.good
M test/gpu/native/studies/kernelLaunch/emptyKernelLaunch.chpl
M util/chplenv/chpl_gpu.py
M util/chplenv/compile_link_args_utils.py
M util/chplenv/printchplenv.py
Compare: https://github.com/chapel-lang/chapel/compare/2c57b47c86f8...25d82fdeef91