[Chapel Merge] Fix sporadic gpu issue (race condition between mem

Branch: refs/heads/main
Revision: e712fe9
Author: stonea
Link: Fix sporadic gpu issue (race condition between memory allocation and CUDA context initialization) by stonea · Pull Request #19326 · chapel-lang/chapel · GitHub
Log Message:

Merge pull request #19326 from stonea/fix_gpu_sporadic_error

Fix sporadic gpu issue (race condition between memory allocation and CUDA context initialization)

Do not require the CUDA context to be initialized prior to calling functions that call out to chpl_gpu_mem_alloc(). chpl_gpu_mem_alloc() already checks whether the context has been initialized and will initialize it if needed. The old requirement caused a sporadic failure (a race condition): sometimes the memory allocation function was called before anything else had caused the context to be initialized. In such cases, the memory allocation was done on the CPU, and we'd segfault once a GPU kernel tried to access that memory.
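The difference between the two patterns described above can be sketched roughly as follows. This is a minimal, single-threaded simulation and not the code in runtime/src/chpl-gpu.c: every name in it (sketch_buf, context_ready, sketch_ensure_context, alloc_caller_checks, alloc_self_initializing) is hypothetical, malloc stands in for a real device allocation, and a real runtime would also need the initialization itself to be thread-safe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are Chapel runtime or CUDA names. */
typedef enum { MEM_HOST, MEM_DEVICE } mem_kind;

typedef struct {
  void*    ptr;   /* simulated allocation (plain malloc in this sketch) */
  mem_kind kind;  /* where the memory is assumed to live */
} sketch_buf;

/* Simulated "has the GPU context been created yet?" flag. */
static bool context_ready = false;

/* Stand-in for one-time GPU context creation. */
static void sketch_ensure_context(void) {
  if (!context_ready) {
    context_ready = true;
  }
}

/* Old pattern: the caller inspects the context and falls back to host
 * memory when nothing has initialized it yet. Whether the buffer ends up
 * on the device depends on call ordering, which is the race. */
static sketch_buf alloc_caller_checks(size_t n) {
  sketch_buf b;
  b.ptr  = malloc(n);
  b.kind = context_ready ? MEM_DEVICE : MEM_HOST;
  return b;
}

/* Fixed pattern: the allocator initializes the context itself, so the
 * result no longer depends on who ran first. */
static sketch_buf alloc_self_initializing(size_t n) {
  sketch_buf b;
  sketch_ensure_context();
  b.ptr  = malloc(n);
  b.kind = MEM_DEVICE;
  return b;
}
```

In this sketch, calling alloc_caller_checks() before anything has set context_ready yields a MEM_HOST buffer, which corresponds to the case where a GPU kernel later faults on CPU memory; alloc_self_initializing() yields MEM_DEVICE regardless of ordering.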

I go into more detail about this in the comment here: Cray/chapel-private#3100 (comment)

[Reviewed by @mppf]

Modified Files:
M modules/internal/localeModels/gpu/LocaleModel.chpl
M runtime/src/chpl-gpu.c
M test/gpu/native/PREDIFF
M util/cron/test-cray-cs-gpu-native.bash

Compare: https://github.com/chapel-lang/chapel/compare/792a5caff15b...e712fe99b83b