[Chapel Merge] Fix several issues with the GPU support

Branch: refs/heads/main
Revision: e19ef35
Author: e-kayrakli
Link: Fix several issues with the GPU support by e-kayrakli · Pull Request #18762 · chapel-lang/chapel · GitHub
Log Message:

Merge pull request #18762 from e-kayrakli/gpu-dist-array-clean

Fix several issues with the GPU support

This PR has fixes for multiple issues with the GPU support. All of these are
motivated by the desire to be able to use distributed arrays to allocate arrays
on GPUs. Sadly, this PR does not fully enable that. It only fixes some smaller
issues that I bumped into on the way.

The changes in this PR:

  • Removes the limitation on the outliner that caused it to look at only user
    modules.
  • Adds chpl-gpu-gen-includes.h header to support runtime calls added at
    codegen time in GPU kernels.
  • Adjusts chplenv scripts to be able to put that file into
    runtime/include/gpu/cuda.
  • Adds a new environment variable CHPL_GPU_CODEGEN. Currently, this can
    only be cuda or none. If you set CHPL_LOCALE_MODEL=gpu, then we
    pick cuda automatically without any check.
  • Handles PRIM_GET_MEMBER in loop bodies properly.
  • Sets a GPU sublocale's maximum task parallelism to 1. Previously, it was
    defaulting to 0.
  • Fixes a bug with 0-sized allocations/frees in the GPU allocators. The CUDA
    Driver API functions we use require a non-zero size, and they return an
    error otherwise. We now check for a non-zero size before calling those
    functions.
  • Adds new tests.
  • Adds a new prediff filter to suppress a ptxas warning. This is almost certainly
    not the right thing to do. See https://github.com/Cray/chapel-private/issues/2758
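
The 0-sized allocation/free fix can be pictured with a minimal sketch like the
one below. This is not the code in runtime/src/chpl-gpu.c. The names
fake_driver_alloc, fake_driver_free, gpu_mem_alloc, and gpu_mem_free are
hypothetical, and the "driver" here is a host-side stand-in that rejects a zero
size in the way described above, so the sketch builds without CUDA.

```c
#include <stddef.h>
#include <stdlib.h>

/* Counts how often the stand-in driver is reached (illustration only). */
static int driver_calls = 0;

/* Hypothetical stand-in for a driver allocation call: like the behavior
   described above, it reports an error when asked for 0 bytes. */
static int fake_driver_alloc(void** out, size_t size) {
  driver_calls++;
  if (size == 0) return 1;          /* non-zero means error */
  *out = malloc(size);
  return *out == NULL ? 1 : 0;
}

/* Hypothetical stand-in for a driver free call: rejects a null pointer. */
static int fake_driver_free(void* ptr) {
  driver_calls++;
  if (ptr == NULL) return 1;        /* non-zero means error */
  free(ptr);
  return 0;
}

/* Guarded allocator: a 0-sized request never reaches the driver and
   simply yields NULL. */
void* gpu_mem_alloc(size_t size) {
  void* ptr = NULL;
  if (size > 0) {
    if (fake_driver_alloc(&ptr, size) != 0) {
      return NULL;
    }
  }
  return ptr;
}

/* Guarded free: the NULL that a 0-sized allocation produced is skipped
   instead of being handed to the driver. */
void gpu_mem_free(void* ptr) {
  if (ptr != NULL) {
    fake_driver_free(ptr);
  }
}
```

With these wrappers, gpu_mem_alloc(0) followed by gpu_mem_free on its result
makes no driver call at all, so an empty array cannot trigger the driver error.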

A nice side-effect of this PR is that we can now run simple promotions on GPU.
I was able to run A = 1 on GPU, but not A = B * alpha + C. (The status of
promotion on GPU is tracked in https://github.com/Cray/chapel-private/issues/2695)

[Reviewed by @daviditen]

Test:

  • [x] gpu/native

Modified Files:
A runtime/include/gpu/cuda/chpl-gpu-gen-includes.h
A test/gpu/native/distArray/PREDIFF
A test/gpu/native/distArray/blockInsideOn-verbose.chpl
A test/gpu/native/distArray/blockInsideOn-verbose.execopts
A test/gpu/native/distArray/blockInsideOn-verbose.good
A test/gpu/native/distArray/blockInsideOn-verbose.prediff
A test/gpu/native/distArray/blockInsideOn.chpl
A test/gpu/native/distArray/blockInsideOn.good
A test/gpu/native/distArray/blockOutsideOnWorkaround-verbose.chpl
A test/gpu/native/distArray/blockOutsideOnWorkaround-verbose.execopts
A test/gpu/native/distArray/blockOutsideOnWorkaround-verbose.good
A test/gpu/native/distArray/blockOutsideOnWorkaround-verbose.prediff
A test/gpu/native/distArray/blockOutsideOnWorkaround.chpl
A test/gpu/native/distArray/blockOutsideOnWorkaround.good
A test/gpu/native/jacobi/jacobi-verbose.prediff
A test/gpu/native/launchCounter.prediff
A test/gpu/native/promotion.chpl
A test/gpu/native/promotion.execopts
A test/gpu/native/promotion.good
A test/gpu/native/promotion.prediff
R test/gpu/native/jacobi/jacobi-verbose.prediff
M compiler/optimizations/gpuTransforms.cpp
M modules/internal/LocaleModelHelpSetup.chpl
M runtime/include/stdchpl.h
M runtime/make/Makefile.runtime.include
M runtime/src/chpl-gpu.c
M test/gpu/native/PREDIFF
M util/chplenv/chpl_compiler.py
M util/chplenv/chpl_gpu.py
M util/chplenv/printchplenv.py

Compare: Comparing fe08035e86ab...e19ef3575812 · chapel-lang/chapel · GitHub