[Chapel Merge] Add `GPUDiagnostics` module and track GPU memory a

Branch: refs/heads/main
Revision: 4ab5f08
Author: e-kayrakli
Link: Unavailable
Log Message:

Merge pull request #19804 from e-kayrakli/gpu-diag

Add GPUDiagnostics module and track GPU memory allocations

Adds GPUDiagnostics module. Makes memtracking record/report GPU ids.

The GPUDiagnostics module is heavily inspired by the CommDiagnostics module,
both in interface and in implementation. The current interface looks like:

  • startGPUDiagnostics()
  • stopGPUDiagnostics()
  • getGPUDiagnostics()
  • startVerboseGPU()
  • stopVerboseGPU()
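
As a rough illustration of how the interface above might be used, here is a
minimal sketch. It assumes a GPU-enabled Chapel build, that the GPU sublocale
can be reached as `here.gpus[0]` (this syntax has varied across Chapel
versions), and that the loop shown is eligible to run as a kernel. The exact
output format of getGPUDiagnostics() is not specified here.

```chapel
use GPUDiagnostics;

config const n = 10;

startGPUDiagnostics();          // begin counting GPU events (kernel launches)

// Assumption: here.gpus[0] names the first GPU sublocale; older versions
// may need a different way to reach it.
on here.gpus[0] {
  var A: [1..n] int;
  foreach a in A do a += 1;     // order-independent loop, expected to launch a kernel
}

stopGPUDiagnostics();           // stop counting

writeln(getGPUDiagnostics());   // print the collected counters
```

Wrapping the same region in startVerboseGPU() / stopVerboseGPU() instead
should report events as they happen rather than counting them, by analogy
with CommDiagnostics.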

Support for the *Here functions is future work.

Kernel launches are the only events that this module records.

This PR also does a major cleanup of the GPU tests. While doing that, I also
realized that kernels can't call functions that access module-scope variables,
so a related test is futurized.

[Reviewed by @ronawho, @stonea and @daviditen]

Test:

  • [x] gpu/native

    Modified Files:
    A modules/standard/GPUDiagnostics.chpl
    A runtime/include/chpl-gpu-diags.h
    A runtime/src/chpl-gpu-diags.c
    A test/gpu/native/diags.chpl
    A test/gpu/native/diags.good
    A test/gpu/native/diags.prediff
    A test/gpu/native/distArray/blockOutsideOnWorkaround.good
    A test/gpu/native/kernelFnCalls/callFnAccessModVar.bad
    A test/gpu/native/kernelFnCalls/callFnAccessModVar.future
    A test/gpu/native/kernelFnCalls/fnWithForall.prediff
    A test/gpu/native/multiGPU/multiGPU.good
    A test/gpu/native/multiGPU/worksharing.good
    A test/gpu/native/multiGPU/worksharingBasic.good
    R test/gpu/native/dataPingPong.execopts
    R test/gpu/native/dataPingPong.prediff
    R test/gpu/native/distArray/blockInsideOn-verbose.chpl
    R test/gpu/native/distArray/blockInsideOn-verbose.execopts
    R test/gpu/native/distArray/blockInsideOn-verbose.good
    R test/gpu/native/distArray/blockInsideOn-verbose.prediff
    R test/gpu/native/distArray/blockOutsideOnWorkaround-verbose.chpl
    R test/gpu/native/distArray/blockOutsideOnWorkaround-verbose.execopts
    R test/gpu/native/distArray/blockOutsideOnWorkaround-verbose.good
    R test/gpu/native/distArray/blockOutsideOnWorkaround-verbose.prediff
    R test/gpu/native/distArray/blockOutsideOnWorkaround.good
    R test/gpu/native/innerBlock.execopts
    R test/gpu/native/innerBlock.prediff
    R test/gpu/native/jacobi/jacobi-verbose.chpl
    R test/gpu/native/jacobi/jacobi-verbose.execopts
    R test/gpu/native/jacobi/jacobi-verbose.good
    R test/gpu/native/jacobi/jacobi-verbose.prediff
    R test/gpu/native/multiGPU/EXECOPTS
    R test/gpu/native/multiGPU/PREDIFF
    R test/gpu/native/multiGPU/README
    R test/gpu/native/multiGPU/multiGPU.numlaunches
    R test/gpu/native/multiGPU/worksharing.numlaunches
    R test/gpu/native/multiGPU/worksharingBasic.numlaunches
    M compiler/codegen/cg-expr.cpp
    M doc/rst/meta/modules/standard.rst
    M modules/Makefile
    M runtime/include/chpl-comm-internal.h
    M runtime/include/chpl-gpu.h
    M runtime/include/chpl-mem-desc.h
    M runtime/include/chplmemtrack.h
    M runtime/include/stdchpl.h
    M runtime/src/Makefile.share
    M runtime/src/chpl-comm.c
    M runtime/src/chpl-gpu.c
    M runtime/src/chplmemtrack.c
    M test/gpu/native/dataPingPong.chpl
    M test/gpu/native/dataPingPong.good
    M test/gpu/native/distArray/blockInsideOn.chpl
    M test/gpu/native/distArray/blockInsideOn.good
    M test/gpu/native/distArray/blockOutsideOnWorkaround.chpl
    M test/gpu/native/innerBlock.chpl
    M test/gpu/native/innerBlock.good
    M test/gpu/native/jacobi/flags-no-checks.good
    M test/gpu/native/jacobi/flags-warn-unstable.good
    M test/gpu/native/jacobi/jacobi.chpl
    M test/gpu/native/jacobi/jacobi.good
    M test/gpu/native/kernelFnCalls/callFnAccessModVar.chpl
    M test/gpu/native/kernelFnCalls/callFnAccessModVar.good
    M test/gpu/native/kernelFnCalls/callFnFromFn.chpl
    M test/gpu/native/kernelFnCalls/callFnFromFn.good
    M test/gpu/native/kernelFnCalls/callTrivialFn.chpl
    M test/gpu/native/kernelFnCalls/callTrivialFn.good
    M test/gpu/native/kernelFnCalls/fnWithForall.chpl
    M test/gpu/native/kernelFnCalls/fnWithForall.good
    M test/gpu/native/multiGPU/multiGPU.chpl
    M test/gpu/native/multiGPU/worksharing.chpl
    M test/gpu/native/multiGPU/worksharingBasic.chpl
    M test/gpu/native/streamPrototype/dr.chpl
    M test/gpu/native/streamPrototype/dr.good
    M test/gpu/native/streamPrototype/forallOverArray.chpl
    M test/gpu/native/streamPrototype/forallOverArray.good
    M test/gpu/native/streamPrototype/forallOverDomain.chpl
    M test/gpu/native/streamPrototype/forallOverDomain.good
    M test/gpu/native/streamPrototype/forallOverZipArray.chpl
    M test/gpu/native/streamPrototype/forallOverZipArray.good
    M test/gpu/native/streamPrototype/stream.chpl
    M test/gpu/native/streamPrototype/stream.good

    Compare: 34f29449ce5a...4ab5f08137b7 (chapel-lang/chapel)