[Chapel Merge] Selectively use 64-bit ints for GPU kernel index computation

Branch: refs/heads/main
Revision: 163824dd587cf31f375e78716e4c721d88719d5b
Author: e-kayrakli
Link: Selectively use 64-bit ints for GPU kernel index computation by e-kayrakli · Pull Request #22259 · chapel-lang/chapel · GitHub
Log Message:
Selectively use 64-bit ints for GPU kernel index computation (#22259)

This addresses a bug reported in
[GPU] forall over large range doesn't enqueue enough GPU threads?.

It looks like we had been using ints in the runtime for
num_threads, and similarly dtInt[INT_SIZE_32] when generating index
computation code within GPU kernels. Both of these limited us to running
GPU kernels only on loops whose bounds fit in a 32-bit int. This is an
arbitrary limitation, as GPUs can run more than 2**32 threads. This PR
fixes that.

While there, this PR also improves --debugGpu output in the following ways:

  • adds grid dimensions to the output. Ideally, we should move this
    computation from the chpl-gpu-impl layer to the chpl-gpu layer and
    print the info out with the startVerboseGpu output, but that's more
    than I am willing to do in this bug-fix PR
  • fixes an output that printed a size_t with %d instead of %zu
  • comments out the output for chpl_gpu_memmove, which generates so much
    output that it makes --debugGpu useless.

[Reviewed by @DanilaFe]

Test:

  • gpu/native with NVIDIA
  • gpu/native with AMD

Compare: Comparing b84bf8b31a1b4145d4231b99ce532201864026cb...ec809edaa545ab641fffd7788a04c724b93ced71 · chapel-lang/chapel · GitHub

Diff:
M compiler/optimizations/gpuTransforms.cpp
M runtime/include/chpl-gpu-impl.h
M runtime/include/chpl-gpu.h
M runtime/src/chpl-gpu.c
M runtime/src/gpu/cuda/gpu-cuda.c
M runtime/src/gpu/rocm/gpu-rocm.c
A test/gpu/native/largeLoop.chpl
A test/gpu/native/largeLoop.good
https://github.com/chapel-lang/chapel/pull/22259.diff