Branch: refs/heads/main
Revision: 90560c8
Author: e-kayrakli
Log Message:
Merge pull request #18321 from e-kayrakli/gpu-stream
Add STREAM for GPU and make necessary adjustments for it
This PR adds a full Stream benchmark for GPU.
In order to achieve that, it also makes the following adjustments to the GPU
support:
- Adds proper grid size calculation based on loop "size" and block size.
  - This is currently done by passing an argument representing the number of
    threads to the runtime's kernel launcher. From there, the runtime launcher
    does the computation.
  - So, this PR adjusts the runtime interface slightly, as well.
- Adds an early return check in the GPU kernel, in case the local thread index
  is out-of-bounds for the loop.
- Adds a --gpu-block-size compiler flag to control the block size of GPU
  kernels.
- Adjusts the denormalize pass to avoid replacing temps used for kernel launches
  with equivalent expressions. This is done to avoid making more significant
  adjustments in the kernel launch codegen.
While there:
- Moves the "Kernel launcher called" output earlier to catch fatal launch errors
  that are due to not being able to load a kernel from the fatbinary.
- Drops an unused variable from a test.
[Reviewed by @daviditen]
Test:
- [x] test/gpu/native
- [x] standard
Modified Files:
A test/gpu/native/streamPrototype/stream.chpl
A test/gpu/native/streamPrototype/stream.compopts
A test/gpu/native/streamPrototype/stream.execopts
A test/gpu/native/streamPrototype/stream.good
M compiler/include/driver.h
M compiler/main/driver.cpp
M compiler/optimizations/deadCodeElimination.cpp
M compiler/passes/denormalize.cpp
M runtime/include/chpl-gpu.h
M runtime/src/chpl-gpu.c
M test/gpu/native/streamPrototype/forallOverZipArray.chpl
M util/chpl-completion.bash
Compare: 78db7d764a59...90560c82232a (chapel-lang/chapel)