[Chapel Merge] Add gpu set blockSize primitive

Branch: refs/heads/main
Revision: 8986fc4
Author: stonea
Link: Add `gpu set blockSize` primitive by stonea · Pull Request #20248 · chapel-lang/chapel · GitHub
Log Message:

Merge pull request #20248 from stonea/blockSize_prim

Add gpu set blockSize primitive

To use this primitive, put it in a foreach (or forall) loop that is going to be GPUized. The symbol passed to it will be used to set the block size on kernel launch. For example, this loop will launch with a block size of 64:

on here.gpus[0] {
  foreach i in 0..127 {
    __primitive("gpu set blockSize", 64);
  }
}

The primitive will only be applied to the GPUized loop it is embedded inside. The primitive is applied indiscriminately and is not sensitive to control flow. I don't do any error or bounds checking, since the long-term intent is that this primitive gets inserted by the compiler.
To verify that this does launch a kernel with a specific block size, run with --verbose.
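
As an illustration of "not sensitive to control flow", here is a minimal sketch. It is an assumption based on the description above, not taken from the test added in this PR; the array `A` and the `i > 1000` condition are hypothetical. The primitive sits in a branch that never executes at run time, yet the kernel for the enclosing loop would still be expected to launch with a block size of 64:

```chapel
on here.gpus[0] {
  // Hypothetical array, declared here so the loop body does some work.
  var A: [0..127] int;

  foreach i in 0..127 {
    // This condition is never true for i in 0..127, but per the
    // description above the primitive is applied to the enclosing
    // GPUized loop regardless of control flow.
    if i > 1000 {
      __primitive("gpu set blockSize", 64);
    }
    A[i] = i;
  }
}
```

Running the resulting program with --verbose should show which block size the kernel actually launched with.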

[Reviewed by @e-kayrakli and @ShreyasKhandekar]

Modified Files:
A test/gpu/native/setBlockSizePrimitive.chpl
A test/gpu/native/setBlockSizePrimitive.good
M compiler/AST/primitive.cpp
M compiler/dyno/include/chpl/uast/prim-ops-list.h
M compiler/dyno/lib/resolution/prims.cpp
M compiler/optimizations/gpuTransforms.cpp
M compiler/optimizations/optimizeOnClauses.cpp
M runtime/include/chpl-gpu-diags.h
M runtime/src/chpl-gpu.c
M test/gpu/native/diags.good
M test/gpu/native/kernelFnCalls/fnWithForall.good

Compare: https://github.com/chapel-lang/chapel/compare/c404538e265f...8986fc45aa5f