[Chapel Merge] Use new CUDA=>12.9 `cub::DeviceReduce::Arg{Min,Max}` interface

Branch: refs/heads/main
Revision: e8f39695fe74efea12d1417bdd80183e6ed075aa
Author: arifthpe
Link: https://github.com/chapel-lang/chapel/pull/28874
Log Message:
Use new CUDA=>12.9 cub::DeviceReduce::Arg{Min,Max} interface (#28874)

Conditionally define DEF_ONE_REDUCE_RET_VAL_IDX to use the new CUDA
12.9 interface for the cub ArgMin and ArgMax functions, fixing
deprecation warnings.
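
A minimal sketch of this kind of version-gated macro definition, written as host-only C++ so it compiles without CUDA. Everything prefixed `EXAMPLE_` or `example_`, and the `old_arg_min` and `new_arg_min` helpers, are hypothetical stand-ins rather than the code in gpu-nvidia-cub.cc. It assumes the newer interface returns the extremum and its index through separate outputs while the older one returns a single index/value pair; the real CUB calls also take temporary-storage and stream arguments, which are omitted here.

```cpp
#include <cstdint>
#include <utility>

// Stand-in for CUDART_VERSION, which encodes 12.9 as 12090.
// A real build would take this value from the CUDA headers.
#ifndef EXAMPLE_CUDART_VERSION
#define EXAMPLE_CUDART_VERSION 12090
#endif

// Hypothetical old-style interface: one output holding an (index, value) pair.
// Assumes n > 0.
template <typename T>
void old_arg_min(const T* in, std::pair<int, T>* out, int n) {
  out->first = 0;
  out->second = in[0];
  for (int i = 1; i < n; i++) {
    if (in[i] < out->second) {
      out->first = i;
      out->second = in[i];
    }
  }
}

// Hypothetical new-style interface: separate outputs for value and index.
// Assumes n > 0.
template <typename T>
void new_arg_min(const T* in, T* min_out, std::int64_t* idx_out,
                 std::int64_t n) {
  *min_out = in[0];
  *idx_out = 0;
  for (std::int64_t i = 1; i < n; i++) {
    if (in[i] < *min_out) {
      *min_out = in[i];
      *idx_out = i;
    }
  }
}

// The wrapper-generating macro is defined one of two ways depending on the
// toolkit version, so callers see the same signature either way.
#if EXAMPLE_CUDART_VERSION >= 12090
#define EXAMPLE_DEF_ARGMIN(name, T)                                    \
  void name(const T* in, std::int64_t n, T* val, std::int64_t* idx) { \
    new_arg_min(in, val, idx, n);                                      \
  }
#else
#define EXAMPLE_DEF_ARGMIN(name, T)                                    \
  void name(const T* in, std::int64_t n, T* val, std::int64_t* idx) { \
    std::pair<int, T> kv;                                              \
    old_arg_min(in, &kv, static_cast<int>(n));                         \
    *val = kv.second;                                                  \
    *idx = kv.first;                                                   \
  }
#endif

// Instantiate one wrapper for double.
EXAMPLE_DEF_ARGMIN(example_argmin_double, double)
```

Keeping the version check inside the macro definition means each generated wrapper keeps one signature, and only the call into the library differs between toolkit versions.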

The upstream change is https://github.com/NVIDIA/cccl/pull/3148.

This is propagated to hipcub in
https://github.com/ROCm/rocm-libraries/pull/1246 and will be present in
ROCm 7.1. However, this PR doesn't leave that update in place: it makes
the change and then reverts it, because there wasn't enough capacity to
test it in advance. It does add the CHPL_ROCM_VERSION_MINOR and
CHPL_ROCM_VERSION macros, to be used when we need to update.

[reviewed by @jabraham17, thanks!]

Testing:

  • works with CUDA 12.9
  • still works with CUDA 12.8

Compare: Comparing 6ea4d7ab61d83f635cfa2a52fb5c70e72c86b7f5...ad9e94f3fdcc24594acdc138649e94c462b4292f · chapel-lang/chapel · GitHub

Diff:
M runtime/src/gpu/nvidia/gpu-nvidia-cub.cc
M util/chplenv/chpl_gpu.py
https://github.com/chapel-lang/chapel/pull/28874.diff