17619, "mstrout", "Planning for GPU support for NVIDIA and AMD GPUs", "2021-04-30T16:31:35Z"
The current prototype GPU demo in Chapel targets CUDA (#17208, #17116). Nodes we want to target might also have AMD GPUs. I therefore asked Sunita Chandrasekaran (@sunitachandra), an expert at targeting GPUs, for her advice, and here is what she had to say:
OK on the interesting topic of Chapel and HIP...here are some thoughts.. Thanks for dropping a note, made me think deeper and ping a few colleagues from my PIConGPU/alpaka team to get any insights that I might have missed. So here are a few thoughts:
As part of the ORNL CAAR project, we have been using HIP pretty heavily. The main plus point is that AMD's ROCm (their HIP implementation) is an open source ecosystem, so that helps us file bug reports and PRs and they come around to fixing them, which is good.
In the past AMD has deprecated parts of their software stack every now and then, and that has made me wonder about the stability of the ecosystem. However, with Frontier soon to be up and running, I would imagine their software ecosystem to be more stable than in the past. We work closely with the AMD team and they have been very responsive and wonderful to work with so it has been a great experience so far.
If we compile a HIP program for NVIDIA, one could also use Clang as the compiler. Yes, there are a few issues because Clang doesn't support all CUDA features, such as CUDA graphs, asynchronous ballot, etc., but that issue is also going to exist if one uses alpaka and/or Kokkos, for example...
Another way to look at this performance portability factor, to use a code across NVIDIA and AMD GPUs, could be to write CUDA and use hipify! CUDA is naturally better supported and definitely more stable, so there is that as well.
SYCL can target AMD as well with hipSYCL and has an experimental version for NVIDIA GPUs, I think. SYCL has a learning curve, so I would think it would be too much to ask an application developer to use SYCL from scratch, but if a team has a software developer and/or RSE, and if the code has never seen a GPU before, it may be worthwhile to invest the time in SYCL. I am not entirely up to date with the compiler support for SYCL, so that is something to check.
I cannot stop myself from recommending alpaka!!! haha:-)
alpaka is what we (ORNL-CAAR-PIConGPU project) use! The GitHub repository (alpaka-group/alpaka: Abstraction Library for Parallel Kernel Acceleration 🦙) lists the backends supported by alpaka.
alpaka is, say, like Kokkos, but there are differences, and I am sure both communities will disagree with me for calling them similar. But if you want to put a finger on it, I would say alpaka is the European version of Kokkos.
Rene from alpaka says an ugly way of supporting native CUDA + HIP (NVIDIA + AMD) would be to use macro-type stuff like this:
https://github.com/alpaka-group/alpaka/blob/2707e3bb8b9abdd0404e0775606a666dc6082c74/include/alpaka/core/Hip.hpp#L40
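To make the macro idea concrete, here is a minimal sketch of what such a shim could look like. It is not alpaka's actual code: the `gpu*` names and `GPU_HOST_DEVICE` are hypothetical, chosen only for illustration. It assumes the HIP compiler defines `__HIPCC__` and NVIDIA's compiler defines `__CUDACC__`, and it adds a host-only fallback so the same file also builds with a plain C++ compiler.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical portability shim: the gpu* names below are illustrative only
// and are not taken from alpaka, CUDA, or HIP.
#if defined(__HIPCC__)
  // Compiling with the HIP toolchain: forward to the HIP runtime.
  #include <hip/hip_runtime.h>
  #define GPU_HOST_DEVICE __host__ __device__
  using gpuError_t = hipError_t;
  constexpr gpuError_t gpuSuccess = hipSuccess;
  inline gpuError_t gpuMalloc(void** p, std::size_t n) { return hipMalloc(p, n); }
  inline gpuError_t gpuFree(void* p) { return hipFree(p); }
#elif defined(__CUDACC__)
  // Compiling with nvcc (or Clang in CUDA mode): forward to the CUDA runtime.
  #include <cuda_runtime.h>
  #define GPU_HOST_DEVICE __host__ __device__
  using gpuError_t = cudaError_t;
  constexpr gpuError_t gpuSuccess = cudaSuccess;
  inline gpuError_t gpuMalloc(void** p, std::size_t n) { return cudaMalloc(p, n); }
  inline gpuError_t gpuFree(void* p) { return cudaFree(p); }
#else
  // Host-only fallback (an assumption added for this sketch) so the code
  // still compiles and runs without any GPU toolchain.
  #define GPU_HOST_DEVICE
  using gpuError_t = int;
  constexpr gpuError_t gpuSuccess = 0;
  inline gpuError_t gpuMalloc(void** p, std::size_t n) {
    *p = std::malloc(n);
    return *p ? gpuSuccess : 1;
  }
  inline gpuError_t gpuFree(void* p) { std::free(p); return gpuSuccess; }
#endif

// Code written against the shim is identical for every backend.
GPU_HOST_DEVICE inline float axpy(float a, float x, float y) {
  return a * x + y;
}
```

Application code then calls `gpuMalloc`, `gpuFree`, and functions marked `GPU_HOST_DEVICE` without mentioning either vendor. The cost is that every runtime call you use needs its own wrapper, which is presumably why Rene calls the approach ugly.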
So, I guess in summary, HIP is probably a good direction to go, imho, especially now, if CUDA -> hipify is not a direction Chapel wants to consider.
Thank you Sunita!
Thus we are going to start looking into HIP and will comment here with some of our observations. We would be happy to hear others' thoughts as well.