[Chapel Merge] Generate both GPU and CPU versions of outlined loops

Branch: refs/heads/main
Revision: 8b7a18f
Author: daviditen
Log Message:

Merge pull request #18308 from daviditen/clone-gpu-loops

Generate both GPU and CPU versions of outlined loops

[reviewed by @e-kayrakli and @gbtitus]

When outlining loops for GPU, keep two copies: the outlined one for GPU and
the original for CPU. Add a conditional that checks whether the code is
currently running on the GPU; if so, run the outlined code, otherwise run the
original loop.
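
As a rough illustration of the shape of the generated code (not the actual
compiler output), the dispatch can be sketched in C++ as below. The names
requestedSublocStub, onGpuSubloc, outlinedKernel and runLoop are hypothetical,
the stub stands in for the runtime query, and the rule "sublocale ids greater
than 0 mean GPU" is an assumption made only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the runtime's requested-sublocale query
// (the commit uses chpl_task_getRequestedSubloc()). The value is set by
// the caller so both branches can be exercised on any machine.
static int g_requestedSubloc = 0;
int requestedSublocStub() { return g_requestedSubloc; }

// ASSUMPTION for illustration only: sublocale ids > 0 denote GPUs.
bool onGpuSubloc() { return requestedSublocStub() > 0; }

enum class Path { Cpu, Gpu };

// Outlined copy of the loop body. In the real compiler this would be
// launched as a GPU kernel; here it is an ordinary function.
void outlinedKernel(std::vector<double>& a, const std::vector<double>& b) {
  for (std::size_t i = 0; i < a.size(); ++i) a[i] = 2.0 * b[i];
}

// The conditional wraps both copies: outlined code when on the GPU,
// the original loop otherwise. Returns which path was taken.
Path runLoop(std::vector<double>& a, const std::vector<double>& b) {
  assert(a.size() == b.size());
  if (onGpuSubloc()) {
    outlinedKernel(a, b);
    return Path::Gpu;
  }
  // Original (CPU) loop, left in place.
  for (std::size_t i = 0; i < a.size(); ++i) a[i] = 2.0 * b[i];
  return Path::Cpu;
}
```

Both branches compute the same result; only the chosen path differs, which is
what lets the same source loop run once on GPU and once on CPU.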

Add a new primitive, PRIM_GET_REQUESTED_SUBLOC, that is generated as a call to
the runtime function chpl_task_getRequestedSubloc(). It is used to determine
whether the code is running on the GPU.

If the locale model has GPUs, work around issue Cray/chapel-private#2413
by getting the requested sublocale instead of always using "any".

Update the gpu/native/jacobi test to run once on GPU and once on CPU.

Signed-off-by: David Iten <daviditen@users.noreply.github.com>

Modified Files:
M compiler/AST/primitive.cpp
M compiler/codegen/cg-expr.cpp
M compiler/include/primitive_list.h
M compiler/optimizations/deadCodeElimination.cpp
M compiler/optimizations/optimizeOnClauses.cpp
M compiler/util/exprAnalysis.cpp
M test/gpu/native/jacobi/jacobi.chpl
M test/gpu/native/jacobi/jacobi.good

Compare: https://github.com/chapel-lang/chapel/compare/a006cbcc3cc0...8b7a18f1e3bd