Merge pull request #18308 from daviditen/clone-gpu-loops

Generate both GPU and CPU versions of outlined loops

[reviewed by @e-kayrakli and @gbtitus]

When outlining a loop for GPU execution, keep two copies: the outlined version
for the GPU and the original loop for the CPU. Add a conditional that checks
whether the code is currently running on a GPU; if so, run the outlined code,
otherwise run the original loop.

Add a new primitive, PRIM_GET_REQUESTED_SUBLOC, that is generated as a call to
the runtime function chpl_task_getRequestedSubloc(). It is used to determine
whether the code is running on a GPU.
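
As a rough illustration of the dispatch described above, here is a minimal C
sketch. It is not the code the compiler generates. The sublocale id type, the
stubbed body of chpl_task_getRequestedSubloc(), the convention that a positive
id means "GPU", and the names running_on_gpu, loop_gpu, loop_cpu and run_loop
are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the runtime's sublocale id type. */
typedef int sublocid_t;

/* Assumption for this sketch only: id 0 means "CPU" and any positive id
 * means "a GPU sublocale". The real runtime's encoding may differ. */
static sublocid_t requested_subloc = 0;

/* Stub named after the runtime function mentioned above. The real
 * signature and return type are not shown in this message. */
static sublocid_t chpl_task_getRequestedSubloc(void) {
  return requested_subloc;
}

/* Hypothetical helper: decide whether the caller is on a GPU. */
static int running_on_gpu(void) {
  return chpl_task_getRequestedSubloc() > 0;
}

/* Stand-in for the outlined loop body. A real version would launch a
 * kernel; here it just runs on the host so the sketch stays portable. */
static void loop_gpu(const double *in, double *out, size_t n) {
  for (size_t i = 0; i < n; i++) out[i] = 2.0 * in[i];
}

/* Stand-in for the original, un-outlined CPU loop. */
static void loop_cpu(const double *in, double *out, size_t n) {
  for (size_t i = 0; i < n; i++) out[i] = 2.0 * in[i];
}

/* The conditional wrapped around the two copies of the loop.
 * Returns 1 if the GPU copy ran and 0 if the CPU copy ran. */
static int run_loop(const double *in, double *out, size_t n) {
  assert(in != NULL && out != NULL);
  if (running_on_gpu()) {
    loop_gpu(in, out, n);
    return 1;
  }
  loop_cpu(in, out, n);
  return 0;
}
```

Both copies compute the same result; only the path taken depends on the
requested sublocale, which is what lets a single test run on either device.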

If the locale model has GPUs, work around issue Cray/chapel-private#2413
by getting the requested sublocale instead of always using "any".

Update the gpu/native/jacobi test to run once on GPU and once on CPU.

Signed-off-by: David Iten <daviditen@users.noreply.github.com>

