[Chapel Merge] primitives for GPU shared memory and sync threads

Branch: refs/heads/main
Revision: 0758bb9
Author: stonea
Link: primitives for GPU shared memory and sync threads by stonea · Pull Request #18882 · chapel-lang/chapel · GitHub
Log Message:

Merge pull request #18882 from stonea/gpu_shared_memory_and_sync

primitives for GPU shared memory and sync threads

This PR is for https://github.com/Cray/chapel-private/issues/2771 and introduces two new primitives: gpu allocShared and gpu syncThreads. The first returns a pointer into the GPU's shared memory; the second synchronizes GPU threads (i.e. it corresponds to CUDA's __syncthreads() function).

The gpu allocShared primitive takes a single argument, which must be a compile-time constant: the number of bytes to allocate for the buffer. At code generation time we generate an LLVM "global variable" for this buffer in the shared memory address space and then return a pointer to this buffer.
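For readers more familiar with CUDA, the following is a rough sketch of the pattern these two primitives correspond to: a statically sized buffer in shared memory plus a block-wide barrier. It is not code from this PR. The names kBlockSize, reverseTile and reverseTilesOnGpu are hypothetical and chosen only for illustration, and the mapping to the Chapel primitives is an analogy, not a description of the generated code.

```cuda
#include <cuda_runtime.h>

// Hypothetical illustration, not code from the PR.
// Compile-time constant, analogous to the constant size passed to gpu allocShared.
constexpr int kBlockSize = 64;

// Reverses each kBlockSize-element tile of `data` in place,
// using a per-block scratch buffer in shared memory.
__global__ void reverseTile(int *data) {
  // Statically sized buffer in the shared address space (one copy per thread block).
  __shared__ int tile[kBlockSize];

  int local = threadIdx.x;
  int global = blockIdx.x * kBlockSize + local;

  tile[local] = data[global];

  // Barrier: every thread in the block must finish its write to `tile`
  // before any thread reads an element written by another thread.
  __syncthreads();

  data[global] = tile[kBlockSize - 1 - local];
}

// Host wrapper (hypothetical helper): copies `n` ints to the device, runs the
// kernel, and copies the result back. `n` must be a positive multiple of kBlockSize.
bool reverseTilesOnGpu(int *host, int n) {
  if (n <= 0 || n % kBlockSize != 0) return false;

  int *dev = nullptr;
  size_t bytes = static_cast<size_t>(n) * sizeof(int);
  if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

  bool ok = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
  if (ok) {
    reverseTile<<<n / kBlockSize, kBlockSize>>>(dev);
    ok = cudaGetLastError() == cudaSuccess &&
         cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
  }
  cudaFree(dev);
  return ok;
}
```

Without the barrier, a thread could read a slot of the shared buffer before the thread responsible for it has written it, which is why the allocation and synchronization primitives are introduced together.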

See the attached test for an example of these primitives in action.

This is all very much a work in progress, so the exact parameters, names, and nature of these primitives are subject to change over time (especially once we have a clearer picture of how this will work on the language side of things).

[Reviewed by @e-kayrakli]

Modified Files:
A test/gpu/native/memory/sharedMemory.chpl
A test/gpu/native/memory/sharedMemory.compopts
A test/gpu/native/memory/sharedMemory.good
A test/gpu/native/memory/sharedMemory.prediff
M compiler/AST/primitive.cpp
M compiler/codegen/cg-expr.cpp
M compiler/next/include/chpl/uast/PrimOpsList.h
M compiler/optimizations/optimizeOnClauses.cpp

Compare: https://github.com/chapel-lang/chapel/compare/ebc4499b1bdd...0758bb9f716c