27186, "jabraham17", "Ensure that local blocks work properly with COMM=none and GPU code.", "2025-04-29T22:55:14Z"
In a recent performance investigation, @benharsh suggested using local blocks to prevent unnecessary wide pointers from being inserted by the compiler. While this worked great for the CHPL_COMM=gasnet CHPL_LOCALE_MODEL=flat case, it did not help the CHPL_COMM=none CHPL_LOCALE_MODEL=gpu case.
I think it was generally surprising that it did not help (at least to @bradcray, @e-kayrakli, @benharsh, and @stonea). There were some suggestions that it should "just work" and that this may have been an oversight in the compiler. Using the example from that investigation, I tried a few different things in the compiler to see if I could force it to change its output when local blocks were used around a call to a kernel.
@e-kayrakli and @stonea should correct me if I am wrong here, but I believe the compiler already relies on the fact that the body of a kernel is a local block with no communication.
Regardless, this issue is about making sure that local blocks work and do the right thing orthogonally with multi-locale code and GPU code. This may just be a matter of making sure the compiler doesn't try to remove the local blocks in the local+GPU case, or there might be more plumbing required.
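For concreteness, here is a minimal sketch of the pattern in question. It is not the code from the original investigation; the array `A`, the size `n`, and the loop body are hypothetical stand-ins, and it assumes a build with CHPL_LOCALE_MODEL=gpu so that `here.gpus` is available.

```chapel
// Hypothetical reproducer shape, not the code from the investigation.
// Assumes CHPL_COMM=none and CHPL_LOCALE_MODEL=gpu.
config const n = 1024;

on here.gpus[0] {
  // Array allocated on the GPU sublocale.
  var A: [0..#n] real;

  // The local block asserts that nothing inside it communicates,
  // so the compiler should not need wide pointers here.
  local {
    // An order-independent loop like this is eligible to become a GPU kernel.
    foreach i in 0..#n do
      A[i] = i * 2.0;
  }
}
```

The expectation described in this issue is that the `local` block affects the generated code for this configuration the same way it does for CHPL_COMM=gasnet with CHPL_LOCALE_MODEL=flat.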