18529, "stonea", "Redesign of GPU locale model", "2021-10-06T23:02:34Z"
When `CHPL_LOCALE_MODEL=gpu`, Chapel uses a locale model in which a GPU is accessed using `here.getChild(n)` where n > 0; `here.getChild(0)` returns the locale for the CPU. This means that there is no real difference between `on Locales[i]` and `on Locales[i].getChild(0)`, which is kind of bizarre.
Maybe we should avoid having a sublocale 0 refer to the CPU?
This also raises interesting questions if we want to have a NUMA-aware locale model as well as GPU (sub)locales. One option would be to have the locale consist of a list of CPU sublocales followed by GPU sublocales. If there's a natural association of GPUs to CPUs, we could also imagine having multiple levels in the hierarchy, where GPU sublocales are referenced from CPU sublocales.
Here's an illustration of these different options:
I'm not sure why we have the current design, but I imagine it follows from our implementation of wide pointers (see `wide_ptr_s` in `runtime/include/chpltypes.h` and `chpl_localeID_t` in `runtime/include/localeModels/gpu/chpl-locale-model.h`), where we specify locales by a node ID and a sublocale ID. Of course, we can change this design if we want to, or the locale interface presented to the user could abstract these details away.
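
To make the node ID plus sublocale ID idea concrete, here is a rough sketch in C. It is not copied from the runtime headers: every `sketch_` name, the field layout, and the "any sublocale" value of -1 are assumptions made for illustration, and the real `wide_ptr_s` and `chpl_localeID_t` may look different. The only thing taken from the description above is that sublocale 0 means the CPU and sublocales greater than 0 mean GPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a locale ID: a node plus a sublocale on it.
 * The real chpl_localeID_t may be laid out differently. */
typedef struct {
  int32_t node;    /* which compute node */
  int32_t subloc;  /* which sublocale on that node */
} sketch_locale_id_t;

/* Hypothetical stand-in for a wide pointer: a locale ID plus a local address. */
typedef struct {
  sketch_locale_id_t locale;
  void *addr;
} sketch_wide_ptr_t;

/* Assumed sentinel values for this sketch only. */
enum {
  SKETCH_SUBLOC_ANY = -1, /* "no particular sublocale", as in Locales[i] */
  SKETCH_SUBLOC_CPU = 0   /* sublocale 0 refers to the CPU */
};

static inline sketch_locale_id_t sketch_make_id(int32_t node, int32_t subloc) {
  assert(node >= 0);
  sketch_locale_id_t id = { node, subloc };
  return id;
}

/* Under the current model, only sublocales greater than 0 are GPUs. */
static inline bool sketch_is_gpu(sketch_locale_id_t id) {
  return id.subloc > 0;
}

/* Two IDs name the same device if they are on the same node and either
 * both resolve to the CPU or both name the same GPU sublocale. */
static inline bool sketch_same_device(sketch_locale_id_t a,
                                      sketch_locale_id_t b) {
  if (a.node != b.node) return false;
  if (!sketch_is_gpu(a) && !sketch_is_gpu(b)) return true;
  return a.subloc == b.subloc;
}
```

With this encoding, `sketch_make_id(3, SKETCH_SUBLOC_ANY)` and `sketch_make_id(3, SKETCH_SUBLOC_CPU)` compare as the same device, which is the redundancy described above: two distinct IDs end up meaning "the CPU on node 3".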