[Chapel Merge] In comm=ofi, register data segment, heap, and stac

Branch: refs/heads/master
Revision: 6e5182f
Author: gbtitus
Log Message:

Merge pull request #17841 from gbtitus/ofi-register-dataseg

In comm=ofi, register data segment, heap, and stack in addition to fixed heap.

(Reviewed by @ronawho.)

In the ofi comm layer, when we have a fixed heap, also register the
static data segment and the existing pages of the process's original
heap and stack. This allows communicating directly to and from these
regions via RMA (remote memory access) instead of having to use
AM-mediated (active message) PUTs and GETs.
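As a rough illustration of why registering more regions matters, here is
a minimal sketch of a memory-region table and the path choice it drives.
This is not the code in comm-ofi.c. Every name here (mem_table_t,
mem_table_add, mem_table_covers, choose_path, MAX_REGIONS) is
hypothetical, and the sketch assumes that a transfer can use RMA only
when the whole target range lies inside a single registered region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 8  /* hypothetical table capacity */

/* One registered memory region: [base, base + len). */
typedef struct {
  uintptr_t base;
  size_t len;
} mem_region_t;

/* Hypothetical table of registered regions (fixed heap, data segment,
   original heap, stack, ...). */
typedef struct {
  mem_region_t regions[MAX_REGIONS];
  int count;
} mem_table_t;

typedef enum { XFER_RMA, XFER_AM } xfer_path_t;

/* Record a region as registered; returns false if the table is full. */
static bool mem_table_add(mem_table_t *tab, const void *base, size_t len) {
  assert(tab != NULL && len > 0);
  if (tab->count >= MAX_REGIONS)
    return false;
  tab->regions[tab->count].base = (uintptr_t) base;
  tab->regions[tab->count].len = len;
  tab->count++;
  return true;
}

/* True if [addr, addr + size) lies entirely within one registered region.
   The comparison is arranged to avoid overflow in addr + size. */
static bool mem_table_covers(const mem_table_t *tab, const void *addr,
                             size_t size) {
  uintptr_t a = (uintptr_t) addr;
  for (int i = 0; i < tab->count; i++) {
    const mem_region_t *r = &tab->regions[i];
    if (a >= r->base && size <= r->len && a - r->base <= r->len - size)
      return true;
  }
  return false;
}

/* Assumed policy: registered targets take the direct RMA path, anything
   else falls back to an AM-mediated transfer. */
static xfer_path_t choose_path(const mem_table_t *tab, const void *addr,
                               size_t size) {
  return mem_table_covers(tab, addr, size) ? XFER_RMA : XFER_AM;
}
```

With only the fixed heap in the table, an address in the static data
segment would take the XFER_AM path in this sketch. Once the data
segment is added as a second region, the same address takes XFER_RMA.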

This can provide a significant performance benefit in some cases. In a
highly artificial setup for an unrelated investigation on an HPE Cray EX
system, with 4 Chapel locales per compute node and a total of 64 locales
(thus 16 nodes), this reduced the time needed for module initialization
from roughly 160s to roughly 1s, apparently as a result of allowing the
use of RMA instead of AM-mediated PUTs for broadcasting all the config
consts from locale 0 to the other locales. (However, note that by itself
this change cannot have caused the whole 160x benefit, because certainly
we cannot do RMA writes 160x faster than we can do AMs. So, much of the
apparent improvement was probably due to reducing the contention that
setup was designed to study in the first place.)

Much of what's here was adapted from the same functionality in the ugni
comm layer.

While here, I also did some cleanup in the memory table and registration
code, which was pretty old and in particular had not been exercised with
more than one registered region in a very long time. I also fixed a minor
bug in the ugni comm layer that we had never run into.

Modified Files:
M runtime/src/comm/ofi/comm-ofi-internal.h
M runtime/src/comm/ofi/comm-ofi.c
M runtime/src/comm/ugni/comm-ugni.c

Compare: https://github.com/chapel-lang/chapel/compare/651abd9156be...6e5182f6f1e2