Branch: refs/heads/main
Revision: 4e0bc2e
Author: gbtitus
Link: Unavailable
Log Message:
Merge pull request #19127 from gbtitus/ugni-mr-check-xfer-extents
In comm=ugni, sanity-check transfer extents against their apparent MRs.
(Reviewed by @ronawho.)
We've been assuming in comm=ugni that if the starting address of a
transfer was found to be within some memory registration, there were
sufficient constraints elsewhere in the Chapel software stack to
guarantee the whole extent of the transfer would lie within that MR. We
still believe that should be true for Chapel itself, but we've recently
encountered a situation in which not-strictly-Chapel code made it false.
Specifically, turning on chunk merging and splitting in the jemalloc
memory allocation package caused us to end up with arrays that spanned
uGNI memory registration boundaries, leading to error returns from uGNI
that are hard to diagnose because the error code we get back is the
general "invalid parameter in API call". We've fixed that problem, but
as a follow-up, this change adjusts the comm layer to sanity-check the
whole extent of each transfer, including some slack for comm cache
prefetches, against the MR containing its starting address. This will give us a
more specific error message for this problem if it happens to recur in
the future.
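The check described above can be sketched in C as follows. This is a minimal illustration, not the code in comm-ugni.c. It assumes a hypothetical MR descriptor (mr_desc_t) holding a base address and length, a hypothetical prefetch slack constant (PREFETCH_SLACK), and one reading of "including some slack": the prefetch bytes are treated as part of the extent that must fit inside the MR. The comparison is written so that addr + size + slack is never computed directly, because that sum could overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical descriptor for one memory registration (MR):
   the contiguous address range [base, base + len). */
typedef struct {
  uintptr_t base;
  size_t len;
} mr_desc_t;

/* Hypothetical slack, in bytes, for comm-cache prefetches that may touch
   memory past the end of the requested transfer. The value comm=ugni
   actually uses is not stated in the commit message. */
#define PREFETCH_SLACK ((size_t) 64)

/* Return true if [addr, addr + size + slack) lies entirely inside the MR. */
static bool extent_in_mr(const mr_desc_t* mr, uintptr_t addr, size_t size,
                         size_t slack) {
  assert(mr != NULL);
  if (addr < mr->base) return false;            /* starts before the MR */
  size_t offset = (size_t) (addr - mr->base);
  if (offset > mr->len) return false;           /* starts after the MR */
  size_t room = mr->len - offset;               /* bytes from addr to MR end */
  /* Subtract rather than add, so nothing here can overflow. */
  return size <= room && slack <= room - size;
}

/* Emit a specific diagnostic instead of letting the problem surface later
   as uGNI's generic "invalid parameter" error. Returns true if OK. */
static bool check_xfer_extent(const mr_desc_t* mr, uintptr_t addr,
                              size_t size) {
  if (extent_in_mr(mr, addr, size, PREFETCH_SLACK)) return true;
  fprintf(stderr,
          "transfer [%#jx, +%zu) plus %zu bytes slack exceeds MR [%#jx, +%zu)\n",
          (uintmax_t) addr, size, PREFETCH_SLACK,
          (uintmax_t) mr->base, mr->len);
  return false;
}
```

For example, with an MR covering [0x1000, 0x2000), a 0x100-byte transfer starting at 0x1f80 begins inside the MR but ends past it, which is the kind of boundary-spanning array the jemalloc chunk merging produced; check_xfer_extent() rejects it with a message naming both ranges.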
Because this code turned out to be a bit more complicated than we'd
expected, this also adds the execution-time boolean environment variable
CHPL_RT_COMM_UGNI_DO_EXTENT_CHECKS which can be set to false to turn
these extent checks off if they seem to be producing false positives.
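A rough sketch of how such a boolean switch might be read once at startup follows. The helper env_get_bool(), its set of accepted spellings, and init_extent_checks() are all hypothetical; the Chapel runtime has its own environment-variable parsing, which may accept different values. The only behavior taken from the text above is the variable name and that setting it to false turns the checks off, so the sketch defaults to true when the variable is unset or unrecognized.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv/unsetenv and strcasecmp */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical boolean env-var parser: unset, empty, or unrecognized
   values fall back to dflt. */
static bool env_get_bool(const char* name, bool dflt) {
  const char* v = getenv(name);
  if (v == NULL || *v == '\0') return dflt;
  if (strcasecmp(v, "true") == 0 || strcasecmp(v, "yes") == 0
      || strcmp(v, "1") == 0)
    return true;
  if (strcasecmp(v, "false") == 0 || strcasecmp(v, "no") == 0
      || strcmp(v, "0") == 0)
    return false;
  return dflt;
}

/* Read once at (hypothetical) comm-layer init; checks are on by default. */
static bool do_extent_checks = true;

static void init_extent_checks(void) {
  do_extent_checks =
      env_get_bool("CHPL_RT_COMM_UGNI_DO_EXTENT_CHECKS", true);
}
```

Reading the variable once at initialization, rather than on every transfer, keeps the per-transfer cost of the switch to a single flag test.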
Modified Files:
M runtime/src/comm/ugni/comm-ugni.c
Compare: https://github.com/chapel-lang/chapel/compare/6702b173cfc8...4e0bc2e60ded