[Chapel Merge] Don't force PUT visibility before sending AM done

Branch: refs/heads/master
Revision: d1d65d9
Author: gbtitus
Log Message:

Merge pull request #17821 from gbtitus/restore-msgOrder-AM-perf

Don't force PUT visibility before sending AM done in msgOrder mode.

(Reviewed by @ronawho.)

Comm=ofi performance testing on Cray CS systems has been showing a consistent
regression in many-to-one remote fetching atomic performance since PR
#17630, which added the message-order-fence MCM conformance mode. It
turns out this was due to a logic error introduced by a related rewrite
of the routine that sends 'done' indicators for Active Messages (AMs).
That rewrite added a libfabric read operation to force visibility of
prior libfabric writes before sending the 'done' in message-order-fence
mode, which doesn't have ordered writes. The bug was that we also did
that ordering read in the existing message-order mode, where it was not
needed because that mode already orders writes, including the writes of
'done' indicators. Here, we adjust things so that in message-order mode
we do the ordering read only if there are outstanding atomics; outstanding
writes alone no longer trigger it.
(Note that with the 'verbs;ofi_rxm' provider, which is our main user of
this mode, the ordering read is now effectively a no-op, because we don't
do native libfabric atomics when using that provider anyway.)

This restores the previous many-to-one fetching atomic performance in
comm=ofi,verbs testing on Cray CS systems.

Modified Files:
M runtime/src/comm/ofi/comm-ofi.c

Compare: https://github.com/chapel-lang/chapel/compare/7e877bf8f942...d1d65d9c0311