New Issue: always inject AMs with comm=ofi

19205, "jhh67", "always inject AMs with comm=ofi", "2022-02-07T20:51:57Z"

The current AM implementation with comm=ofi calls fi_sendmsg to send blocking AMs. The call doesn't return until the message has been successfully sent, which under delivery complete means the AM has been received by the target. This is overkill: with a blocking AM, the sender immediately waits for the target to set a "done" flag indicating that the AM has completed. Calling fi_inject (or passing the FI_INJECT flag to fi_sendmsg) returns as soon as the data buffer is no longer needed by libfabric. Under delivery complete this saves a network round trip, which should improve performance. Non-blocking AMs are already injected because no ordering is implied.

Note that this logic also means we don't need the network to provide send-after-send (SAS) ordering, which we currently request. Both of our important providers, cxi and efa, provide SAS ordering, but we should experiment with not requesting it and see whether there is a performance impact. At the very least, we probably should not ask for an ordering we don't need.
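A toy in-process model may help show why the "done" flag alone is enough to complete a blocking AM. This is not libfabric or Chapel runtime code. A one-slot mailbox stands in for the network. `am_inject` is a hypothetical stand-in that copies the payload and returns immediately, which is the property FI_INJECT provides. `demo_blocking_am`, `target_main`, and the structs are made-up names for illustration. The sender never waits for delivery. It only spins on the flag the target sets after running the handler.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical AM payload: arguments plus where to report completion. */
struct am_msg {
    int a, b;            /* handler arguments */
    int *result;         /* sender-side location the handler writes */
    atomic_bool *done;   /* "done" flag the target sets when the AM completes */
};

/* Hypothetical one-slot mailbox standing in for the network. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    bool full;
    struct am_msg slot;
};

/* Inject-style send (model only): copy the payload into the mailbox and
 * return at once, so the caller's buffer may be reused immediately. */
static void am_inject(struct mailbox *mb, const struct am_msg *msg) {
    pthread_mutex_lock(&mb->lock);
    while (mb->full)                       /* never true in this demo */
        pthread_cond_wait(&mb->changed, &mb->lock);
    mb->slot = *msg;                       /* copy, like FI_INJECT */
    mb->full = true;
    pthread_cond_broadcast(&mb->changed);
    pthread_mutex_unlock(&mb->lock);
}

/* Target: take one AM, run its handler, then set the done flag. */
static void *target_main(void *arg) {
    struct mailbox *mb = arg;
    pthread_mutex_lock(&mb->lock);
    while (!mb->full)
        pthread_cond_wait(&mb->changed, &mb->lock);
    struct am_msg msg = mb->slot;
    mb->full = false;
    pthread_cond_broadcast(&mb->changed);
    pthread_mutex_unlock(&mb->lock);

    *msg.result = msg.a + msg.b;           /* the AM handler */
    /* Release store: the result write is visible before "done" is seen. */
    atomic_store_explicit(msg.done, true, memory_order_release);
    return NULL;
}

/* Sender: inject one blocking AM and wait only for the done flag. */
int demo_blocking_am(int a, int b) {
    struct mailbox mb = { .full = false };
    pthread_mutex_init(&mb.lock, NULL);
    pthread_cond_init(&mb.changed, NULL);

    pthread_t target;
    if (pthread_create(&target, NULL, target_main, &mb) != 0)
        return -1;

    int result = 0;
    atomic_bool done;
    atomic_init(&done, false);
    struct am_msg msg = { .a = a, .b = b, .result = &result, .done = &done };

    am_inject(&mb, &msg);                  /* returns without any delivery wait */
    while (!atomic_load_explicit(&done, memory_order_acquire))
        sched_yield();                     /* the only wait: the done flag */

    pthread_join(target, NULL);
    pthread_cond_destroy(&mb.changed);
    pthread_mutex_destroy(&mb.lock);
    return result;
}
```

For example, `demo_blocking_am(2, 3)` returns 5. Because a given sender thread cannot issue its next blocking AM until the previous one's done flag is set, at most one of its blocking AMs is in flight at a time, which is the same reason SAS ordering isn't needed for them.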