[Chapel Merge] Allow forcing eager connections with comm=ofi.

Branch: refs/heads/master
Revision: 75a7e3d
Author: gbtitus
Log Message:

Merge pull request #17456 from gbtitus/ofi-connect-eagerly

Allow forcing eager connections with comm=ofi.

(Reviewed by @ronawho.)

With providers that do dynamic endpoint connection we have seen fairly
dramatic connection overheads under certain circumstances. As an aid to
performance analysis, this change allows forcing the endpoint
connections to be established eagerly, during startup, rather than later
as needed, when it might interfere with performance timings. This is
controlled by an environment variable (CHPL_RT_COMM_OFI_CONNECT_EAGERLY,
default false) and is only done for the tcp and verbs providers, which
do dynamic connection.
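
The gating described above can be sketched in C roughly as follows. This
is an illustration only, not the code in comm-ofi.c: the helper names
(env_bool, provider_connects_dynamically, should_connect_eagerly) and
the accepted spellings of true/false are assumptions made for the
example. Only the environment variable name, its default of false, and
the tcp/verbs restriction come from the description above.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read an environment variable as a boolean.
 * Returns dflt when the variable is unset or has an unrecognized value.
 * The accepted spellings here are an assumption for illustration. */
static bool env_bool(const char *name, bool dflt) {
  const char *v = getenv(name);
  if (v == NULL) return dflt;
  if (strcmp(v, "1") == 0 || strcmp(v, "true") == 0 || strcmp(v, "yes") == 0)
    return true;
  if (strcmp(v, "0") == 0 || strcmp(v, "false") == 0 || strcmp(v, "no") == 0)
    return false;
  return dflt;
}

/* Hypothetical helper: the two providers named above as doing dynamic
 * endpoint connection. */
static bool provider_connects_dynamically(const char *prov) {
  return strcmp(prov, "tcp") == 0 || strcmp(prov, "verbs") == 0;
}

/* Eager connection is requested only when the provider connects
 * dynamically and the user has opted in; the default is false. */
static bool should_connect_eagerly(const char *prov) {
  return provider_connects_dynamically(prov)
         && env_bool("CHPL_RT_COMM_OFI_CONNECT_EAGERLY", false);
}
```

With the variable unset, should_connect_eagerly("tcp") is false; after
setting CHPL_RT_COMM_OFI_CONNECT_EAGERLY=true it is true for "tcp" and
"verbs" and stays false for any other provider name.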

Using this on eight 36-core Broadwell nodes of an IB-based CS system, with
the tcp provider the stream benchmark reported 130 GB/s and 600 GB/s for
regular and eager connection, respectively, and with the verbs provider
it reported 99 GB/s and 580 GB/s, respectively.

Note that using this feature has two side effects. The first is that
program startup time is increased. The second is that programs which
use eager connection may need their open file limit raised. Connections
require one or more open (socket) files each. On every node, eager mode
establishes connections from every transmit context (typically one per
core) to every other node. This may be many more connections than a
program actually uses (ra-on does use that many because it communicates
from every core to every remote node; hello and stream certainly do
not).
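
To get a feel for the open file requirement, the arithmetic above can be
sketched in C as below. This is a rough estimate under stated
assumptions, not code from the runtime: the function names and the
files_per_conn and reserved parameters are hypothetical, and the count
covers only the connections a node initiates, so it should be read as a
lower bound. For the eight-node, 36-core system mentioned earlier, and
assuming one transmit context per core, each node would initiate
36 * (8 - 1) = 252 connections. The comparison uses the POSIX getrlimit()
call with RLIMIT_NOFILE.

```c
#include <sys/resource.h>

/* Connections one node initiates in eager mode: every transmit context
 * connects to every other node. */
static long eager_conns_per_node(long num_nodes, long tx_ctxs_per_node) {
  if (num_nodes < 2) return 0;
  return tx_ctxs_per_node * (num_nodes - 1);
}

/* Compare an estimated file need against the soft open file limit.
 * files_per_conn and reserved (files needed for everything else) are
 * assumptions supplied by the caller.
 * Returns 1 if the limit suffices, 0 if not, -1 if getrlimit() fails. */
static int nofile_limit_suffices(long conns, long files_per_conn,
                                 long reserved) {
  struct rlimit rl;
  if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return -1;
  if (rl.rlim_cur == RLIM_INFINITY) return 1;
  long need = conns * files_per_conn + reserved;
  return (rlim_t)need <= rl.rlim_cur ? 1 : 0;
}
```

If nofile_limit_suffices() returns 0 for the estimated connection count,
the open file limit likely needs to be raised before running with eager
connection.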

Modified Files:
M runtime/src/comm/ofi/comm-ofi.c

Compare: https://github.com/chapel-lang/chapel/compare/4d908008f59d...75a7e3daaff5