New Issue: Figure out a story for gasnet-ibv multi-rail

18255, "ronawho", "Figure out a story for gasnet-ibv multi-rail", "2021-08-19T15:08:29Z"

At a high level, some InfiniBand systems (including Summit) have multiple rails, and getting peak bandwidth requires communicating over all of them. By default GASNet uses only a single rail, but there is a way to opt in to multi-rail support. That support can introduce other overheads, so it's not necessarily something we want to enable by default. Currently, though, enabling it requires changing GASNet configure-time options, which isn't very friendly to Chapel users.

Some more details from the GASNet ibv README:

Terminology:

This document will use "NIC" (short for Network Interface Card) to refer to the physical object installed in a host and "connector" to refer to the external connections from the NIC to a network.

The term "HCA" (short for Host Channel Adapter) will be used to refer to a device as enumerated by ibv_get_device_list() or the command-line utility ibv_devices. The HCA is the device driver's logical representation of the NIC, but there is not always a one-to-one correspondence, as described next.

When a NIC has multiple connectors, the driver may present these either as a single HCA with multiple "ports" or as multiple single-port HCAs. Additionally, some systems will present more than one HCA per connector. This is typically done on systems where the NIC is connected to multiple I/O buses. On a compute node of the Summit system at OLCF, there are two external network cable connectors on a single NIC which is connected internally to two I/O buses. The driver presents four HCAs, one for each combination of external connector and internal I/O bus. So, for a single NIC with two connectors there are at least three ways the system may present the same resources: with 1, 2 or 4 HCAs.
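
To make the "1, 2 or 4 HCAs" count concrete, here is a minimal sketch that models the three presentations described above for a single NIC. It is an illustration only, not driver behavior: the function name `presented_hcas`, the mode names, and the `hcaN` labels are all hypothetical.

```python
from itertools import product


def presented_hcas(connectors, io_buses, mode):
    """Model how one NIC might be presented as HCAs (hypothetical helper).

    The mode names below are made up for this sketch; they correspond to the
    three presentations described in the README text.
    """
    if connectors < 1 or io_buses < 1:
        raise ValueError("connectors and io_buses must be positive")
    if mode == "one_multiport_hca":
        # A single HCA exposing one port per external connector.
        return [{"hca": "hca0", "ports": connectors}]
    if mode == "hca_per_connector":
        # One HCA per external connector.
        return [{"hca": f"hca{c}", "connector": c} for c in range(connectors)]
    if mode == "hca_per_connector_and_bus":
        # One HCA per (connector, I/O bus) combination, as on Summit.
        pairs = product(range(connectors), range(io_buses))
        return [
            {"hca": f"hca{i}", "connector": c, "io_bus": b}
            for i, (c, b) in enumerate(pairs)
        ]
    raise ValueError(f"unknown mode: {mode!r}")
```

For the Summit-like case of two connectors and two I/O buses, `len(presented_hcas(2, 2, mode))` is 1, 2, and 4 for the three modes respectively.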

Multi-rail:

By default, ibv-conduit will use only the first active port on the first active InfiniBand Host Channel Adapter (HCA). However, if more than one HCA port is enabled for use, ibv-conduit will stripe communications over them.
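
As a rough picture of what "striping" over several ports means, here is a minimal sketch that assumes simple round-robin assignment of fixed-size chunks. The README excerpt does not describe ibv-conduit's actual scheduling policy, so this is not a description of it; the function name `stripe`, the chunk size, and the port labels are hypothetical.

```python
from itertools import cycle


def stripe(message_len, ports, chunk=4096):
    """Split a message into chunks and assign them to ports round-robin.

    Returns a list of (port, offset, size) tuples. Round-robin is an
    assumption made for illustration, not GASNet's documented policy.
    """
    if not ports:
        raise ValueError("at least one port is required")
    if chunk < 1:
        raise ValueError("chunk must be positive")
    plan = []
    next_port = cycle(ports)
    offset = 0
    while offset < message_len:
        size = min(chunk, message_len - offset)
        plan.append((next(next_port), offset, size))
        offset += size
    return plan
```

With a single port this degenerates to the default single-rail behavior: every chunk goes to the same port. With two ports, consecutive chunks alternate between them, which is where both the extra bandwidth and the extra bookkeeping come from.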

The use of multiple ports or multiple adapters will yield increases in both bandwidth (good) and software overhead (bad). How the resulting trade off works for a given application may be hard to predict.

Our own nightly 16-node-cs-hdr machine is dual-rail on the CascadeLake partition. The last time I ran correctness testing with GASNet-EX 2021.3.0, I saw sporadic failures. Using fenced puts (see GASNET_USE_FENCED_PUTS in the GASNet README) did not help.

I think we need to understand the cause of these correctness regressions and evaluate the performance impact of using multiple rails. Once that is done, we'll need to figure out a friendlier, user-facing way for Chapel users to opt in to using multiple rails.

I should also note that by default GASNet will warn if it detects multiple rails and the user hasn't opted in to using them. Given the current correctness issues, we requested a way to quiet this warning in Bug 4246 ("RFE: better behavior of single-rail build on multi-rail system") and are planning to quiet the warning for our users until this issue is addressed.