25616, "bradcray", "Should we support 'CHPL_INTERCONNECT' / 'CHPL_NETWORK'?", "2024-07-23T19:40:51Z"
Today, we support a CHPL_TARGET_PLATFORM variable that sometimes tells us a lot about the target platform, if it's something specific like an HPE Cray EX or Cray XC system, but sometimes tells us little, if it's a Linux cluster. In the latter case, the user has to set CHPL_COMM_* variables to specify how Chapel should map itself to the interconnect, using values like gasnet or ofi. In this issue, I'm wondering whether we should introduce a CHPL_INTERCONNECT or CHPL_NETWORK variable that would support values like none, slingshot, infiniband, ethernet, efa, unset, etc. as a higher-level way to describe the target system, one that's likely more known/knowable to a user than the details of how our communication is implemented. From there, we could then (typically) infer reasonable values for the lower-level CHPL_COMM*-related variables (while still permitting a user to set them explicitly, if desired).
For example, I might imagine that setting CHPL_TARGET_PLATFORM=hpe-apollo would cause CHPL_INTERCONNECT to be inferred to be infiniband, which would then cause CHPL_COMM to be inferred to be gasnet and CHPL_COMM_SUBSTRATE to be inferred to be ibv (and so on). Yet on a Linux cluster that doesn't have a more specific platform identifier than linux64, a user could set CHPL_INTERCONNECT=infiniband and get the same lower-level settings. Or on an Apollo system, the user could override the default and set CHPL_COMM=ofi if they wanted to try the ofi-based implementation.
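To make the inference chain concrete, here is a minimal Python sketch of how it might work. Everything in it is hypothetical: the function name infer_comm_settings, the two lookup tables, and the rule that an explicit CHPL_COMM override suppresses substrate inference are assumptions for illustration, not existing Chapel code. Only the mappings named in this issue are filled in.

```python
# Hypothetical tables: only the entries mentioned in this issue are included.
PLATFORM_TO_INTERCONNECT = {
    "hpe-apollo": "infiniband",
}

INTERCONNECT_TO_COMM = {
    "infiniband": {"CHPL_COMM": "gasnet", "CHPL_COMM_SUBSTRATE": "ibv"},
}


def infer_comm_settings(env):
    """Return a copy of env with inferred settings added.

    Values the user set explicitly are never overwritten.
    """
    result = dict(env)

    # Step 1: infer the interconnect from the platform, if not set by the user.
    platform = result.get("CHPL_TARGET_PLATFORM")
    if "CHPL_INTERCONNECT" not in result and platform in PLATFORM_TO_INTERCONNECT:
        result["CHPL_INTERCONNECT"] = PLATFORM_TO_INTERCONNECT[platform]

    # Step 2: look up the default comm settings for that interconnect.
    defaults = INTERCONNECT_TO_COMM.get(result.get("CHPL_INTERCONNECT"), {})

    # Assumption: if the user chose a different CHPL_COMM than the default,
    # the default's related settings (e.g. the substrate) do not apply.
    if "CHPL_COMM" in result and result["CHPL_COMM"] != defaults.get("CHPL_COMM"):
        return result

    # Step 3: fill in whatever lower-level settings the user left unset.
    for name, value in defaults.items():
        result.setdefault(name, value)
    return result


if __name__ == "__main__":
    print(infer_comm_settings({"CHPL_TARGET_PLATFORM": "hpe-apollo"}))
    print(infer_comm_settings({"CHPL_TARGET_PLATFORM": "linux64",
                               "CHPL_INTERCONNECT": "infiniband"}))
    print(infer_comm_settings({"CHPL_TARGET_PLATFORM": "hpe-apollo",
                               "CHPL_COMM": "ofi"}))
```

The three calls correspond to the three cases above: the Apollo platform default, the generic Linux cluster with an explicit interconnect, and the Apollo system with CHPL_COMM overridden to ofi.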
To me, this seems like it would prevent most users from ever having to set CHPL_COMM or its related variables, which feels like a win since that's more about how we implement things than about things a typical user would know, or should need to know.