[Chapel Merge] Optimize allLocalesBarrier by reducing communicati

Branch: refs/heads/main
Revision: 1285226
Author: ronawho
Log Message:

Merge pull request #17978 from ronawho/opt-allLocalesBarrier

Optimize allLocalesBarrier by reducing communication

[reviewed by @e-kayrakli]

The allLocalesBarrier was previously implemented using a distributed
field in a class. This was hitting the performance issue in #10160 where
any access of the distributed field was doing a GET of the class
instance first, which resulted in all remote tasks doing a GET from
locale 0. This really hurt performance, especially on InfiniBand systems
where communication injection is serialized.

This moves the distributed field out of the class into a global, which
for this case is fine since the allLocalesBarrier is a singleton global
anyways. This significantly improves the performance of the barrier by
eliminating needless communication. A comm test is added to lock in that
behavior too.
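
As a rough illustration of what such a comm test measures, the sketch below counts communication around a single barrier. It is not the actual commDiags.chpl test. It assumes one task per locale, and the reported counts also include the on-statements launched by the coforall, so the numbers only mean something when compared before and after a change.

```chapel
use AllLocalesBarriers, CommDiagnostics;

proc main() {
  // Assumption for this sketch: exactly one task per locale joins the barrier.
  allLocalesBarrier.reset(1);

  resetCommDiagnostics();
  startCommDiagnostics();

  // One task on each locale enters the global barrier once.
  coforall loc in Locales do on loc {
    allLocalesBarrier.barrier();
  }

  stopCommDiagnostics();

  // Per-locale communication counts (GETs, PUTs, on-statements, ...).
  writeln(getCommDiagnostics());
}
```

A growing number of GETs targeting locale 0 as locales are added would point to the all-to-one pattern described above.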

Comparing performance for performance/comm/barrier/empty-chpl-barrier
with 100,000 trials, we see significant improvements at small scale for
InfiniBand and non-trivial improvements for Aries. The following results
are on 16 nodes with 40 cores each:

config  Aries  IB
before  3.6s   61.4s
after   2.8s   4.4s

And on 512 nodes of a different Aries system with 36 cores per node:

config  Aries
before  89.2s
after   5.1s

So at small scale we see large benefits for InfiniBand (roughly 14x), and
at large scale we also see big improvements on Aries (roughly 17x). There
are two factors here: on IB, communication is serialized, which makes the
impact more dramatic, and even on Aries, which has fast concurrent comm,
the all-to-one behavior becomes a bottleneck at scale.

Modified Files:
M modules/packages/AllLocalesBarriers.chpl
M modules/standard/Barriers.chpl
M test/parallel/taskPar/sungeun/barrier/commDiags.chpl
M test/parallel/taskPar/sungeun/barrier/commDiags.comm-none.good
M test/parallel/taskPar/sungeun/barrier/commDiags.good
M test/parallel/taskPar/sungeun/barrier/commDiags.na-none.good

Compare: https://github.com/chapel-lang/chapel/compare/75a0eeea870e...128522633ad6