[Chapel Merge] Bring in aggregation optimizations we used in Arko

Branch: refs/heads/main
Revision: fa76a16
Author: ronawho
Log Message:

Merge pull request #18326 from ronawho/apply-arkouda-agg-optimizations

Bring in aggregation optimizations we used in Arkouda

[reviewed by @e-kayrakli]

Bring in "upstream" aggregation optimizations from Arkouda:

  • Bears-R-Us/arkouda 732 -- increase buffer size on non-ugni systems
  • Bears-R-Us/arkouda 783 -- optimize copy with c_ptr, avoid false-sharing
  • Bears-R-Us/arkouda 812 -- further avoid false-sharing

On 16-node-cs-hdr (48-core Cascade Lake) this improves our aggregated
Indexgather from 1950 MB/s/node to 2650 MB/s/node, roughly a 36% gain.
See the Arkouda PRs for more details.
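
For readers unfamiliar with the two techniques named in the list above, here
is a minimal sketch in C rather than Chapel. It is not the code from this
change: the agg_buffer type, the agg_push and agg_flush helpers, the 64-byte
cache-line size, and the 1024-element capacity are all assumptions made for
illustration. It shows a per-task buffer aligned to a cache line, so that
neighboring tasks' counters do not share a line, and a flush that copies the
whole buffer at once through raw pointers.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes */
#define BUF_ELEMS  1024 /* hypothetical capacity, not the value Chapel uses */

/* Hypothetical per-task aggregation buffer. Aligning the first member to
 * CACHE_LINE aligns the whole struct and pads its size to a multiple of
 * CACHE_LINE, so adjacent array elements never share a cache line. */
typedef struct {
    alignas(CACHE_LINE) size_t count; /* elements currently buffered */
    int64_t vals[BUF_ELEMS];          /* buffered values */
} agg_buffer;

/* Append one value. Returns 1 when the buffer is full and should be flushed. */
static int agg_push(agg_buffer *b, int64_t v) {
    b->vals[b->count++] = v;
    return b->count == BUF_ELEMS;
}

/* Flush with a single bulk copy instead of an element-by-element loop.
 * Returns the number of elements copied and resets the buffer. */
static size_t agg_flush(agg_buffer *b, int64_t *dst) {
    size_t n = b->count;
    memcpy(dst, b->vals, n * sizeof b->vals[0]);
    b->count = 0;
    return n;
}
```

With one agg_buffer per task stored in an array, each task updates only its
own count field, and the padding keeps those updates from invalidating a
neighbor's cache line.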

Modified Files:
M modules/internal/ChapelAutoAggregation.chpl

Compare: https://github.com/chapel-lang/chapel/compare/ce6e4d763652...fa76a1629c8e