New Issue: Reduce privatization overheads

Issue #20197, opened by ronawho on 2022-07-12T12:58:23Z

https://github.com/chapel-lang/chapel/pull/15049 significantly improved the speed of creating distributed domains/arrays by allowing bulk transfer for types used as part of privatization. However, in chapel-lang/chapel#20164 ("How to efficiently store distributed arrays in classes?") we've been seeing that privatization still has some non-trivial overheads. We'd like to reduce them to speed up the creation of arrays in Arkouda, particularly small arrays.

chapel-lang/chapel#20167 ("Nil dereference for maps of domains") is motivated by wanting to cache distributed domains, and chapel-lang/chapel#20164 touches on how to efficiently store distributed arrays in classes. Even if both of those were addressed, we'd still at some point have to create the initial distributed domain/distribution that gets cached, as well as the distributed arrays themselves, so I think speeding up their creation is worthwhile.

Note that we are considering getting rid of privatization in favor of remote-value-forwarding (RVF) in https://github.com/Cray/chapel-private/issues/2805, but that seems fairly far off, and I think spending a little time optimizing privatization will give us a good baseline against which to evaluate the RVF approach.

See https://github.com/Cray/chapel-private/issues/521 and chapel-lang/chapel#14132 ("Distributed array creation performance"), which have some old numbers and experiments for privatization. See also https://github.com/Cray/chapel-private/issues/503 about abandoning the binary tree used in privatizing domains/arrays.

I'd probably approach this by creating a simple comm/perf test to explore what's involved in creating a block distribution, block domain, and block array. Then I'd see whether we can replace the binary tree used for privatization with a standard coforall loc in Locales and arrange for the privatization payload to just be sent along with the on-stmt bundle.
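To make the trade-off in that plan concrete, here is a toy Python cost model. It is hypothetical and not derived from Chapel's runtime: it assumes a heap-indexed binary tree (locale i forwards to 2*i+1 and 2*i+2), treats every hop as one on-stmt, and assumes that when the payload is not bundled with the on-stmt, each receiving locale pays one extra round trip to fetch it. The names FanoutCost, binary_tree_cost, and flat_coforall_cost are illustrative only, and the defaults (unbundled tree, bundled flat fan-out) are an assumption about the current and proposed schemes.

```python
"""Toy cost model for privatization fan-out (hypothetical, not Chapel's runtime).

Assumptions: every hop is one on-stmt; if the privatization payload is not
bundled with the on-stmt, each receiving locale pays one extra round trip
(a GET) to fetch it. Network contention and injection limits are ignored.
"""
from dataclasses import dataclass


@dataclass
class FanoutCost:
    critical_path_round_trips: int  # sequential round trips until the last locale has its copy
    root_on_stmts: int              # on-stmts launched directly by locale 0
    total_messages: int             # on-stmts plus any payload GETs, summed over all locales


def binary_tree_cost(num_locales: int, payload_bundled: bool = False) -> FanoutCost:
    """Heap-indexed tree: locale i forwards to locales 2*i+1 and 2*i+2."""
    remote = num_locales - 1
    height = num_locales.bit_length() - 1  # floor(log2(n)) levels below the root
    per_hop = 1 if payload_bundled else 2  # on-stmt, plus a GET when unbundled
    gets = 0 if payload_bundled else remote
    return FanoutCost(
        critical_path_round_trips=height * per_hop,
        root_on_stmts=min(2, remote),
        total_messages=remote + gets,
    )


def flat_coforall_cost(num_locales: int, payload_bundled: bool = True) -> FanoutCost:
    """Flat fan-out: locale 0 launches one on-stmt per remote locale."""
    remote = num_locales - 1
    per_hop = 1 if payload_bundled else 2
    gets = 0 if payload_bundled else remote
    return FanoutCost(
        critical_path_round_trips=(1 if remote > 0 else 0) * per_hop,
        root_on_stmts=remote,
        total_messages=remote + gets,
    )


if __name__ == "__main__":
    for n in (2, 16, 256):
        print(n, binary_tree_cost(n), flat_coforall_cost(n))
```

Under these assumptions the flat fan-out shortens the critical path from about 2*log2(n) round trips to one, but locale 0 has to inject n-1 on-stmts itself instead of 2. The model ignores that injection cost, which is exactly the kind of thing the comm/perf test would need to measure.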

The other related thing here is that we're seeing reprivatization occur more often than I'd expect, so beyond just making privatization faster, it may be worth exploring whether there are code paths we can optimize to avoid reprivatizing.