[Chapel Merge] Switch away from tree-based privatization

Branch: refs/heads/main
Revision: 6eb4cff63ab036cbcd58df31a162016e1f68e95c
Author: Elliot Ronaghan
Link: Switch away from tree-based privatization by ronawho · Pull Request #20241 · chapel-lang/chapel · GitHub
Log Message:

Switch away from tree-based privatization
Switch from a manual tree-based privatization scheme to our normal `coforall loc in Locales do on loc` pattern. The tree-based scheme was added way back in 8e63e2ba19, likely before coforall+on was optimized. A tree-based spawn can certainly be faster than the one-to-many spawn currently used by the coforall+on pattern, but coforall+on is what we use everywhere else. If it becomes a scaling bottleneck, I think we want to make coforall+on itself tree-based instead of having point solutions for things like privatization.

I don't have access to larger machines to run on at the moment, but some initial results showed better performance at 256 nodes of an XC, and I see a roughly 1.5x to 2x speedup for creating a trivial block domain and array on 16 nodes of an XC and CS:

./distCreate --size=arraySize.tiny --dist=distType.block -nl 16

16-node-cs-hdr:

| config | domInit  | arrInit  |
| ------ | -------: | -------: |
| before | 0.00731s | 0.00106s |
| after  | 0.00464s | 0.00069s |

16-node-xc:

| config | domInit  | arrInit  |
| ------ | -------: | -------: |
| before | 0.00110s | 0.00022s |
| after  | 0.00055s | 0.00013s |

Resolves Cray/chapel-private#503
Part of #20197
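
For readers unfamiliar with the two spawn shapes the log message contrasts, here is a minimal sketch in Chapel. It is not the code from ChapelArray.chpl or the removed LocaleTree.chpl: the `visited` array and the `treeSpawn` helper are hypothetical names used only for illustration, and the binary-tree layout is an assumption about what a tree-based spawn might look like.

```chapel
// Hypothetical illustration only; names below are not from the Chapel modules.

// One slot per locale, set by the task that runs on that locale.
var visited: [LocaleSpace] int;

// One-to-many spawn: the initiating locale creates one task per locale,
// and each task moves to its target locale via the on-clause.
coforall loc in Locales do on loc {
  // A real privatization step would create a locale-local copy here.
  visited[here.id] = 1;
}
writeln("coforall+on reached all locales: ",
        (+ reduce visited) == numLocales);

// Tree-based spawn (assumed binary layout): each locale does its own work
// and then forwards the request to at most two child locales in parallel.
proc treeSpawn(id: int) {
  on Locales[id] {
    visited[id] = 1;
    const left = 2*id + 1,
          right = 2*id + 2;
    cobegin {
      if left < numLocales then treeSpawn(left);
      if right < numLocales then treeSpawn(right);
    }
  }
}

visited = 0;
treeSpawn(0);
writeln("tree spawn reached all locales: ",
        (+ reduce visited) == numLocales);
```

In the first form the initiating locale issues every remote task itself, while in the second the fan-out is spread across locales at the cost of extra hops. Which one wins at a given node count is exactly what the timings above are measuring.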
Compare: Comparing 961ef2f6a3f7d652e034395b1c70052a87524a29...6eb4cff63ab036cbcd58df31a162016e1f68e95c · chapel-lang/chapel · GitHub
Diff: https://github.com/chapel-lang/chapel/pull/20241.diff
Modified Files:
modules/internal/ChapelArray.chpl,modules/internal/ChapelStandard.chpl,test/arrays/slices/commCounts/sliceBlockWithRanges.good,test/compflags/ferguson/print-module-resolution.good,test/deprecated/IO/localesForRegion.good,test/distributions/robust/arithmetic/performance/multilocale/alloc.block.good,test/distributions/robust/arithmetic/performance/multilocale/alloc.cyclic.good,test/distributions/robust/arithmetic/performance/multilocale/alloc.replicated.good,test/distributions/robust/arithmetic/performance/multilocale/alloc_all.block.good,test/distributions/robust/arithmetic/performance/multilocale/alloc_all.cyclic.good,test/distributions/robust/arithmetic/performance/multilocale/alloc_all.replicated.good,test/distributions/robust/arithmetic/performance/multilocale/assignReindex.block.good,test/distributions/robust/arithmetic/performance/multilocale/assignReindex.cyclic.good,test/distributions/robust/arithmetic/performance/multilocale/assignReindex.replicated.good,test/distributions/robust/arithmetic/performance/multilocale/assignSlice.block.good,test/distributions/robust/arithmetic/performance/multilocale/assignSlice.cyclic.good,test/distributions/robust/arithmetic/performance/multilocale/assignSlice.replicated.good,test/distributions/robust/arithmetic/performance/multilocale/reduceSlice.block.good,test/distributions/robust/arithmetic/performance/multilocale/reduceSlice.cyclic.good,test/distributions/robust/arithmetic/performance/multilocale/reduceSlice.replicated.good,test/distributions/robust/arithmetic/performance/multilocale/rvfSlices/assignReindex.block.good,test/distributions/robust/arithmetic/performance/multilocale/rvfSlices/assignReindex.cyclic.good,test/distributions/robust/arithmetic/performance/multilocale/rvfSlices/assignReindex.replicated.good,test/distributions/robust/arithmetic/performance/multilocale/rvfSlices/assignSlice.block.good,test/distributions/robust/arithmetic/performance/multilocale/rvfSlices/assignSlice.cyclic.good,test/distributions/robust/arithmetic/performance/multilocale/rvfSlices/assignSlice.replicated.good,test/distributions/robust/arithmetic/performance/multilocale/rvfSlices/reduceSlice.block.good,test/distributions/robust/arithmetic/performance/multilocale/rvfSlices/reduceSlice.cyclic.good,test/distributions/robust/arithmetic/performance/multilocale/rvfSlices/reduceSlice.replicated.good,test/distributions/robust/arithmetic/performance/multilocale/rvfSlices/reduceSlice2.good,test/distributions/robust/arithmetic/performance/multilocale/sliceOps.block.good,test/distributions/robust/arithmetic/performance/multilocale/sliceOps.cyclic.good,test/distributions/robust/arithmetic/performance/multilocale/sliceOps.replicated.good,test/modules/sungeun/init/printInitCommCounts.good,test/modules/sungeun/init/printModuleInitOrder.good,test/modules/vass/deinit-order-modules.verbose.good,test/scan/scanDiags.good,test/types/chplhashtable/recordOfNonHashRecord.good

Added Files:
test/classes/ferguson/check-init-array-classes-nil.chpl,test/classes/ferguson/check-init-array-classes-nil.good

Removed Files:
modules/internal/LocaleTree.chpl