20164, "ronawho", "How to efficiently store distributed arrays in classes?", "2022-07-06T16:54:31Z"
opened 04:54PM - 06 Jul 22 UTC
This is based on Arkouda's SymEntry class, which is a wrapper around arrays. … While working on https://github.com/Bears-R-Us/arkouda/issues/1398 I found that there was a surprising amount of communication to privatize and reprivatize when creating a new SymEntry from an existing array.
The SymEntry class looks roughly like:
```chpl
proc makeDistDom(size:int) {
  use BlockDist;
  return {0..#size} dmapped Block(boundingBox={0..#size});
}

proc makeDistArray(size:int, type etype) {
  var a: [makeDistDom(size)] etype;
  return a;
}

class SymEntry {
  type etype;
  const size: int;
  const aD: makeDistDom(size).type;
  var a: [aD] etype;

  proc init(size: int, type etype) {
    this.etype = etype;
    this.size = size;
    this.aD = makeDistDom(size);
    // this.a uses default initialization
  }

  proc init(a: [?D] ?etype) {
    this.etype = etype;
    this.size = a.size;
    this.aD = D;
    this.a = a;
  }
}
```
Ideally, creating a new SymEntry would privatize the distribution, domain, and array just once each. When creating a SymEntry from an existing array, I think we'd like to reuse the existing distribution and domain and only create and privatize a new array. Even better, if `a` is dead after being passed to the initializer, it'd be great to just steal the array/memory and avoid creating anything new.
Today, though, when creating a SymEntry from an existing array, I believe we create and privatize a new domain/distribution for `const aD: makeDistDom(size).type;`, then `this.aD = D;` results in reprivatization, and so does `this.a = a;`.
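For anyone who wants to reproduce the numbers, here is a minimal sketch of how the communication could be counted with the standard `CommDiagnostics` module. The `config const n` and the `main` driver are illustrative additions and not part of Arkouda. The class is copied from the snippet above, and the `Block` spelling matches the Chapel version this issue was filed against.

```chpl
use BlockDist, CommDiagnostics;

config const n = 1000;  // hypothetical problem size, not from Arkouda

proc makeDistDom(size: int) {
  return {0..#size} dmapped Block(boundingBox={0..#size});
}

class SymEntry {
  type etype;
  const size: int;
  const aD: makeDistDom(size).type;
  var a: [aD] etype;

  proc init(a: [?D] ?etype) {
    this.etype = etype;
    this.size = a.size;
    this.aD = D;
    this.a = a;
  }
}

proc main() {
  // Create the source array before counting starts, so only the
  // SymEntry construction is measured.
  var src: [makeDistDom(n)] int;

  resetCommDiagnostics();
  startCommDiagnostics();
  var e = new SymEntry(src);
  stopCommDiagnostics();

  // One record of communication counters per locale.
  writeln(getCommDiagnostics());
}
```

This needs to run on more than one locale (for example `-nl 4`) to show anything interesting, since a single-locale run has no remote communication to count.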
---
There are some other quirks we've found here. In some places Arkouda creates a distributed SymEntry from a local DefaultRectangular array, and it's not obvious how that works today. While trying to optimize out some of the array creation, I tried to stop storing the domain and just have `var a = makeDistArray(size, etype);`, but this broke the cases where a non-Block array was passed to the initializer. For some reason changing that to `var a: makeDistArray(size, etype).type;` resolved it, but it's not obvious whether there's a good reason for that or we're just missing checks for that type of declaration.
In other cases, such as Arkouda segmented strings, which are composed of two arrays/SymEntrys, tuples are often returned and passed in, so https://github.com/chapel-lang/chapel/issues/18077 is likely important for reducing additional copies.
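To make the variant concrete, here is a rough sketch of the domain-less class using the typed declaration that was reported to work above. It is reconstructed from the description rather than copied from Arkouda, the `main` driver and the name `locArr` are hypothetical, and whether it resolves may depend on the Chapel version.

```chpl
proc makeDistDom(size: int) {
  use BlockDist;
  return {0..#size} dmapped Block(boundingBox={0..#size});
}

proc makeDistArray(size: int, type etype) {
  var a: [makeDistDom(size)] etype;
  return a;
}

class SymEntry {
  type etype;
  const size: int;
  // Declared by type only. Per the report above, the initializing form
  // `var a = makeDistArray(size, etype);` broke for non-Block arguments.
  var a: makeDistArray(size, etype).type;

  proc init(a: [?D] ?etype) {
    this.etype = etype;
    this.size = a.size;
    this.a = a;
  }
}

proc main() {
  // Hypothetical driver: a local (DefaultRectangular) array passed in.
  var locArr: [0..#10] real;
  var e = new SymEntry(locArr);
  // Print the field's array type to see which distribution it ended up with.
  writeln(e.a.type:string);
}
```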