New Issue: How to efficiently store distributed arrays in classes?

20164, "ronawho", "How to efficiently store distributed arrays in classes?", "2022-07-06T16:54:31Z"

This is based on Arkouda's SymEntry class, which is a wrapper around arrays. While working on Bears-R-Us/arkouda issue #1398 ("DataFrame Display Server-Side Performance Issue"), I found that there was a surprising amount of communication to privatize and reprivatize when creating a new SymEntry from an existing array.

The SymEntry class looks roughly like:

proc makeDistDom(size:int) {
  use BlockDist;
  return {0..#size} dmapped Block(boundingBox={0..#size});
}

proc makeDistArray(size:int, type etype) {
  var a: [makeDistDom(size)] etype;
  return a;
}

class SymEntry {
  type etype;
  const size: int;
  const aD: makeDistDom(size).type;
  var a: [aD] etype;

  proc init(size: int, type etype) {
    this.etype = etype;
    this.size = size;
    this.aD = makeDistDom(size);
    // this.a uses default initialization
  }

  proc init(a: [?D] ?etype) {
    this.etype = etype;
    this.size = a.size;
    this.aD = D;
    this.a = a;
  }
}

Ideally, creating a new SymEntry would result in privatization for the distribution, domain, and array just once each. When creating a SymEntry from an existing array, I think we'd like to reuse the existing distribution and domain and only create and privatize a new array. Even better, if the array a is dead after being passed to the initializer, it would be great to just steal the array/memory and avoid having to create anything new.

Today, though, when creating a SymEntry from an existing array, I believe we create and privatize a new domain/distribution for the field declaration const aD: makeDistDom(size).type; after that, this.aD = D; results in reprivatization, and so does this.a = a;
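
One way to check this is to wrap the array-based initializer in Chapel's CommDiagnostics calls and look at the per-locale counts. The following is only a minimal sketch: it repeats a trimmed copy of the SymEntry definition above so that it stands alone, and the problem size n and the main() driver are assumptions made for illustration, not part of Arkouda.

```chapel
use BlockDist, CommDiagnostics;

// Same helper as above: a block-distributed domain over 0..#size.
proc makeDistDom(size: int) {
  return {0..#size} dmapped Block(boundingBox={0..#size});
}

// Trimmed copy of SymEntry with only the array-based initializer.
class SymEntry {
  type etype;
  const size: int;
  const aD: makeDistDom(size).type;
  var a: [aD] etype;

  proc init(a: [?D] ?etype) {
    this.etype = etype;
    this.size = a.size;
    this.aD = D;
    this.a = a;
  }
}

// Hypothetical problem size, chosen only for this sketch.
config const n = 1000;

proc main() {
  // Create the source array before counting starts, so its own
  // privatization is not included in the totals.
  var arr: [makeDistDom(n)] int;

  resetCommDiagnostics();
  startCommDiagnostics();
  var entry = new SymEntry(arr);  // the operation under investigation
  stopCommDiagnostics();

  // Per-locale counts of gets, puts, on-statements, and so on.
  printCommDiagnosticsTable();
}
```

The counts are only interesting in a multi-locale build run on more than one locale (for example with -nl 4). Comparing the table against one taken around the SymEntry(size, etype) initializer should show how much of the communication is specific to the array-based path.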