New Issue: Same type or different types for serial, parallel, distributed flavors of a data structure

19579, "vasslitvinov", "Same type or different types for serial, parallel, distributed flavors of a data structure", "2022-03-31T20:27:37Z"

related: #18097 #18494 #18095

What are the pros and cons of providing the serial, parallel, and distributed versions of a data structure (e.g., map) as a single Chapel type vs. as different types?

First, how will the choice matter to the user of the data structure, other than conceptually? The relevant scenarios are:

  • creating a new map
  • invoking a method on a map
  • declaring the type of a variable or formal

What will need to be adjusted when going from the serial to the distributed flavor?

  • creating a new map: something will need to change; this is regardless of the choice of a single vs. different types

  • invoking a method on a map: if the method is already distributed-friendly, its invocation will not need to change; otherwise it will. This kind of change is the same regardless of the choice of a single vs. different types

  • declaring the type of a variable or formal: the only impact is that the "different types" option takes away the ability to write a type that is generic over the serial vs. distributed aspect
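
To make the last bullet concrete, here is a minimal sketch of what "generic over the serial vs. distributed aspect" means for a formal under the "single type" option. `MapKind` and `toyMap` are hypothetical stand-ins defined only for this illustration; they are not the actual library types.

```chapel
// Assumption: `MapKind` and `toyMap` are hypothetical stand-ins, not library types.
enum MapKind { serial, distributed }

record toyMap {
  param kind: MapKind;
  var size: int;
}

// Generic over `kind`: accepts any instantiation of toyMap.
proc describe(arg: toyMap) {
  writeln(arg.kind, " map with size ", arg.size);
}

// Concrete in `kind`: must be edited when the caller goes distributed.
proc serialOnly(arg: toyMap(MapKind.serial)) {
  writeln("serial map with size ", arg.size);
}

var a = new toyMap(MapKind.serial, 1);
var b = new toyMap(MapKind.distributed, 2);
describe(a);
describe(b);
serialOnly(a);
// serialOnly(b);  // expected to fail to resolve: `b` is a toyMap(MapKind.distributed)
```

With two unrelated record types there is no single type name that `describe` could use for its formal, so it would have to fall back on an unconstrained formal plus a `where` clause.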

How will the choice matter to the author of the data structure?

  • pro for "a single type": easy to factor out common code

  • pro for "different types": easy to structure the code that is not common

Most of our existing data structures use the "single type" approach, choosing between the serial and parallel flavors with the parSafe param. This works well because the implementation shares almost all code between the two flavors. There are far fewer opportunities to share code with a distributed flavor, because a distributed data structure will most likely contain just a privatization ID and a pointer to a privatized class. So the "single type" approach would add unwanted complications to the implementation.
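
As a rough illustration of why the parSafe style shares code so easily, here is a minimal sketch. The `counter` record and its spin lock are hypothetical and far simpler than the real data structures; only the `parSafe` name follows the text above.

```chapel
// Assumption: `counter` is a hypothetical record; only `parSafe` follows the text.
record counter {
  param parSafe: bool;
  // The lock field folds away to `nothing` in the serial flavor.
  var lock: if parSafe then atomic bool else nothing;
  var count: int;

  proc init(param parSafe: bool = false) {
    this.parSafe = parSafe;
  }

  // One body serves both flavors; `parSafe` is a param, so the
  // locking code is compiled only into the parSafe instantiation.
  proc ref increment() {
    if parSafe {
      while lock.testAndSet() { }  // spin until the lock is acquired
    }
    count += 1;
    if parSafe {
      lock.clear();                // release the lock
    }
  }
}

var plain = new counter();
plain.increment();

var safe = new counter(parSafe=true);
forall i in 1..1000 with (ref safe) do
  safe.increment();

writeln(plain.count, " ", safe.count);
```

Almost everything here is common to both flavors. A distributed flavor would share little beyond the method names, which is the asymmetry the paragraph above points at.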

We may also want to do better than the current approach, "the parallel version = the serial version + locking", by introducing implementations that are designed to be parallel from the beginning, such as modules/packages/ConcurrentMap.chpl. If we switch, the benefits of the "single type" approach for the implementors will be greatly reduced.

Now let's see what each option could look like. Consider just two flavors for brevity.

/*
Option (1-combined): a single type, with all fields declared in that type
*/

record map {
  param kind: MapKind;
  type keyType;
  type valType;

  private var table: ifSerial(chpl__hashtable(keyType, valType));
  private var _value: ifDistrib(unmanaged DistributedMap(keyType, valType));
  ... and so on ...
  var hashFn;  // the default or custom hash function

  proc ifSerial(type t) type
    return if kind==MapKind.serial then t else nothing;
  proc ifDistrib(type t) type
    return if kind==MapKind.distributed then t else nothing;

  proc init(param kind, type keyType, type valType)...
  proc init(param kind, type keyType, type valType, hashFn)...
}

proc map.serialOnlyMethod(...)...
proc map.serialOrDistributedMethod(...) {
  if kind==MapKind.serial { serial implementation }
  else                    { distributed implementation }
}

// scenarios for evolution of user code from serial to distributed

var myMap = new map(MapKind.serial, ...); // require the `kind` argument upfront?
→ var myMap = new map(MapKind.distributed, ...);

proc createMap(param kind, ...) return new map(kind, ...);

proc useMe1(arg: map)...
proc useMe2(arg: map(keyType=..., valType=...))...
// only useMe3 needs adjustment
proc useMe3(arg: map(MapKind.serial, ...))...
→ proc useMe3(arg: map(MapKind.distributed, ...))...

myMap.serialOnlyMethod(...);
→ myMap.serialOrDistributedMethod(...);

/*
Option (1-forwarding): a single type, with a single forwarding field

This option differs in how the implementation is structured.
User code evolution scenarios are the same as for (1-combined).
*/

record map {
  param kind: MapKind;
  type keyType;
  type valType;

  private forwarding var impl;

  proc init(param kind, type keyType, type valType) {
    if kind==MapKind.serial
    then impl = new serialMap(...);
    else impl = new distributedMap(...);
  }

  proc init(param kind, type keyType, type valType, hashFn) {
    ... like above ...
  }

  // must define these to avoid getting the default implementations
  proc init=(...)...
  proc readWriteThis(ch)...
}

// Could/should these also be available to the user directly?
// Note that `new serialMap()` would be different from `new map(MapKind.serial)`
record serialMap {...}
record distributedMap {...}

proc serialMap.serialOnlyMethod(...)...
proc serialMap.serialOrDistributedMethod(...) { serial implementation }
proc distributedMap.serialOrDistributedMethod(...) { distributed implementation }

/*
Option (2): multiple types.

The `map` type-returning function tries to simplify switching between
the flavors.
*/

record serialMap {...}
record distributedMap {...}

proc serialMap.serialOnlyMethod(...)...
proc serialMap.serialOrDistributedMethod(...) { serial implementation }
proc distributedMap.serialOrDistributedMethod(...) { distributed implementation }

// the returned type is generic over keyType etc.
proc map(param kind: MapKind) type
  return if kind==MapKind.serial then serialMap else distributedMap;

// the returned type should probably be concrete, using the default hashFn
proc map(param kind: MapKind, type keyType, type valType) type
  return if kind==MapKind.serial
  then serialMap(keyType, valType)
  else distributedMap(keyType, valType);

// the returned type could be concrete
proc map(param kind: MapKind, type keyType, type valType, hashFn) type
  return if kind==MapKind.serial
  then serialMap(keyType, valType, hashFn.type)
  else distributedMap(keyType, valType, hashFn.type);

// note: can't have the return type be generic over MapKind

// scenarios for evolution of user code from serial to distributed

var myMap1 = new map(MapKind.serial, string, int);
→ var myMap1 = new map(MapKind.distributed, string, int);

// passing a custom hashFn does not work well with map()
var myMap2 = new map(MapKind.serial, string, int, ???);
→ var myMap2 = new map(MapKind.distributed, string, int, ???);

// this works more smoothly
var myMap3 = new serialMap(...);
→ var myMap3 = new distributedMap(...);

proc createMap(param kind, ...)
  return if kind==MapKind.serial
  then new serialMap(...)
  else new distributedMap(...);

// wishing for proc useMe1(arg: map) -- `map` would need to be an interface
proc useMe1(arg) where isMap(arg) ...

// ditto proc useMe2(arg: map(keyType=..., valType=...))
proc useMe2(arg) where isMap(arg) && arg.keyType==... && arg.valType==... ...

proc useMe3(arg: serialMap)...
→ proc useMe3(arg: distributedMap)...

myMap.serialOnlyMethod(...);
→ myMap.serialOrDistributedMethod(...);
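
For Option (2), the `where isMap(arg)` formulation needs some helper to stand in for the missing interface. One possible shape is sketched below, assuming overloads on the two record types. The record bodies here are placeholders with only the type fields, and `isMap` itself is a hypothetical helper, not an existing library routine.

```chapel
// Assumption: placeholder records with only the type fields; not the real implementations.
record serialMap      { type keyType; type valType; }
record distributedMap { type keyType; type valType; }

// Hypothetical helper: the more specific overloads win for the two map flavors.
proc isMap(arg: serialMap) param { return true; }
proc isMap(arg: distributedMap) param { return true; }
proc isMap(arg) param { return false; }

// Accepts either flavor, with any key/value types.
proc useMe1(arg) where isMap(arg) {
  writeln("useMe1 got a ", arg.type:string);
}

// Accepts either flavor, but only with these key/value types.
proc useMe2(arg) where isMap(arg) && arg.keyType == string && arg.valType == int {
  writeln("useMe2 got a ", arg.type:string);
}

var s = new serialMap(string, int);
var d = new distributedMap(string, int);
useMe1(s);
useMe1(d);
useMe2(s);
useMe2(d);
```

This recovers the behavior of `arg: map` and `arg: map(keyType=..., valType=...)` from the single-type options, at the cost of moving the constraint out of the formal's type and into a `where` clause.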