New Issue: Same type or different types for serial, parallel, distributed flavors of a data structure

vasslitvinov · March 31, 2022, 8:28pm

Issue #19579 by vasslitvinov, opened 2022-03-31T20:27:37Z

github.com/chapel-lang/chapel

opened 08:27PM - 31 Mar 22 UTC

Related: #18097, #18494, #18095

What are the pros and cons of providing the serial, parallel, and distributed versions of a data structure (e.g., map) within a single Chapel type vs. as different types?

First, how will the choice matter to the user of the data structure, beyond the conceptual level? Here are the relevant scenarios:

* creating a new map
* invoking a method on a map
* declaring the type of a variable or formal

What will need to be adjusted when going from the serial to the distributed flavor?

* creating a new map: something will need to change, regardless of whether we choose a single type or different types
* invoking a method on a map: if the method is already distributed-friendly, its invocation does not need to change; otherwise it does. Either way, the change is the same for a single type and for different types
* declaring the type of a variable or formal: the only impact is that the "different types" option takes away the ability to write a type that is generic over the serial vs. distributed aspect
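To make the last point concrete, here is a small sketch of a formal that is generic over the flavor under the "single type" approach. It assumes a hypothetical `MapKind` enum (the name used in the sketches below) and a hypothetical toy record `toyMap` that holds no data; neither is an existing Chapel library type.

```chpl
// Hypothetical flavor selector; not a Chapel standard type.
enum MapKind { serial, distributed }

// Hypothetical toy "single type" map: the flavor is a param field, so
// toyMap(MapKind.serial, ...) and toyMap(MapKind.distributed, ...)
// are two instantiations of the same generic record.
record toyMap {
  param kind: MapKind;
  type keyType;
  type valType;
}

// Generic over the flavor: accepts any instantiation of toyMap.
proc describe(arg: toyMap) {
  writeln(arg.kind, " map");
}

// Accepts only the serial flavor (the analog of useMe3).
proc serialOnly(arg: toyMap) where arg.kind == MapKind.serial {
  writeln("serial-only code path");
}

var s = new toyMap(MapKind.serial, string, int);
var d = new toyMap(MapKind.distributed, string, int);
describe(s);      // prints "serial map"
describe(d);      // prints "distributed map"
serialOnly(s);
// serialOnly(d); // would not compile: d is the distributed instantiation
```

With two unrelated records instead of one, `describe` would need a where clause or an interface to accept both flavors.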

How will the choice matter to the author of the data structure?

* pro for "a single type": it is easy to factor out common code
* pro for "different types": it is easy to structure the code that is not common

Most of our existing data structures use the "single type" approach, choosing between the serial and parallel flavors with the parSafe param. This works well because the two flavors share almost all of their implementation code. There are far fewer opportunities to share code with a distributed flavor, because a distributed data structure will most likely contain just a privatization ID and a pointer to a privatized class. So the "single type" approach would add unwanted complications to the implementation.
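For reference, here is roughly how the existing parSafe-based "single type" approach looks from the user's side with the standard Map module, assuming its `map(keyType, valType, parSafe)` initializer as it exists at the time of writing. The serial and parallel-safe maps are two instantiations of one generic record, so a formal of generic type `map` accepts both. `sumValues` is a hypothetical helper.

```chpl
use Map;

// Hypothetical helper, generic over the flavor: the serial and the
// parallel-safe maps are instantiations of the same generic `map` record.
proc sumValues(const ref m: map) {
  var total = 0;
  for v in m.values() do total += v;
  return total;
}

// Serial flavor (parSafe defaults to false).
var serialM = new map(int, int);
for i in 1..10 do serialM.add(i, i*i);

// Parallel flavor: same record, parSafe=true adds internal locking,
// so concurrent add() calls from a forall loop are safe.
var parM = new map(int, int, parSafe=true);
forall i in 1..10 with (ref parM) do parM.add(i, i*i);

writeln(sumValues(serialM));  // 385
writeln(sumValues(parM));     // 385
```

Switching flavors here only flips a param; under a "different types" design the `new` expression and any concrete formal types would change as well.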

We may also want to do better than the current approach of "the parallel version = the serial version + locking" by introducing implementations that are designed to be parallel from the start, such as modules/packages/ConcurrentMap.chpl. If we make that switch, the benefits of the "single type" approach for implementors will be greatly reduced.

Now let's see what each option could look like. Consider just two flavors for brevity.

/*
Option (1-combined): a single type, with all fields declared in that type
*/

record map {
  param kind: MapKind;
  type keyType;
  type valType;

  private var table: ifSerial(chpl__hashtable(keyType, valType));
  private var _value: ifDistrib(unmanaged DistributedMap(keyType, valType));
  ... and so on ...
  var hashFn;  // the default or custom hash function

  proc ifSerial(type t) type
    return if kind==MapKind.serial then t else nothing;
  proc ifDistrib(type t) type
    return if kind==MapKind.distributed then t else nothing;

  proc init(param kind, type keyType, type valType)...
  proc init(param kind, type keyType, type valType, hashFn)...
}

proc map.serialOnlyMethod(...)...
proc map.serialOrDistributedMethod(...) {
  if kind==MapKind.serial { serial implementation }
  else                    { distributed implementation }
}

// scenarios for evolution of user code from serial to distributed

var myMap = new map(MapKind.serial, ...); // require the `kind` argument up front?
→ var myMap = new map(MapKind.distributed, ...);

proc createMap(param kind, ...) return new map(kind, ...);

proc useMe1(arg: map)...
proc useMe2(arg: map(keyType=..., valType=...))...
// only useMe3 needs adjustment
proc useMe3(arg: map(MapKind.serial, ...))...
→ proc useMe3(arg: map(MapKind.distributed, ...))...

myMap.serialOnlyMethod(...);
→ myMap.serialOrDistributedMethod(...);
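Below is a minimal runnable sketch of the (1-combined) technique. It assumes a hypothetical `MapKind` enum and a hypothetical toy record `combinedMap` whose integer counters stand in for the real per-flavor state (hash table vs. privatized instance). In the spirit of the `ifSerial`/`ifDistrib` helpers above, each flavor-specific field is initialized to `none` (type `nothing`) in the other flavor, so it carries no data there. Methods branch on the `kind` param, so only the matching branch is resolved for each instantiation.

```chpl
// Hypothetical flavor selector; not a Chapel standard type.
enum MapKind { serial, distributed }

// Hypothetical toy (1-combined) record: all fields live in one type.
record combinedMap {
  param kind: MapKind;
  type keyType;
  type valType;

  // Stand-ins for per-flavor state; each is `none` in the other flavor.
  var serialCount = if kind == MapKind.serial then 0 else none;
  var distribCount = if kind == MapKind.distributed then 0 else none;

  proc init(param kind: MapKind, type keyType, type valType) {
    this.kind = kind;
    this.keyType = keyType;
    this.valType = valType;
    // serialCount and distribCount take their declared default values.
  }

  // `kind` is a param, so the untaken branch is folded away and never
  // touches the `nothing`-typed field.
  proc ref add(k: keyType, v: valType) {
    if kind == MapKind.serial then serialCount += 1;
    else distribCount += 1;
  }

  proc size {
    if kind == MapKind.serial then return serialCount;
    else return distribCount;
  }
}

var sm = new combinedMap(MapKind.serial, string, int);
sm.add("a", 1);
sm.add("b", 2);
var dm = new combinedMap(MapKind.distributed, string, int);
dm.add("c", 3);
writeln(sm.size, " ", dm.size);  // 2 1
```

Every method that differs between flavors needs its own param branch, which is the "unwanted complication" mentioned earlier when little code is shared.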

/*
Option (1-forwarding): a single type, with a single forwarding field

This option differs in how the implementation is structured.
User code evolution scenarios are the same as for (1-combined).
*/

record map {
  param kind: MapKind;
  type keyType;
  type valType;

  private forwarding var impl;

  proc init(param kind, type keyType, type valType) {
    if kind==MapKind.serial
    then impl = new serialMap(...);
    else impl = new distributedMap(...);
  }

  proc init(param kind, type keyType, type valType, hashFn) {
    ... like above ...
  }

  // must define these to avoid getting the default implementations
  proc init=(...)...
  proc readWriteThis(ch)...
}

// Could/should these also be available to the user directly?
// Note that `new serialMap()` would be different from `new map(MapKind.serial)`
record serialMap {...}
record distributedMap {...}

proc serialMap.serialOnlyMethod(...)...
proc serialMap.serialOrDistributedMethod(...) { serial implementation }
proc distributedMap.serialOrDistributedMethod(...) { distributed implementation }
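A minimal runnable sketch of the (1-forwarding) structure follows, under the same kind of assumptions: `MapKind`, `serialImpl`, `distributedImpl`, `makeImpl`, and `fwdMap` are hypothetical names, and the implementation records hold only a counter. The wrapper has a single generic `forwarding` field whose type is chosen from `kind` at compile time, so a method that exists only in the serial implementation is callable only on serial wrappers.

```chpl
// Hypothetical flavor selector; not a Chapel standard type.
enum MapKind { serial, distributed }

// Hypothetical per-flavor implementations (toy versions).
record serialImpl {
  var count: int;
  proc describe() { return "serial, count=" + count:string; }
  proc serialOnly() { return "serial-only method"; }
}

record distributedImpl {
  var count: int;
  proc describe() { return "distributed, count=" + count:string; }
}

// Picks the implementation for a flavor; `kind` is a param, so only
// one return statement survives for each call.
proc makeImpl(param kind: MapKind) {
  if kind == MapKind.serial then return new serialImpl();
  else return new distributedImpl();
}

// The user-facing wrapper: methods not defined on fwdMap itself are
// forwarded to whichever implementation `impl` holds.
record fwdMap {
  param kind: MapKind;
  forwarding var impl;

  proc init(param kind: MapKind) {
    this.kind = kind;
    this.impl = makeImpl(kind);
  }
}

var s = new fwdMap(MapKind.serial);
var d = new fwdMap(MapKind.distributed);
writeln(s.describe());    // serial, count=0
writeln(s.serialOnly());  // serial-only method
writeln(d.describe());    // distributed, count=0
// d.serialOnly();        // would not compile: distributedImpl lacks it
```

A real wrapper would also need the `init=` and `readWriteThis` overrides listed above; this sketch never copies or prints a `fwdMap`, so it omits them.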

/*
Option (2): multiple types.

The `map` type-returning function tries to simplify switching between
the flavors.
*/

record serialMap {...}
record distributedMap {...}

proc serialMap.serialOnlyMethod(...)...
proc serialMap.serialOrDistributedMethod(...) { serial implementation }
proc distributedMap.serialOrDistributedMethod(...) { distributed implementation }

// the returned type is generic over keyType etc.
proc map(param kind: MapKind) type
  return if kind==MapKind.serial then serialMap else distributedMap;

// the returned type should probably be concrete, using the default hashFn
proc map(param kind: MapKind, type keyType, type valType) type
  return if kind==MapKind.serial
  then serialMap(keyType, valType)
  else distributedMap(keyType, valType);

// the returned type could be concrete
proc map(param kind: MapKind, type keyType, type valType, hashFn) type
  return if kind==MapKind.serial
  then serialMap(keyType, valType, hashFn.type)
  else distributedMap(keyType, valType, hashFn.type);

// note: can't have the return type be generic over MapKind

// scenarios for evolution of user code from serial to distributed

var myMap1 = new map(MapKind.serial, string, int);
→ var myMap1 = new map(MapKind.distributed, string, int);

// passing a custom hashFn does not work well with map()
var myMap2 = new map(MapKind.serial, string, int, ???);
→ var myMap2 = new map(MapKind.distributed, string, int, ???);

// this works more smoothly
var myMap3 = new serialMap(...);
→ var myMap3 = new distributedMap(...);

proc createMap(param kind, ...)
  return if kind==MapKind.serial
  then new serialMap(...)
  else new distributedMap(...);

// wishing for proc useMe1(arg: map) -- `map` would need to be an interface
proc useMe1(arg) where isMap(arg) ...

// ditto proc useMe2(arg: map(keyType=..., valType=...))
proc useMe2(arg) where isMap(arg) && arg.keyType==... && arg.valType==... ...

proc useMe3(arg: serialMap)...
→ proc useMe3(arg: distributedMap)...

myMap.serialOnlyMethod(...);
→ myMap.serialOrDistributedMethod(...);
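Here is a minimal runnable sketch of Option (2), again with hypothetical names (`MapKind`, `mapType`, `isMapType`) and toy versions of `serialMap` and `distributedMap` that hold no data. It shows a type-returning function that selects a concrete record per flavor, and how `useMe1`/`useMe2` become where-clause constraints when there is no common `map` type. `isMapType` relies on the standard `isSubtype`, which returns true when its first argument is an instantiation of a generic second argument.

```chpl
// Hypothetical flavor selector; not a Chapel standard type.
enum MapKind { serial, distributed }

// Option (2): two unrelated records (toy versions without storage).
record serialMap { type keyType; type valType; }
record distributedMap { type keyType; type valType; }

// Hypothetical type-returning function standing in for
// map(kind, keyType, valType) above.
proc mapType(param kind: MapKind, type keyType, type valType) type {
  if kind == MapKind.serial then return serialMap(keyType, valType);
  else return distributedMap(keyType, valType);
}

// Without a shared type or an interface, "any map" must be spelled
// out as a check over the known flavors.
proc isMapType(type t) param {
  return isSubtype(t, serialMap) || isSubtype(t, distributedMap);
}

proc useMe1(arg) where isMapType(arg.type) {
  writeln("useMe1 accepted a map");
}

// The useMe2 analog: also constrain the key and value types.
proc useMe2(arg) where isMapType(arg.type) && arg.keyType == string
                                           && arg.valType == int {
  writeln("useMe2 accepted a map(string, int)");
}

var m1: mapType(MapKind.serial, string, int);
var m2: mapType(MapKind.distributed, string, int);
useMe1(m1);
useMe1(m2);
useMe2(m2);
```

Every new flavor has to be added to `isMapType` by hand, which is why `useMe1(arg: map)` would require `map` to be an interface.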
