New Issue: [Patterns] distributed histogramming

lydia-duncan-github · January 25, 2022, 7:56pm

19102, "lydia-duncan", "[Patterns] distributed histogramming", "2022-01-25T19:55:28Z"

opened 07:55PM - 25 Jan 22 UTC

type: Design, stat: Needs Design Review

This issue is part of a series of issues to design the interface for collections across serial, parallel, and distributed contexts. Our goal is to determine what we think is reasonable to support and what doesn't seem useful. Additional patterns welcome (but it would be best to put them in their own issue for discussion, likely). In particular, this pattern was suggested by Michael.

```chapel
// suppose hist is a distributed map, input is a distributed array
forall key in input {
  hist[key] += 1;
}
```

This could cause a race, depending on the underlying implementation. Adding a new key in another thread could cause the underlying data structure to resize, which could invalidate the reference to `hist[key]` held by this thread.
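
For contrast, here is a minimal sketch of the case where the race does not arise: the keys are known ahead of time, so the histogram can be a fixed-size array of atomics that never resizes. The names `n` and `numBuckets`, the integer key range, and the use of a local (non-distributed) array are all assumptions made for illustration; none of them come from the proposal itself.

```chapel
// Sketch only: assumes keys are integers in 0..numBuckets-1, so the
// histogram can be a fixed-size array instead of a resizable map.
config const n = 1000,
             numBuckets = 10;

// Hypothetical input; a real program would use a distributed array.
var input: [0..#n] int;
forall i in input.domain do
  input[i] = i % numBuckets;

// One atomic counter per bucket. The array never resizes, so no
// element reference can be invalidated by another task.
var hist: [0..#numBuckets] atomic int;

forall key in input do
  hist[key].add(1);   // atomic increment, safe across tasks

for b in hist.domain do
  writeln(b, ": ", hist[b].read());
```

A map cannot rely on this trick in general, because inserting a previously unseen key may move existing entries, which is the problem the rest of this issue is about.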

If we don't return a reference from `hist[key]`, it becomes a lot harder to implement a histogram using this map. If we don't allow resizing to impact already existing keys, we still could run into reference invalidation if a key is removed in a parallel setting (solutions 1 and 2 below would avoid that case).

Michael had a few ideas for what to do if we don't return a reference from `hist[key]`:

1. We could pass in a lambda/function object to perform the update, something like `Count.update(key, lambda(x) { x += 1; } )`
2. We could use a context manager to wrap the update: `with Count.update(key) as val { val += 1; }`
3. We could add a lock per key and make the user deal with atomicity
4. We could add a feature like the one proposed in #12306
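
As a rough illustration of idea 1, the sketch below wraps a standard `map` in a class whose only mutation path is an `update` method that takes a function object. `LockedHist`, `Increment`, and the `guard` field are hypothetical names invented for this example, and the single lock around the whole map is a simplifying assumption, not a proposed design.

```chapel
use Map;

// Hypothetical wrapper, not an existing Chapel type: a counter table
// whose only mutation path is update(), so no reference escapes.
class LockedHist {
  var guard: sync bool;            // empty = unlocked, full = locked
  var counts: map(int, int);

  // Run `updater` on the value for `key` while holding the lock.
  // A missing key is first inserted with the value 0.
  proc update(key: int, updater) {
    guard.writeEF(true);           // acquire: blocks while full
    if !counts.contains(key) then
      counts.add(key, 0);
    updater(counts[key]);          // reference is only live under the lock
    guard.readFE();                // release: leaves the variable empty
  }
}

// Function object standing in for `lambda(x) { x += 1; }`.
record Increment {
  proc this(ref x: int) { x += 1; }
}

var histOwner = new LockedHist();
const hist = histOwner.borrow();
var input = [3, 1, 3, 2, 3, 1];

forall key in input do
  hist.update(key, new Increment());

for key in 1..3 do
  writeln(key, ": ", hist.counts[key]);
```

Because the reference to the stored value exists only inside `update`, a resize or removal triggered by another task cannot invalidate it. The cost in this sketch is that one lock serializes every update; a real distributed map would presumably lock at a finer granularity, such as per bucket or per locale.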
