17657, "e-kayrakli", "[Proposal] Add a low-level 'Communication' module", "2021-05-07T19:05:57Z"
opened 07:05PM - 07 May 21 UTC
area: Libraries / Modules
type: Design
type: Feature Request
This issue proposes the addition of a low-level communication module which initially
allows point-to-point raw data copies.
### Module name: `Communication`
- It captures the main purpose of the module
- It can contain collectives in the future either as top-level functions or in a
  submodule
- Alternatives I thought about:
  - Comm
    - I like this, but the user can also do `import Communication as Comm`
    - Consistent with CommDiagnostics module, though it is not a stable
      interface
  - Multilocale
    - Too general, doesn't tell much
  - LowLevelCommunication or LowLevelComm
    - Not sure if we need "LowLevel" in the name
  - SPMD
    - Not necessarily limited to SPMD-like idioms
### Interface
I propose we only have 1 function in the first step:
```chapel
proc copy(ref dst: ?t, const ref src: t, numElems: int, useCache=true)
```
which copies `numElems*numBytes(t)` bytes from the `src` address to `dst`,
regardless of whether `src`/`dst` are local or remote.
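
To make the intended semantics concrete, here is a minimal sketch that emulates the call with array slice assignment. `copySketch` is a hypothetical stand-in and not the proposed implementation: it takes whole 1-D arrays rather than element references, and it has no `useCache` argument.

```chapel
// Hypothetical stand-in for the proposed `copy`, limited to 1-D arrays.
// Assumption: slice assignment is used only to mimic the semantics;
// the proposal itself takes element references and a `useCache` flag.
proc copySketch(ref dst: [] ?t, const ref src: [] t, numElems: int) {
  // Copy the first `numElems` elements, wherever `dst` and `src` live.
  dst[dst.domain.low.. #numElems] = src[src.domain.low.. #numElems];
}

var A: [0..9] int;            // lives on Locales[0]

on Locales[numLocales-1] {
  var B: [0..9] int = 1;      // lives on the last locale
  copySketch(A, B, 5);        // the caller never says "get" or "put"
}

writeln(A);
```

The call site looks the same whether the transfer ends up being a get, a put, or a purely local copy, which is the point of having a single `copy`.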
#### Name
- We don't want to have `get`/`put` as they only differ w.r.t. location
  of source/destination. Let's not make the user worry about it.
- On similar things we use "copy":
  - `c_memcpy`
  - we call the compiler-driven aggregation "Automatic Copy Aggregation"
  - We have the `UnorderedCopy` module (which we might incorporate into this
    module in the future. See below)
  - Arkouda's aggregators have a `copy` method
#### Arguments
- I think we should have `numElems` not `numBytes`
- `dst`/`src` is consistent with C's `memcpy` and sufficiently clear to me.
  - I wouldn't oppose `lhs`/`rhs` either. But they are too specific to
    assignments, which may not be how an MPI-minded programmer thinks about this
    operation.
- `useCache`: I think we can do some overloading tricks to check for
  `--cache-remote`, as we don't want `useCache=true` with `--no-cache-remote`.
  Similar to controlling the caching behavior, we may need more kinds of `copy`s
  in the future. See below.
- The contexts in which the need for this function came up have been about
  primitive types, so far. We may want to extend to POD/non-POD types down the
  road but the current interface shouldn't block that from happening, I think. See
  below for more discussion about data types.
---
Some of the finer details are below. There are also some questions about the
future additions to this module. I don't think we have to answer any of those at
the moment. But it might help to have the bigger context handy when making
decisions.
### How should we manage different kinds of copies?
In the near term we want to control:
- cached/uncached
In the longer term we might want to have
- unordered
- nonblocking
- strided
- aggregated (maybe? but probably it is going to be a separate thing)
I think we should expose these things via arguments rather than implementing
different functions. Though I have some uncertainties:
- nonblocking version might want to return a status object of sorts. Changing
  the return type of the function based on an argument may not be a good idea.
  The alternative would be adding something like `copyNB` or `nonBlockingCopy`
  or something else. Another alternative that I can think of is to have
  `Communication.NonBlocking` submodule.
  Same question about nonblocking-ness can come up with collectives, when we add
  them, too.
- strided version may require more than one argument, which may not be too
  difficult to handle but it is something to consider.
- aggregated copies are probably going to be handled via the aggregator objects.
  However, I mused about passing that object to the `copy` here as an
  alternative instead of having that object have its own user-facing `copy`
  method. See https://github.com/chapel-lang/chapel/issues/16963 for user-facing
  aggregation discussion.
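
To illustrate the return-type concern for the nonblocking case, here is a rough sketch of what "argument decides the return type" could look like. Everything here is hypothetical: `copyStatus`, `copyWithMode` and the `nonBlocking` argument are made-up names, and the transfer itself is just a blocking slice assignment standing in for real communication.

```chapel
// Hypothetical status object; the proposal does not define its fields.
record copyStatus {
  var complete: bool;
}

// Hypothetical: a `param` argument selects whether a status is returned.
proc copyWithMode(ref dst: [] ?t, const ref src: [] t, numElems: int,
                  param nonBlocking = false) {
  // Stand-in transfer: a blocking slice assignment in both modes.
  dst[dst.domain.low.. #numElems] = src[src.domain.low.. #numElems];
  // Folded at compile time, so only the nonblocking instantiation
  // has a return value.
  if nonBlocking then
    return new copyStatus(complete=true);
}

var X: [0..3] int;
var Y: [0..3] int = 7;

copyWithMode(X, Y, 2);                               // returns nothing
const st = copyWithMode(X, Y, 4, nonBlocking=true);  // returns a copyStatus
writeln(X, " ", st.complete);
```

The two call sites read almost identically but have different types, which is why a separate `copyNB`-style function or a `NonBlocking` submodule might be the cleaner choice.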
### How does this module interact with non-primitive types?
#### POD types
- I think we can extend to POD types. But with serialization and copy/move
  semantics for user-defined records, things get murky.

For non-primitive types I see the following alternative paths for this module:
- Follow the semantics of corresponding scalar/array assignment for the given
  type.
  - Pro: consistent with language rules, widest coverage for this module
  - Con: what if that assignment is not in bulk, or requires non-O(1) local
    operations? Doesn't it go against the nature of this module?
- Follow the semantics of corresponding scalar/array assignment for the given
  type *if* that assignment can be handled with bulk transfer
  - Pro: more guarantee towards O(1) operation/communication complexity?
  - Con: understanding whether something is bulk transferrable may not be easy
    for the user
- Limit only to primitive types (what I propose as the first step here)
  - Pro: Safest to say "one call to `copy` is negligible local ops + 1 comm"
  - Con: too limiting?
#### Serializable types
- I think we can do this, too. But we may want to think about bulk
serialization/deserialization first.
See https://github.com/chapel-lang/chapel/issues/15675, https://github.com/chapel-lang/chapel/issues/15676
#### Class types
- Currently, we only allow bulk copy of unmanaged instances (where we do a bulk
  pointer copy). I think we can also extend this interface to those cases. I am
  not sure how important that is. The reason we added bulk copy of
  unmanaged class arrays was to have faster distributed array initialization.
  (Bulk class transfer is still very limited, because we only wanted it to work
  for array creation in the past, and we haven't come across a case to make it
  more general yet. Though, I think it should be easy to do. See https://github.com/chapel-lang/chapel/pull/15049)
---
### Summary of future work
- I would like this module to subsume the `UnorderedCopy` package module. I
imagine we can do that by adding an argument to `copy`
- We should also think about moving `UnorderedAtomics` somewhere in this module
or to a sibling/child of sorts. Maybe something like
`Communication.UnorderedAtomics` ?
- We should think about non-blocking versions of the operations.
- We want to have some basic collectives for reduction, broadcast and barriers
- We'll explore the idea of task teams/groups as a high-level library concept
- User-facing aggregators should at least be consistent with this interface. But
  IMO they can be an addition to this module as a top-level concept or a submodule.
  See https://github.com/chapel-lang/chapel/issues/16963