New Issue: Asymmetry with 'chpl_nodeID' with 'forall' vs. 'coforall'

16716, “bradcray”, “Asymmetry with ‘chpl_nodeID’ with ‘forall’ vs. ‘coforall’”, “2020-11-16T20:27:19Z”

A colleague pointed out that for the following code running on two nodes:

writeln("coforall: ");
coforall loc in Locales {
  on loc {
    writeln("chpl_nodeID: ", chpl_nodeID, ", here.id = ", here.id);
  }
}
writeln();

writeln("forall: ");
forall loc in Locales {
  on loc {
    writeln("chpl_nodeID: ", chpl_nodeID, ", here.id = ", here.id);
  }
}

the output is inconsistent between the two loop forms:

coforall: 
chpl_nodeID: 0, here.id = 0
chpl_nodeID: 1, here.id = 1

forall: 
chpl_nodeID: 0, here.id = 0
chpl_nodeID: 0, here.id = 1

My first reaction was "chpl_nodeID isn’t really intended to be a user-facing feature, and is only used for bootstrapping, so maybe we don’t really need to worry about this." But while I think the first part of that statement is true, we rely on chpl_nodeID a lot in library code, which makes this slightly concerning. So does the fact that I can’t explain what’s happening.

Here’s what I (think I) know:

  • though chpl_nodeID is a fairly special SPMD-style / per-node C variable, Chapel doesn’t really know this. It’s declared as an extern var of type int and from what I’ve seen, the compiler doesn’t seem to special-case it.

  • I think the forall loop is arguably doing the correct thing, in that chpl_nodeID is a global integer variable and so is subject to having a 'const in' shadow variable inserted for it. Since that shadow variable is created on locale 0, it makes sense that chpl_nodeID would be 0 when printed from either locale.

  • Putting in an explicit ref intent for the forall loop seems to confirm this, resulting in the 0 / 1 values being printed.

    forall loc in Locales with (ref chpl_nodeID) {
    ...
    
  • So, I’m confused as to why the coforall loop doesn’t seem to insert a similar shadow variable and get the same output (and, if it did, would this break all of the library code that relies on reasoning about chpl_nodeID?)

  • Also weird: I’d expect that putting a with (const in chpl_nodeID) into the coforall loop would symmetrically result in the same behavior as the forall loop, yet it doesn’t.

  • I was expecting that the coforall+on optimization might be playing a role here, yet inserting a writeln() before the on-clause within the coforall doesn’t change the behavior.
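
For anyone wanting to reproduce the two intent experiments from the bullets above, they can be written out in full roughly as follows. This is only a sketch: it reuses the loop body from the original example, the header strings passed to writeln() are added here for illustration, and it assumes a multi-locale build run on two nodes.

```chapel
// Variant 1: forall with an explicit 'ref' intent, so the loop's tasks
// refer to chpl_nodeID directly rather than to a 'const in' shadow copy.
writeln("forall with ref intent: ");
forall loc in Locales with (ref chpl_nodeID) {
  on loc {
    writeln("chpl_nodeID: ", chpl_nodeID, ", here.id = ", here.id);
  }
}
writeln();

// Variant 2: coforall with an explicit 'const in' intent, which one
// might expect to mirror the forall loop's default behavior.
writeln("coforall with const in intent: ");
coforall loc in Locales with (const in chpl_nodeID) {
  on loc {
    writeln("chpl_nodeID: ", chpl_nodeID, ", here.id = ", here.id);
  }
}
```

Per the observations above, variant 1 prints the 0 / 1 values, while variant 2 does not reproduce the forall loop's behavior, which is the asymmetry in question.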

All of this makes me suspicious that we have some sort of bug or inconsistency in our implementation, though I’m not sure what it is. It also makes me believe that we should remove chpl_nodeID since it doesn’t behave like a normal Chapel variable, and rely on something like a primitive that returns the current node ID instead. This also makes me curious to understand better when/why chpl_nodeID is used in libraries and what it would take to rewrite those to only use user-facing features.