16716, “bradcray”, “Asymmetry with ‘chpl_nodeID’ with ‘forall’ vs. ‘coforall’”, “2020-11-16T20:27:19Z”
A colleague pointed out that for the following code running on two nodes:
writeln("coforall: ");
coforall loc in Locales {
  on loc {
    writeln("chpl_nodeID: ", chpl_nodeID, ", here.id = ", here.id);
  }
}
writeln();
writeln("forall: ");
forall loc in Locales {
  on loc {
    writeln("chpl_nodeID: ", chpl_nodeID, ", here.id = ", here.id);
  }
}
the output is inconsistent between the two loop forms:
coforall:
chpl_nodeID: 0, here.id = 0
chpl_nodeID: 1, here.id = 1
forall:
chpl_nodeID: 0, here.id = 0
chpl_nodeID: 0, here.id = 1
My first reaction was "chpl_nodeID isn’t really intended to be a user-facing feature, and is only used for bootstrapping, so maybe we don’t really need to worry about this." But while I think the first part of that statement is true, we rely on chpl_nodeID a lot in library code, which makes it slightly concerning. And the fact that I can’t explain what’s happening is concerning.
Here’s what I (think I) know:
- Though chpl_nodeID is a fairly special SPMD-style / per-node C variable, Chapel doesn’t really know this. It’s declared as an ‘extern var’ of type int and, from what I’ve seen, the compiler doesn’t seem to special-case it.
- I think the forall loop is arguably doing the correct thing in that chpl_nodeID is a global integer variable, and so is subject to having a ‘const in’ shadow variable inserted for it. Since that shadow variable is inserted on locale 0, it makes sense that chpl_nodeID would be 0 when printed from either locale.
- Putting in an explicit ref intent for the forall loop seems to confirm this, resulting in the 0 / 1 values being printed:
  forall loc in Locales with (ref chpl_nodeID) { ...
- So, I’m confused by why the coforall loop doesn’t seem to insert the similar shadow variable and get the same output (and, if it did, would this break all of the library code that relies on reasoning about chpl_nodeID?)
- Also weird: I’d expect that putting a ‘with (const in chpl_nodeID)’ into the coforall loop would symmetrically result in the same behavior as the forall loop, yet it doesn’t.
- I was expecting that the coforall+on optimization might be playing a role here, yet inserting a writeln() before the on-clause within the coforall doesn’t change the behavior.
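For anyone wanting to reproduce the two explicit-intent experiments described in the list above, here is a minimal sketch of both variants side by side. It assumes a multi-locale build run on two nodes, as in the original example, and that chpl_nodeID is visible to user code the same way it is there. The comments only restate what is reported in this issue; results may differ on other Chapel versions.

```chapel
// Variant 1: forall with an explicit ref intent on chpl_nodeID.
// Reported above: chpl_nodeID prints as 0 / 1, matching here.id.
writeln("forall with ref intent: ");
forall loc in Locales with (ref chpl_nodeID) {
  on loc {
    writeln("chpl_nodeID: ", chpl_nodeID, ", here.id = ", here.id);
  }
}
writeln();

// Variant 2: coforall with an explicit const in intent on chpl_nodeID.
// Reported above: this does NOT reproduce the forall loop's 0 / 0 output,
// even though one might expect it to.
writeln("coforall with const in intent: ");
coforall loc in Locales with (const in chpl_nodeID) {
  on loc {
    writeln("chpl_nodeID: ", chpl_nodeID, ", here.id = ", here.id);
  }
}
```

Comparing these against the unannotated loops at the top of the issue isolates the intent handling from everything else in the loop body.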
All of this makes me suspicious that we have some sort of bug or inconsistency in our implementation, though I’m not sure what it is. It also makes me believe that we should remove chpl_nodeID since it doesn’t behave like a normal Chapel variable, and rely on something like a primitive that returns the current node ID instead. This also makes me curious to understand better when/why chpl_nodeID is used in libraries and what it would take to rewrite those to only use user-facing features.
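As a rough illustration of the "query the ID instead of reading a variable" direction, the sketch below wraps here.id in a procedure. The name currentNodeID is hypothetical and not an existing Chapel API, and the sketch assumes that the locale ID and the node ID coincide, as they do in the output above; that may not hold for every locale model. Because the value comes from a call evaluated where the task is running, rather than from an outer variable, there is nothing for a loop intent to capture.

```chapel
// Hypothetical helper (not part of Chapel's libraries): returns the ID of
// the locale the calling task is currently running on.
// Assumption: locale ID == node ID for the configuration in use.
proc currentNodeID(): int {
  return here.id;
}

forall loc in Locales {
  on loc {
    // The call is evaluated inside the on-clause, on the target locale.
    writeln("currentNodeID() = ", currentNodeID(), ", here.id = ", here.id);
  }
}
```

Whether library code could switch to something like this depends on how early in bootstrapping chpl_nodeID is needed, which is the open question raised above.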