New Issue: barriers within foralls

16406, "mppf", "barriers within foralls", "2020-09-17T19:12:32Z"

Spin-off from #14405

A related idea for enabling SPMD programming within a forall loop is to allow barriers in the loop body. Such barriers can only be implemented efficiently if each task's rank (its ID among the tasks running the forall loop) is known.
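
To illustrate why the set of participating tasks has to be known, here is a minimal sketch in Python rather than Chapel, with threads standing in for the tasks of a forall loop. The function name `spmd_neighbor_sum` and the strided work split are assumptions made for the example; none of this is a proposed Chapel interface. Note that the barrier has to be constructed with the number of participants, and that each task uses its rank to pick its own slot and its neighbor's.

```python
import threading

def spmd_neighbor_sum(values, num_tasks):
    """Each task publishes a partial sum, then reads a neighbor's after a barrier."""
    partial = [0] * num_tasks
    result = [0] * num_tasks
    # The barrier must be told up front how many tasks participate.
    barrier = threading.Barrier(num_tasks)

    def task(rank):
        # Phase 1: sum this task's strided share of the input.
        partial[rank] = sum(values[rank::num_tasks])
        # No task continues until every task has published its partial sum.
        barrier.wait()
        # Phase 2: it is now safe to read the neighbor's slot.
        neighbor = (rank + 1) % num_tasks
        result[rank] = partial[rank] + partial[neighbor]

    threads = [threading.Thread(target=task, args=(r,)) for r in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

Without the barrier, a task could read its neighbor's slot before the neighbor had written it. Without a known rank and task count, there would be no way to size the barrier or to address the slots.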

See also #16405.

Once we have a notion of task IDs within foralls, it will make sense to provide a barrier among the currently running tasks. This can be really important for GPU programming, where the threads in a block commonly need to synchronize between phases of a computation.

Here the thinking is that we would need a barrier for each level of task ID provided by #16405:

  • at a minimum, a barrier among the current vector lanes in a task (or GPU threads in a block)
  • potentially, a barrier among the current tasks within a locale
  • potentially, a barrier among all of the locales
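
As a rough illustration of how the first two levels could compose, the sketch below again uses Python threads, not Chapel. The class `TwoLevelBarrier`, its methods `lane_wait` and `task_wait`, and the driver `run_demo` are all hypothetical names invented for this example. Lanes are modeled as threads grouped by task; the locale level is not modeled.

```python
import threading

class TwoLevelBarrier:
    """Hypothetical sketch: a per-task lane barrier plus a barrier across tasks."""

    def __init__(self, num_tasks, lanes_per_task):
        # One barrier per task, shared by that task's lanes.
        self.lane_barriers = [threading.Barrier(lanes_per_task)
                              for _ in range(num_tasks)]
        # One barrier shared by the lane-0 "leader" of each task.
        self.task_barrier = threading.Barrier(num_tasks)

    def lane_wait(self, task_id):
        # Level 1: synchronize only the lanes of one task.
        self.lane_barriers[task_id].wait()

    def task_wait(self, task_id, lane_id):
        # Level 2: synchronize every lane of every task.
        self.lane_barriers[task_id].wait()      # all lanes of this task arrive
        if lane_id == 0:
            self.task_barrier.wait()            # leaders meet across tasks
        self.lane_barriers[task_id].wait()      # leader's return releases its lanes

def run_demo(num_tasks=2, lanes_per_task=3):
    bar = TwoLevelBarrier(num_tasks, lanes_per_task)
    lane_vals = [[0] * lanes_per_task for _ in range(num_tasks)]
    task_totals = [0] * num_tasks
    seen = [[None] * lanes_per_task for _ in range(num_tasks)]

    def lane(task_id, lane_id):
        lane_vals[task_id][lane_id] = task_id * lanes_per_task + lane_id + 1
        bar.lane_wait(task_id)                  # this task's lane values are ready
        if lane_id == 0:
            task_totals[task_id] = sum(lane_vals[task_id])
        bar.task_wait(task_id, lane_id)         # every task's total is ready
        seen[task_id][lane_id] = sum(task_totals)

    threads = [threading.Thread(target=lane, args=(t, l))
               for t in range(num_tasks) for l in range(lanes_per_task)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return seen
```

The lane-level barrier is cheap because it involves only one task's lanes, while the task-level barrier is built on top of it. That layering is one possible answer to how the levels might relate, not a claim about how Chapel would implement them.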

Is it reasonable to add such barriers? How would they be expressed? Do they make sense at all of the levels in #16405?