Forall Overhead

What is the overhead of the highly readable and very clear

```chapel
forall i in 0..#N do
{
   ...
}
```

where N is much larger than here.maxTaskPar, compared to only ever starting here.maxTaskPar threads, as in the following?

```chapel
const M = N / here.maxTaskPar; // iterations per task (integer division)
forall m in 0..#here.maxTaskPar do for i in m*M..#M do
{
   ...
}
for i in here.maxTaskPar*M..N-1 do // handle the tail
{
   ...
}
```

Thanks

Hi Damian —

The forall essentially does a similar rewrite as yours under the hood, and I would not expect there to be any appreciable overhead between the two forms for N >> maxTaskPar.

Some modest differences that make me say "essentially" above are:

  • we don't use a cleanup loop, but simply give some of the tasks in the original loop floor(N/maxTaskPar) work and others ceil(N/maxTaskPar) work.
  • rather than using a forall for the parallel loop, we use a coforall since we want to create a task per iteration; we also apply some heuristics based on the dataPar* settings to determine how many tasks to create (but in the default settings for a top-level loop like this, it will come out as you've written it)
  • rather than using a for for the inner loop, we use a foreach to get any benefits from hardware-level parallelization (e.g., vectorization)
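
To make the three points above concrete, here is a minimal sketch of that kind of rewrite. It is an illustration only, not the compiler's actual lowering: the helper `chunk`, the array `A`, the value of `N`, and the choice to give the first `N % nTasks` tasks the extra iteration are all assumptions made for the example, and the dataPar* heuristics are ignored.

```chapel
// Illustrative sketch only; not what the compiler literally generates.
config const N = 1000003;            // hypothetical problem size

// Hypothetical helper: the slice of 0..#n handled by task `tid` of `nTasks`.
// Assumption: the first (n % nTasks) tasks take ceil(n/nTasks) iterations,
// the remaining tasks take floor(n/nTasks), so no cleanup loop is needed.
proc chunk(n: int, nTasks: int, tid: int) {
  const base  = n / nTasks;          // floor(n / nTasks)
  const extra = n % nTasks;          // how many tasks get one extra iteration
  const lo    = tid * base + min(tid, extra);
  const len   = base + (if tid < extra then 1 else 0);
  return lo..#len;
}

const nTasks = min(N, here.maxTaskPar);
var A: [0..#N] int;

// One task per chunk (coforall), with a vectorizable inner loop (foreach).
coforall tid in 0..#nTasks with (ref A) do
  foreach i in chunk(N, nTasks, tid) do
    A[i] = 2*i;

// Every index was visited exactly once if the sum matches N*(N-1).
writeln((+ reduce A) == N*(N-1));
```

With 10 iterations and 4 tasks, for example, this split gives chunks of 3, 3, 2 and 2 iterations, which is the floor/ceil distribution described in the first point.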

-Brad

Thanks heaps. That makes it very clear. Your wise words deserve to be part of the documentation. It also means I should look elsewhere for the overhead that I see.