18532, "stonea", "Make GPU block size configurable on per-kernel launch basis", "2021-10-07T18:34:45Z"

Currently, the block size for GPU kernel launches is set using the `CHPL_GPU_BLOCK_SIZE` environment variable (or defaults to 512 if not set). This forces all kernel launches to use the same block size, which is not ideal.

Perhaps we should have some intelligence to automatically determine a good block size; the `cudaOccupancyMaxPotentialBlockSize()` function may be useful in achieving this.
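To make the automatic option concrete, here is a minimal CUDA sketch of how `cudaOccupancyMaxPotentialBlockSize()` could pick a block size for one specific kernel, with the grid size then derived from the problem size. The kernel `scale`, the problem size `n`, and the surrounding `main` are hypothetical stand-ins for whatever the compiler would generate for a `forall` body; this is not how the Chapel runtime currently launches kernels.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the body of a forall loop.
__global__ void scale(float *data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= 2.0f;
}

int main() {
  const int n = 1 << 20;  // assumed problem size, for illustration only
  float *data = nullptr;
  if (cudaMalloc(&data, n * sizeof(float)) != cudaSuccess) return 1;
  cudaMemset(data, 0, n * sizeof(float));

  // Ask the CUDA runtime for a block size that maximizes occupancy for
  // this particular kernel (no dynamic shared memory, no upper limit).
  int minGridSize = 0;  // smallest grid that can reach full occupancy
  int blockSize = 0;    // suggested threads per block
  cudaError_t err = cudaOccupancyMaxPotentialBlockSize(
      &minGridSize, &blockSize, scale, 0, 0);
  if (err != cudaSuccess) {
    fprintf(stderr, "occupancy query failed: %s\n", cudaGetErrorString(err));
    cudaFree(data);
    return 1;
  }

  // Round up so that every one of the n elements is covered by a thread.
  int gridSize = (n + blockSize - 1) / blockSize;
  scale<<<gridSize, blockSize>>>(data, n);
  err = cudaDeviceSynchronize();

  printf("blockSize=%d gridSize=%d status=%s\n",
         blockSize, gridSize, cudaGetErrorString(err));
  cudaFree(data);
  return err == cudaSuccess ? 0 : 1;
}
```

Because the suggested value depends on the kernel's register and shared-memory usage, a query like this would have to be made per kernel, which is another argument against a single global block size.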

Nevertheless, users may want to be able to control block/grid size explicitly, so it would be good to have some mechanism for doing this.

Trying to brainstorm ideas, here's what I can come up with:

#### New syntax on the loop:

```
on gpuLocale {
  forall i in 1..n with blockSize(5) { ... }
  forall i in 1..n with blockSize(10) { ... }
}
```

#### New syntax on the 'on' statement:

```
on gpuLocale with blockSize(5) {
  forall i in 1..n { ... }
}
on gpuLocale with blockSize(10) {
  forall i in 1..n { ... }
}
```

#### Annotations on the loop:

```
on gpuLocale {
  @gpuBlockSize(5)
  forall i in 1..n { ... }
  @gpuBlockSize(10)
  forall i in 1..n { ... }
}
```

#### Annotations on the 'on' statement:

```
@gpuBlockSize(5)
on gpuLocale {
  forall i in 1..n { ... }
}
@gpuBlockSize(10)
on gpuLocale {
  forall i in 1..n { ... }
}
```

#### Mutable value on locale:

```
gpuLocale.blockSize = 5; on gpuLocale { forall i in 1..n { ... } }
gpuLocale.blockSize = 10; on gpuLocale { forall i in 1..n { ... } }
```

#### Make it part of the range/domain for the loop:

```
var d1 = {1..n}; d1.gpuBlockSize = 5;
var d2 = {1..n}; d2.gpuBlockSize = 10;
on gpuLocale { forall i in d1 { ... } }
on gpuLocale { forall i in d2 { ... } }
```

## Related issues:

Issue #16404 in chapel-lang/chapel ("how to mark loops as order independent without adding qthreads-tasks") proposes ways of marking forall/foreach loops as not needing to add qthread-tasks.

One of its proposed solutions is to add a new "loop configuration syntax" to forall/foreach loops.