New Issue: Implement parallel scans on array-like expressions without an array temp when possible

28930, "bradcray", "Implement parallel scans on array-like expressions without an array temp when possible", "2026-06-03T23:36:13Z"

[Split off from #12707 (Parallelize scans for shape-full expressions), which has some notes and thoughts in the comments that may be of use.]

In #28928, I parallelized scans over array-like expressions like A: int, sin(A), A.x, or [i in 1..10] i by storing the expression's results into an array and then computing the scan on that array. This approach was simple to implement and gives appropriate parallelism within the scan, but it has the downside of requiring an extra array temp (on top of the array that scans already allocate for storing the result).
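To make the cost concrete, here is a minimal user-level sketch of what the temp-based approach amounts to. It is only an illustration, not the compiler's actual lowering from #28928; the names n, tmp, and prefix are hypothetical.

```chapel
config const n = 10;

// Step 1: store the array-like expression into an array temp.
// (Illustrative only; the real transformation happens inside the compiler.)
var tmp = [i in 1..n] i * i;

// Step 2: scan the temp. The scan allocates a second array for its result,
// so two arrays of size n exist at this point.
var prefix = + scan tmp;

writeln(prefix);
```

The goal of this issue is to end up with only the result array (prefix here), not both arrays.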

This issue asks what it would take to parallelize such scans without the temp. I attempted this in the past by using the .shape property of iterator records to allocate the result array for the scan and then calling the normal array code for scans, but this turned out not to work because iterator records don't support indexing (required by the current scan implementation). The thought I had then (which holds up today) is that if iterator expressions were stored using more of a pseudo-array format that supported things like indexing, we could potentially get the best of all worlds. This concept is captured in #28926 (Represent promotions / iteratorRecords over arrays as a pseudo-array?).

This approach would likely work for cases like A: int or sin(A), where the indexing method on the pseudo-array could be implemented by calling the underlying array's indexing method and then casting the result to int or passing it to sin. However, for a general iterator like for i in 1..10 do readInt(i), it presumably wouldn't work without creating an array to store the results, since the input stream can't be accessed randomly.
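As a rough illustration of why indexing matters, here is a sketch of a blocked two-pass scan over A: int that reads its input only through an indexing routine and never materializes the cast results. This is not Chapel's actual scan implementation, and castAt, block, and numTasks are hypothetical names standing in for what a pseudo-array might provide.

```chapel
config const numTasks = 2;

var A = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5];
const D = A.domain;
const n = D.size;

// Hypothetical stand-in for the pseudo-array's indexing method for `A: int`:
// index the underlying array, then cast the element.
proc castAt(i) { return A[i]: int; }

// Indices of the c-th contiguous block of D (hypothetical helper).
proc block(c) {
  const lo = D.low + c * n / numTasks,
        hi = D.low + (c + 1) * n / numTasks - 1;
  return lo..hi;
}

var result: [D] int;
var blockSum, offset: [0..<numTasks] int;

// Pass 1: each task totals its own block, reading elements by index.
coforall c in 0..<numTasks with (ref blockSum) do
  for i in block(c) do
    blockSum[c] += castAt(i);

// Exclusive prefix of the block totals (serial, since numTasks is small).
for c in 1..<numTasks do
  offset[c] = offset[c-1] + blockSum[c-1];

// Pass 2: each task re-reads its block and writes the running totals.
coforall c in 0..<numTasks with (ref result) {
  var running = offset[c];
  for i in block(c) {
    running += castAt(i);
    result[i] = running;
  }
}

writeln(result);
```

Each element is read twice, and each task starts in the middle of the index space. Both are fine when castAt simply re-evaluates a pure expression over A, but neither is possible for something like readInt(i), which consumes a stream.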

Alternatively, we could try to specially implement the scan operator for iteratorRecords, but my sense is that this would be very challenging since an iteratorRecord could represent a loop over a local or distributed array using any memory layout, and the scan operator typically has to be implemented in a way that's very cognizant of the underlying array storage order to work well/efficiently. So I worry that doing so would require specializing the implementation for all possible loop types we might get.