Hi Damian —
There's no need to punctuate points that you want to make with an underhanded (or blatant) insult. You asked a very broad question, and Engin answered it. You're obviously not happy with the status quo or the implications of the answer, but that's no reason to react insultingly.
We're well aware that slicing has a long history, but Chapel has a feature-rich, aggressive language design that doesn't have the luxury of being implemented in terms of those previous solutions, so we must re-invent them. Moreover, because of differences in our needs and design goals (e.g., supporting distributed arrays, permitting users to create their own array implementations and distributions), it's hard to imagine how we would leverage prior art (i.e., those half-century-old solutions don't support user-defined distributed arrays).
Rank-change has not been a priority to date because uses of it have been fairly minimal in user codes so far, and rank-preserving slices have typically sufficed. That said, rank-change has started to see more use recently, leading us to open issues like "Optimize rank change slices to support a lazy creation mode, similar to rank-preserving slices" (chapel-lang/chapel issue #20749 on GitHub). You are of course also welcome to file issues of your own requesting improvements in performance or changes in behavior.
You note:
the ref is never used. And yet it slows down the matrix multiplication of two 800*800 matrices down by a factor of two. That is scary.
Engin addressed this in his response:
The compiler could ideally remove that line as it has no effect, but with the array implementation and the complications of the rank-change operation it is arguably significantly harder for the compiler to prove that that operation doesn't have any side-effects, as such, it hasn't been a priority in our compiler/performance efforts.
So this point was obviously not lost on him. Tagging onto his response, I'll add that we generally haven't expended effort in our compiler optimizing away dead code that a user writes but didn't actually want to execute. There are a number of reasons for this, some smaller, some bigger:
- we've arguably got bigger fish to fry at this point (e.g., I'd argue that optimizing rank-change slices feels more important than eliminating unused ones)
- users can typically delete code that they don't want to execute rather than relying on the compiler to eliminate it for them
- it's been useful for benchmarking purposes
- we can often rely on the back-end compiler to take care of dead-code elimination for scalar operations (unfortunately, the code we generate for an array slice is complicated enough that we can't expect it to take care of such cases for us)
- due to the complications within our compiler that Engin cites
I would have thought that if I create the reference to an entire row prior to some loop, then surely the accesses to elements during that loop using that ref would be heaps faster than the alternative of references to the individual elements within the original 2D array.
In our current implementation, this isn't the case. Rank-change slicing is currently implemented by converting each 1D array access of the slice into a 2D access on the original array. This approach is taken primarily because of the need to support a rank-change slice of any kind of array: a local array, a block-distributed array, a cyclic-distributed array, the array implementation that you might write, etc. This is why Chapel can't generally use a simple dope vector manipulation as in F90. (Could Chapel optimize rank-change slices of local arrays using dope-vector-like techniques? Quite likely, but it doesn't today.)
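To make that distinction concrete, here is a rough sketch in Python rather than Chapel. It is only an illustration of the two strategies, not how Chapel's array implementation is actually written, and the class names (`Array2D`, `ForwardingRowView`, `DopeVectorRowView`) are invented for this example.

```python
class Array2D:
    """Row-major 2D array over a flat list (stand-in for a local array)."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.data = [0.0] * (rows * cols)

    def __getitem__(self, idx):
        i, j = idx
        return self.data[i * self.cols + j]

    def __setitem__(self, idx, value):
        i, j = idx
        self.data[i * self.cols + j] = value


class ForwardingRowView:
    """1D view of row i that turns every access v[j] into parent[i, j].

    It only needs the parent to support 2D indexing, so it works for any
    parent array type, but each access pays for a full 2D lookup.
    """

    def __init__(self, parent, i):
        self.parent, self.i = parent, i

    def __getitem__(self, j):
        return self.parent[self.i, j]

    def __setitem__(self, j, value):
        self.parent[self.i, j] = value


class DopeVectorRowView:
    """1D view described by (buffer, offset, stride, length).

    Cheaper per access, but it assumes one contiguous local buffer, which
    a distributed or user-defined array need not have.
    """

    def __init__(self, arr, i):
        self.data = arr.data
        self.offset = i * arr.cols
        self.stride = 1
        self.length = arr.cols

    def __getitem__(self, j):
        return self.data[self.offset + j * self.stride]

    def __setitem__(self, j, value):
        self.data[self.offset + j * self.stride] = value


# Fill a 3x4 array so that a[i, j] == 10*i + j, then view row 1 both ways.
a = Array2D(3, 4)
for i in range(a.rows):
    for j in range(a.cols):
        a[i, j] = 10 * i + j

fwd = ForwardingRowView(a, 1)
dope = DopeVectorRowView(a, 1)
```

Both views alias the same storage, so a write through either one is visible through the other and through `a`. The forwarding view is the more general of the two because it makes no assumption about how the parent stores its elements, which is roughly the trade-off described above.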
it seems like the performance of using a ref has actually gotten worse since 1.22.0 but that may be a total misconception as I have not tested this (and do not have the time to go back and test).
I'm not certain, but I'd be surprised if that were the case: neither refs nor rank-change slices have undergone significant changes since 1.22 that I can think of. I also don't think it's the ref that's resulting in the additional performance overhead here, but the rank-change slice itself and the indexing into it.
Tell me if I am pushing for something that really is not top of the priority list.
It definitely isn't at present. Current priorities are listed here:
-Brad