20203, "ronawho", "Should we inline scan/reduce methods?", "2022-07-13T11:26:32Z"
chapel-lang/chapel PR #12691 ("Mark SumReduceScanOp's methods with 'inline'", by vasslitvinov) inlined scan/reduce methods but only for `+`. As part of some other scan optimizations I was working on recently I found myself wondering how much of a performance difference this makes and at the very least I wanted to be consistent about inlining for all our operators.
Inlining `+` in chapel-lang/chapel PR #12691 resolved a small performance regression for reductions over tuples in NPB EP. For trivial types like `numeric`, it seems like we should obviously inline because the method calls are trivial. The backend will likely inline these anyway, but inlining in Chapel gives our compiler more visibility to optimize. However, for complicated data types (say, a reduce/scan where the result type is an array), inlining could result in significant code bloat. Tuples are somewhere in between: the size of the tuple probably determines whether inlining is a good idea.
I find myself wanting an "inline if small" directive or something similar, but barring that I would probably stop inlining the methods for `+` scan/reduce. I did not see any performance difference across the cross product of scan/reduce and large/small int arrays, whether run serially or in parallel, but we should take some time to explore this for the original tuple case.
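
For reference, here is a minimal sketch of the kinds of `+` reduce/scan expressions in question: an int array, a tuple array like the NPB EP case, and a reduce forced to run serially. The names (`n`, `ints`, `pairs`, and so on) are made up for illustration, this is not the benchmark I ran, and it does no timing.

```chapel
// Hypothetical problem size; raise it on the command line (--n=...) to experiment.
config const n = 8;

var ints: [1..n] int = 1;     // trivial numeric element type
var pairs: [1..n] 2*int;      // tuple element type, as in the NPB EP case
for p in pairs do p = (1, 2);

// Default (parallel) reduce and scan over ints, plus a reduce over tuples.
const intSum    = + reduce ints;
const intPrefix = + scan ints;
const pairSum   = + reduce pairs;

// The same int reduce with task creation suppressed.
var serialSum: int;
serial {
  serialSum = + reduce ints;
}

writeln(intSum, " ", serialSum);
writeln(intPrefix);
writeln(pairSum);
```

Swapping `2*int` for a wider tuple (say `16*int`) would be one way to probe where inlining stops paying off.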