Interesting Optimizations (or Not)

I have a routine which merges two halves of a buffer (which comes from a Merge Sort). I have attached the routine (because with comments it is a bit long).
loop.txt (1.9 KB)
The interesting bit is this:

    for k in lo .. hi by s do
        var tk : T;

        // blah ... blah
        else if y[i] <= t[j] then // copy one item from lower chunk
            tk = y[i]; i += 1;
        else // copy one item from upper chunk
            tk = t[j]; j += s;
        t[k] = tk;
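
For reference, filling in the elided branches, the whole inner loop might look something like this. This is a sketch only: the exhaustion tests, the procedure signature, and the exact roles of `lo`, `cut`, `hi` and `s` are my assumptions from the excerpt, not the attached routine:

```chapel
// Hedged sketch of the merge: the lower chunk has been copied into y,
// the upper chunk lives in t[cut+s .. hi by s], and the merged result
// is written back over t[lo .. hi by s].
proc merge(ref t : [] ?T, y : [] T, lo : int, cut : int, hi : int, s : int) {
  var i = y.domain.low;   // next unconsumed item of the lower chunk
  var j = cut + s;        // next unconsumed item of the upper chunk

  for k in lo .. hi by s {
    var tk : T;
    if j > hi {                       // upper chunk exhausted
      tk = y[i]; i += 1;
    } else if i > y.domain.high {     // lower chunk exhausted
      tk = t[j]; j += s;
    } else if y[i] <= t[j] {          // copy one item from lower chunk
      tk = y[i]; i += 1;
    } else {                          // copy one item from upper chunk
      tk = t[j]; j += s;
    }
    t[k] = tk;
  }
}
```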

When I change the first 3 lines to

    for (k, tk) in zip(lo .. hi by s, t[lo .. hi by s]) do

and drop the line

        t[k] = tk

the routine slowed down by more than 30%, maybe because I also reference t[j] inside the loop. Wow. I obviously do not use the zippered variant. Is this expected? Definitely not urgent.

It transpires that the answer is yes.

When I removed the zippering but forced the same effective slicing to happen on entry to the routine, I saw the same issue.

So the cost is the overhead of the slicing operation, which is significant as a percentage of the relatively simple work being done within the routine in question.

Luckily the Chapel team is putting a lot of effort into issues like #16333 and #24343 that aim to reduce slicing overhead, so this problem should disappear in the near future.

Just expanding on this: a very clear statement (which I prefer), like

const y : [1 .. m] T = t[lo .. cut by s];

unfortunately has a lot more overhead than

const y : [1 .. m] T = for i in lo .. cut by s do t[i];

In context, for a relatively simple task which is just a merge, the second style of assignment yields a 12% speed gain. It looks like I have a test case for some of the work being done by the team.
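
For anyone wanting to reproduce the comparison, a minimal timing harness along these lines should work. This is a sketch under my own assumptions (array size, `real` elements, and the `stopwatch` type from the `Time` module); it is not taken from the attached routine:

```chapel
use Time;

config const n = 1_000_000;

var t : [1 .. 2*n] real;
forall i in t.domain do t[i] = i;   // arbitrary data to copy

var sw : stopwatch;

// Style 1: slice-based initialization (clear, but pays slicing overhead).
sw.start();
const y1 : [1 .. n] real = t[1 .. 2*n by 2];
sw.stop();
writeln("slice copy: ", sw.elapsed(), " s");

sw.clear();

// Style 2: loop-expression initialization (same result, faster today).
sw.start();
const y2 : [1 .. n] real = for i in 1 .. 2*n by 2 do t[i];
sw.stop();
writeln("loop copy:  ", sw.elapsed(), " s");
```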