I am unaware of our plans regarding specifically the swap operator. We usually delegate optimizations like vectorization and unrolling to LLVM. Also, in some cases array swap is implemented by swapping the pointers to the underlying memory regions.

Regarding search, what are the shortcomings of Search.binarySearch() for your uses?

Thanks for the details about the swap operator, Vass. I will run tests to identify any differences between

t[t1] <=> t[t2]
foreach (i, j) in zip(t1, t2) do t[i] <=> t[j];

I will write an unrolled version of the second of these, because the size of the range t1 is known to be a multiple of 2, so the loop is easily unrolled. The ranges never overlap. Because these are slices of a single array, swapping pointers is not possible.
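For concreteness, here is a minimal runnable sketch of the two variants (n and t are assumed names; n is chosen even so the loop unrolls cleanly):

```chapel
// Sketch: swapping two non-overlapping slices of a single array.
use Random;

config const n = 8;          // assumed even, so the loop unrolls cleanly
var t : [0..#(2*n)] real;
fillRandom(t);

const t1 = 0..#n,            // first half
      t2 = n..#n;            // second half; the ranges never overlap

// Variant 1: whole-slice swap.
t[t1] <=> t[t2];

// Variant 2: element-wise swap over the zipped index ranges.
foreach (i, j) in zip(t1, t2) do
  t[i] <=> t[j];
```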

I have a version of TimSort which uses the existing insertion sort from Sort.chpl. It uses a binary search which is subtly different from Search.binarySearch. So a better question from me would have been: why are there different binary search routines in those two modules?

If you need to know why I am playing with TimSort, I can explain separately.

I actually don't know the answer to that and would have to do some git blame / git history digging to find out. But I think the short answer is almost certainly "There is no deep reason for it; it's that way for historical reasons".

One odd thing is that the Sort module is compiled in by default (since it's used for some built-in things, like myAssociativeDomain.sorted() ) but the Search module is not. But, that should be a moot point now because we have the BinaryInsertionSort submodule of Sort (but, we did not always have that submodule).

This is getting a little far afield from Damian's original topic, but didn't we decide to move things that relied on sorting like this into the Sort module itself? (Maybe we did it for some things, but missed this one? Or decided to move it, but then never got around to it?)

(My take is that sorting is a pretty special and heavyweight op that shouldn't be compiled into every program by default, so I'd make this a tertiary method in Sort.chpl rather than something auto-included).

The binary search in Sort.chpl appears to be optimized, relative to Search.chpl, for the case where the key being searched for is already present. There is a subtle difference in logic between the two implementations when an identical key is found. The sorting routine then uses the result of the search, including the knowledge of equality, (maybe sub-optimally) to slot the (identical) key, say Val, into the sorted list, (I think) at the (upper?) edge of any existing run of occurrences of Val in the already-sorted part of the array. I found myself going back to first principles, which is scary stuff, and it needs more investigation. While I could be wrong, I think the calculation of the optimal slot within a run of identical values in a sorted array can be randomly sub-optimal when the run is more than 3 elements long. And yes, that scenario is highly unlikely, but still. It sounds like a job for a student to investigate, although it might need a lot of supervision as the problem is a bit esoteric.
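To make the distinction concrete, here is a hedged sketch (not the actual Sort.chpl or Search.chpl code) of an "upper bound" style binary search that, when the key already occurs in the array, returns the slot just past the run of equal elements, so a duplicate is inserted at the upper edge of the run:

```chapel
// Returns the first index at which `key` could be inserted while keeping
// `a` sorted, placing duplicates after any existing run of equal values.
proc upperBound(const ref a : [?d] ?t, key : t) : int {
  var lo = d.low, hi = d.high + 1;
  while lo < hi {
    const mid = lo + (hi - lo) / 2;
    if key < a[mid] then
      hi = mid;
    else
      lo = mid + 1;          // on equality, keep moving right
  }
  return lo;                 // first index whose element compares > key
}

var a : [0..4] int = [1, 2, 2, 2, 3];
writeln(upperBound(a, 2));   // → 4, the slot just past the run of 2s
```

A plain found/not-found binary search, by contrast, makes no promise about which element of a run it lands on, which is (as I understand it) why an insertion sort built on it wants its own variant.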

Brad, I wouldn't worry about the discussion being far afield of the original topic, because it is now starting to overlap with the Discourse item about Binary Insertion Sort. On that point, I note that while the binary search of the Search module handles both strided and non-strided arrays, the ShellSort routine seems to handle only arrays of unit stride (which also impacts Radix Sort, which relies on it). My 2c. Sorry, I digress.

IMO it's a bit fiddly for the sort functions to have to handle the variety of strides for the array. I think it's more convenient to use reindex to create a view of the array that isn't strided. But, I haven't done any performance investigation of the impact of doing it that way in the context of sorting, so it might be quite a bit slower. At the same time, while I think it's important that sorting a strided array functions correctly, I'm not sure that sorting a strided array has practical value.
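A hedged sketch of the reindex approach (the array names and values here are invented for illustration): create a unit-stride view of a strided array and sort the view; writes through the view update the original, with no data copied.

```chapel
use Sort;

// A strided array holding 10 values in reverse order.
var y : [1..20 by 2] real = [i in 1..20 by 2] (21 - i):real;

// A non-strided view over the same elements; no data is copied.
ref x = y.reindex(1..10);

sort(x);        // sorts y's elements through the unit-stride view
writeln(y);
```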

My guess is that things like InsertionSort handle strided arrays because a) they were written earlier, possibly before we had reindex working, and b) they are simpler, so it's easier to handle.

Thanks. I keep forgetting reindex. I like the idea.

What I would still like to track down is a performance peculiarity of my own variant of timsort, which is written to handle strides when processing an array and has no special handling for an un-strided array. It runs much faster sorting an un-strided array of real(64) than sorting the same data stored in an array defined as

var x : [M..#N by K] real(64);

where K is a const which happens to be 1. The performance when sorting x without a comparator is:

32.2secs var x : [M..#N by K] real(64);
7.9secs var x : [M..#N] real(64);
7.7secs var y : [M..#N by K] real(64); ref x = y.reindex(M..#N);

So weird. Something to look at over the next few weeks in between doing quotations and writing reports. You have sold me on the idea of reindex.
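For reference, the timings above could be gathered with a harness along these lines (a sketch: the declarations of M, N, and K and the use of Sort.sort are assumptions, and stopwatch needs a reasonably recent Time module):

```chapel
use Time, Random, Sort;

config const M = 0,
             N = 1_000_000,
             K = 1;

var x : [M..#N by K] real(64);
fillRandom(x);                 // assuming fillRandom accepts a strided array

var s : stopwatch;
s.start();
sort(x);                       // or the timsort variant under test
s.stop();
writeln(s.elapsed(), " secs");
```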

The performance for Sort.sort is

24.5secs var x : [M..#N by K] real(64);
19.5secs var x : [M..#N] real(64);
19.4secs var y : [M..#N by K] real(64); ref x = y.reindex(M..#N);

Interestingly, when we change the stride to 2, but keep the number of elements to be sorted the same, we see for my timsort

31.4secs var x : [M..#(2*N) by 2] real(64);
7.8secs var y : [M..#(2*N) by 2] real(64); ref x = y.reindex(M..#N);

but for Sort.sort

10.4secs var x : [M..#(2*N) by 2] real(64);
19.5secs var y : [M..#(2*N) by 2] real(64); ref x = y.reindex(M..#N);

Really interesting numbers. More investigation is needed.

Either way, the idea of a reindex looks like the way to go.

BTW, sometimes having K be an unsigned integer is better than a signed one (in M..#N by K). Probably not in this case, given what you observe with M..#(2*N) by 2. Generally, 2, or any other positive constant or param, has the effect of an unsigned int when used as a range's stride.