BinaryInsertionSort

Line 779 in Sort.chpl

 for i in low..high by stride {

I could be very wrong, but starting at low rather than low + stride means that the first pass through the loop is a no-op.

On that first pass, the value of found will always be false, which means loc will be low, and the inner loop is

for j in low .. low - stride by -stride

which will always be a no-op, and the assignment after the loop achieves

Data[low] = Data[low]

which is a no-op.

Please check my ramblings. If I am correct, the lower limit of the outer loop should be adjusted to low + stride
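The claim can be checked mechanically. Below is a Python sketch (not the Chapel code; a linear scan stands in for the binary search, stride fixed at 1, and the function name insertion_sort is just illustrative) showing that a loop starting at low and one starting at low + 1 produce identical results, i.e. the first pass is a no-op:

```python
import random

def insertion_sort(data, low, high, start_offset):
    """Simplified Python mirror of the Sort.chpl loop shape (stride = 1).
    start_offset lets us compare an outer loop starting at low
    against one starting at low + 1."""
    for i in range(low + start_offset, high + 1):
        ith_val = data[i]
        loc = low                      # default: insert at the bottom
        for j in range(i - 1, low - 1, -1):
            if ith_val < data[j]:
                data[j + 1] = data[j]  # slide up to make room
            else:
                loc = j + 1
                break
        data[loc] = ith_val

# Both variants sort identically: the i = low pass changes nothing.
random.seed(0)
for _ in range(100):
    orig = [random.randint(0, 9) for _ in range(8)]
    a, b = list(orig), list(orig)
    insertion_sort(a, 0, 7, 0)   # outer loop starts at low
    insertion_sort(b, 0, 7, 1)   # outer loop starts at low + 1
    assert a == b == sorted(orig)
```

On the i = low pass the inner range is empty, loc stays at low, and the final store writes data[low] back to itself, matching the reasoning above.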

Hi Damian -

Yes, I believe you are right about that. Would you be up for making a PR for it?

Note that at present we don't do much with this BinaryInsertionSort implementation; I think we have just one correctness test for it, for example. It's not used by the main proc sort.

Thanks.

I do not know how to do a PR. Are there instructions?

Besides, I first wanted to get someone else to verify my suspicion.

Yes, we have some instructions over at Contributor Info — Chapel Documentation 1.33. Some of these will link to GitHub instructions.

OK. No rush there. There is also another change I would recommend. In the loop at line 705 of Sort.chpl, within InsertionSort, do the same thing at the start of the outer loop. A further, but more contentious, change would be to use the same logic as is used in BinaryInsertionSort to reinsert ithVal.

    // DGM - replaced by line after
    // for i in low..high by stride {
    for i in low+stride..high by stride {
      var ithVal = Data[i];
      // DGM - no longer needed
      // var inserted = false;
      var ithLoc = low;

      for j in low..i-stride by -stride {
        if chpl_compare(ithVal, Data[j], comparator) < 0 {
          Data[j+stride] = Data[j];
        } else {
          // DGM - replaced by line after which only captures an index
          // Data[j+stride] = ithVal;
          // inserted = true;
          ithLoc = j + stride;
          break;
        }
      }
      // DGM - replaced by line after
      // if (!inserted) {
        // Data[low] = ithVal;
      // }
      // DGM - is the same logic as BinaryInsertionSort OK?
      Data[ithLoc] = ithVal;
    }

I use much the same exit-and-reinsert logic for that inner loop in a cut-down insertion sort that I have used in my timsort stuff. It is also the same logic as BinaryInsertionSort. What do you think? Useful? Useless? ...?
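For what it's worth, the equivalence of the two variants can be spot-checked outside Chapel. Here is a Python sketch (stride = 1; the function names sort_with_flag and sort_with_loc are illustrative, not from Sort.chpl) comparing the original inserted-flag logic against the proposed exit-and-reinsert logic on random inputs:

```python
import random

def sort_with_flag(data):
    """Mirrors the original Sort.chpl logic: a bool 'inserted' flag
    plus a fallback store into data[low] after the inner loop."""
    low, high = 0, len(data) - 1
    for i in range(low + 1, high + 1):
        ith_val = data[i]
        inserted = False
        for j in range(i - 1, low - 1, -1):
            if ith_val < data[j]:
                data[j + 1] = data[j]   # slide up to make room
            else:
                data[j + 1] = ith_val
                inserted = True
                break
        if not inserted:
            data[low] = ith_val

def sort_with_loc(data):
    """Mirrors the proposed logic: capture the target index only,
    then do a single unconditional store after the inner loop."""
    low, high = 0, len(data) - 1
    for i in range(low + 1, high + 1):
        ith_val = data[i]
        ith_loc = low
        for j in range(i - 1, low - 1, -1):
            if ith_val < data[j]:
                data[j + 1] = data[j]   # slide up to make room
            else:
                ith_loc = j + 1
                break
        data[ith_loc] = ith_val

random.seed(1)
for _ in range(200):
    orig = [random.randint(0, 20) for _ in range(random.randint(0, 12))]
    a, b = list(orig), list(orig)
    sort_with_flag(a)
    sort_with_loc(b)
    assert a == b == sorted(orig)
```

Both give correctly sorted output; the loc version just replaces the flag and its conditional fallback with one unconditional store.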

While I have tested my own code enough to know the logic is correct, I have not tested the above anywhere near enough. Do you have test cases that were used to test this code in Sort.chpl, before I do anything dangerous?

For InsertionSort we have some testing in these tests:

test/distributions/robust/arithmetic/modules/test_module_Sort.chpl
test/library/packages/Sort/correctness/correctness.chpl
test/library/packages/Sort/correctness/sorty.chpl

You should be able to run these with start_test (see also Chapel Testing System — Chapel Documentation 1.33 if you are unfamiliar with it).

We also have this test to explore the performance of sort functions:

test/library/packages/Sort/performance/sort-performance-explorer.chpl 

However, it doesn't test InsertionSort directly at the moment (because it would be really slow for large input). If you want to explore how much your change affects other sort algorithms, though, you could do that with this test. Note that these are using insertionSortMoveElts (at least on main now; I don't recall what the last release was doing in this regard). That function is used as a base case for quickSort, which is in turn used in some cases by proc sort.

Certainly it seems pretty strange/useless for the code to compare Data[low] with itself on the first iteration through the nested loops. Additionally, I admit to being a bit puzzled by the Data[j+stride] = Data[j] part. I'm not saying it's incorrect -- just that it's not immediately obvious to me. Anyway, it was added this way back in 2007 ( Added the first sparse version of CG which builds the · chapel-lang/chapel@69f8a92 · GitHub ). AFAIK, what we should be seeing, I think, is that the inner loop locates the position where ithVal should be stored, assuming that the data up to but not including i is sorted. It has to move elements around to make room as well.

Anyway, the point is, it's not immediately obvious to me if your change makes sense or not. I'd have to run a pencil and paper insertion sort which is more than I want to get in to right now.

I'm a bit tempted to adjust the algorithm to more resemble the no-swap version from Insertion sort - Wikipedia . Might be easier to reason about. But that's also not something I can get into right now.
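For reference, the no-swap variant described on that Wikipedia page looks roughly like this (a Python sketch, not the Chapel code; the function name is illustrative):

```python
def insertion_sort_no_swap(a):
    """No-swap insertion sort, as on the Wikipedia page: hold the
    current element in x, shift larger elements right with single
    assignments, then store x once at its final position."""
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]   # shift right; no pairwise swaps
            j -= 1
        a[j + 1] = x
    return a
```

Note that this is structurally the same as the exit-and-reinsert logic discussed above: the element is held out of the array, larger elements slide up one slot each, and a single store reinserts it.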

Improvements in this area are welcome if you can demonstrate a performance improvement with them or if they preserve the performance of the function while making it clearer.

Thanks for the reply. We needed another set of eyes to look at the problem. We were going around in circles a bit. There is a lot of extremely useful information in there.

You mentioned

AFAIK, what we should be seeing, I think, is that the inner
loop locates the position where ithVal should be stored,
assuming that the data up to but not including i is sorted.
It has to move elements around to make room as well.

Correct. And the code does this. Specifically, the line you identified,

Data[j + stride] = Data[j]

is sliding the data up the array by a single slot to make room (at the bottom end) for the insertion of the value from the outer loop. So it is correct.

I will look at making the tweaks and somebody can review them. They improve the performance ever so slightly but, more importantly from my perspective, make the code a lot easier to read: it is far clearer what is going on. It won't happen for a while as I am a bit swamped with other stuff.

I need to more fully understand insertionSortMoveElts and all that shallow-copy stuff. I understand the need. Should we be using ShallowCopy throughout the sorting routines? Does ShallowCopy have any overhead for basic numeric types, i.e. integers or floating-point numbers? Those changes to the loop will also reduce the number of places that a shallow copy is needed, so it will be more readable.

Thanks again for your reply.

From my first perusal, the binary search routine from Search.chpl is a drop-in replacement for the same-purpose routine internal to BinaryInsertionSort. There is a subtle difference, which means there may be one or two more (or fewer) calls to the comparator. Either way, it should still yield a correctly sorted array.

Obviously, it needs testing because I could always be wrong.
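As an illustration of the idea (in Python rather than Chapel), a binary insertion sort built on a stock library binary search looks like this; here bisect_right plays the role that the binarySearch from Search.chpl would play, and the function name is illustrative:

```python
from bisect import bisect_right

def binary_insertion_sort(a):
    """Binary insertion sort: a library binary search finds the
    insertion point in the sorted prefix a[0:i], then elements
    slide up one slot to make room for a[i]."""
    for i in range(1, len(a)):
        x = a[i]
        loc = bisect_right(a, x, 0, i)  # binary search in sorted prefix
        a[loc + 1:i + 1] = a[loc:i]     # slide a[loc..i-1] up one slot
        a[loc] = x
    return a
```

Using bisect_right (insert after any equal elements) keeps the sort stable; a bisect_left variant would differ only in where equal keys land, which matches the observation that a slightly different binary search still yields a correctly sorted array.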

@damianmoz

Thanks!

Yes, using the binary search from Search would be reasonable as well (and we should be able to just change the binaryInsertionSort to call the one in Search).

These shallow-copy operations should be completely irrelevant (both for performance and program behavior) for numeric types. The point of them is that we can avoid doing things like copying strings when sorting an array of strings. So it's primarily an optimization for sorting records of some kind.

Some of the move initialization stuff is user facing but the sort routines are using some parts that are not. We have a bunch of issues about future work in this area that I can dig up if you are interested. And the stable versions of these functions are described in MemMove — Chapel Documentation 1.33 . (Probably too much detail, but: the Sort.chpl ones predate these but in theory could switch to using them. The Sort ones also handle bulk transfer but the MemMove ones don't, yet. Also, I'm not so sure how to handle the "no deinit" variables with the MemMove routines, but it's probably possible. One of the main things used in Sort.chpl that is missing from MemMove is related to creating an array without initializing the elements, which needs more work to be stable / user-facing).

Thanks for that insight.