Records, consts, and implicit data passing

ghbrown · July 7, 2022, 12:59am

I found myself in need of an argsort, and wrote my own relatively easily using existing features of the Sort module (which I think shows how well designed and flexible it is, so kudos). In the interest of submitting a more polished version of this implementation as a PR (to help #5724) I began to style the interface after the existing sort interface, make more use of the module defined records, etc. but quickly found myself confused by some of the behavior of records.

Here is the code:

use Sort;

proc argsort(Data : [?Dom] ?eltType,
             comparator : ?rec = defaultComparator) {
  var n : int = Data.size;
  var d : domain(1) = {0..<n}; // this array of indices will be sorted
  var idx : [d] int;
  forall j in d {
    idx[j] = j;
  }
  // overwrite key method such that it pulls respective entries
  proc comparator.key(a) {writeln('first overwrite'); return Data[a];};
  //proc comparator.key(a) {writeln('second overwrite'); return Data[a];};
  
  sort(idx,comparator=comparator);
  return idx;
}

var a = [1,1,20,4,-5,2,3,111,12,5,100];
var sort_indices = argsort(a);
writeln(a[sort_indices]);

Fact 1: If you run the above code it should print out first overwrite a few times, then print the sorted values of a (implying the argsort worked properly).
Question 1.a: defaultComparator is defined as a const in the sort module, how was I able to overwrite it? Is it read into a var when entering the function?
Question 1.b: I wrote it this way on a hope and a prayer that it would work, but how does comparator.key()retain access to Data inside the sort routine? I can make a shaky argument that it should work properly inside of argsort since it looks like a global variable relative to the proc, but I don't see a clear path that information propagating into sort. I would have expected to use an optional argument for additional data in other languages.

Fact 2: If you uncomment the second proc in the function body, the .key(a) method that gets used is neither of the two in the function body (and is probably the one provided by defaultComparator, but I'm not positive). The printed result is the array in original order, meaning the indices were either left unsorted, or they were sorted with default key method.
Question 2.a: Why does this happen? I expect to either never be able to redefine a record method, or to be able to redefine it at will, and what happened was neither of these as far as I can tell.
Question 2.b: Through experimenting with records, I find often (like the case below) the compiler catches such redefinition cases and gives an "ambiguous call" error. This case doesn't seem so different from the above, so should this error be thrown for the above code too? If not, why?

record record_def { };
proc record_def.p() {writeln('default');};

const default_instance : record_def; 
default_instance = new record_def(); 

proc run_record_process(instance : ?rec = default_instance) {
  proc instance.p() {writeln('overwrite 1');}
  for i in 0..<5 {
    instance.p();
  }
}

run_record_process();

Happy to provide any more info or tests if needed, thanks!

bradcray · July 7, 2022, 5:55am

Hi Gabriel —

Thanks for the interest, kudos, and questions. I should prefix my comments by saying that I'm not an expert on the Sort module, but am decent on records, consts, and implicit data passing, so will see where I can get you, and maybe we can call in others to help with things I'm shaky on.

I'm going to take your questions out of order, to address what feels to me (and to you, I think) like the elephant in the room. Specifically, I think there is something very wrong with both the first and second overwrites of comparator.key() and that Chapel isn't treating these as it should, for a few reasons:

First, as you note, given that it seems content to call either the first or the second overwrite when only one of the two are available, it really should give an ambiguity error when both are available.
Second, I don't think it's intentional that you should be able to attach a method to a variable (e.g., comparator here) rather than a type (e.g., rec here), and this seems like a bug to me (and one that can be reproduced by the following simple program:
```
record R {
}

var myR: R;

// this is OK, since R is a type                                                
proc R.foo() {
  writeln("In foo()");
}

// this shouldn't be OK; myR is a value                                         
proc myR.bar() {
  writeln("In bar()");
}

myR.foo();
myR.bar();
```
That said, if I redefine your .key() methods using rec rather than comparator, I see the same behavior (things "seem to work" when one is defined, and neither is called if both are).
Third, even if we define just one of these procedures using rec.key(), I'm still surprised that it works for a few reasons:
- the Sort documentation says that a comparator cannot contain both a key() and a keyPart() method; the DefaultComparator record type that's effectively being used here has a keyPart() method, and you're trying to add a key() method, which intuitively seems as though it is breaking this rule.
- there's also some concern here on my part about whether the Sort module ought to be able to call your .key() method due to concerns about hijacking and point-of-instantiation rules. If it is legal as written, my guess is that what happens is that—since sort() is a generic routine—it will effectively get instantiated here in your module (maybe even within your argsort() routine?), and can therefore see procedures like .key() that are seemingly local to this scope. And that maybe since .key() is more local to this context than the .keyPart() defined in Sort, .key() wins (?). I'd need @vass, @mppf, and/or @lydia to check my work on this, though, as I have a hard time keeping the rules straight.

So as you can see, even though you got the code to seemingly work as you'd expected/hoped, it's making me nervous that it does, as written. The best case scenario would be that it is reasonable for sort() to call your .key() for reasons related to my guesses above, that Chapel should be generating an ambiguity error when two such overloads were defined, that we'd similarly generate an error for trying to define methods on variables rather than types, but that you'd define it on rec and all would be well. But I'm not optimistic that that's right.

Continuing to delay getting to your actual questions, I thought I'd try to sketch out how I think this pattern could (almost) be written more legally, and came up with this:

use Sort;

proc argsort(Data : [?Dom] ?eltType) {
  var n : int = Data.size;
  var d : domain(1) = {0..<n}; // this array of indices will be sorted          
  var idx : [d] int;
  forall j in d {
    idx[j] = j;
  }

  record MyComparator {
    proc key(a) {
      writeln('third version'); return Data[a];
    }
  }

  sort(idx,comparator=new MyComparator());
  return idx;
}

var a = [1,1,20,4,-5,2,3,111,12,5,100];
var sort_indices = argsort(a);
writeln(a[sort_indices]);

Specifically, what I'm doing here is defining my own comparator type with its own .key() method defined as you were, and passing that to Sort.sort(). I think this is all on the up-and-up.

The reason I say it "almost" implements the pattern that you were shooting for is that I've dropped the comparator argument because I'm not certain what I would do with it if it one were passed in. Would I want it to override this local comparator I've created and get passed in instead? Would I want its logic to be combined into this local comparator record in some way? I decided these were questions that would be best put off until daytime or kicked back to you for consideration and just removed the comparator argument instead.

And then as one last delay, I wanted to note that Arkouda implements argsort and coargsort functionality (where I've spent very little time with their code, and am just recognizing the names), in case that might be useful to you. There has been discussion about spinning their sort routines out into a mason package to make them usable to other users, and some baby steps in that direction, but it hasn't happened yet.

OK, on to your questions:

Q1a: I would say that you are not overwriting defaultComparator in any way. Specifically, you're accepting it as a default value into your argsort() routine, and effectively having the argument named comparator store a const ref that refers to it. The const-ness of defaultComparator and const ref-ness of comparator means that you wouldn't be allowed to modify its fields, if it had any. But it doesn't, it's effectively just a way of passing a bundle of methods around.

What you're doing here is effectively extending the methods that its type (DefaultComparator) supports (illegally, in my opinion, though switching to rec would make it legal). This is less a question of const-ness as it is what methods a given type supports. Methods like these are called tertiary methods in Chapel because they are defined in a different module/scope than the type itself. Sometimes it makes people uneasy to think that they can define tertiary methods at all, but if you think of them as being like defining a standalone procedure that accepts a rec/DefaultComparator as its first argument, and it just gets some special syntax that lets that first argument be the receiver of the method call, then hopefully it's not as unnerving.

Q1b: In general, for nested procedures like this, any references to outer variables within the procedure itself are passed into the procedure as additional hidden/automatic arguments by the compiler, so the fact that they refer to those outer variables isn't problematic. And since the procedure, by definition, only logically exists while its outer scope does (where I'm ignoring asynchrony for the time being), it makes sense that it can refer to such variables. As an example, if you were to look at the generated code for a case like:

proc foo() {
  var x = 42;
  proc bar() {
    writeln(x);
  }
  bar();
}

what you'd see is that x is getting passed by the compiler-generated code into bar() through the call.

In your case, it's a bit more mysterious, however, since the Sort module is expecting to call a key() method that just takes the one argument. I believe what's going on here is, again, a powerful case of Chapel's point-of-instantiation semantics where we're stamping out the generic sort() routine as though it were conceptually local to this context, and therefore its calls to .key() are getting specialized to pass Data through. And inspecting the generated code seems to suggest this, as I'm seeing the definition of key to take three arguments, where the third is the array Data:

/* testit-orig.chpl:13 */
static int64_t key_chpl(DefaultComparator_chpl * this_chpl10,
                        int64_t a_chpl2,
                        _array_DefaultRectangularArr_1_int64_t_F_int64_t_int64_\

and the calls within the generated sort routine to be passing these additional arguments through:

        call_tmp_chpl44 = key_chpl(comparator_chpl, ret_chpl2, Data_chpl2);

All that said, given my previous misgivings about whether .key() should be winning at all or allowed to co-exist with a .keyPart() method, I'd be hard-pressed to say whether this behavior is a bug or a feature in your specific example. This would be another good place to have @mppf, @vass, and/or @lydia check my work.

And then I think I've effectlvely already answered Q2a and Q2b by saying "you're right, that seems really weird and inconsistent. To summarize, I definitely think the ability to call one but not to get an ambiguity error when defining both is a bug, though I'm less certain whether the bug is simply the lack of an ambiguity error or the fact that it's letting you call the method when one is defined to begin with. And I definitely think it's an unintentional bug that we let you define methods on specific variables rather than their types.

OK, let me stop there, and see if I've helped clarify some of your questions, where we may need to rely on others to help with some of the areas where I feel non-confident.

I'll also be curious whether my attempt to rewrite your pattern feels like a step in the right direction and more comprehensible / reasonable, and what your thoughts are as to what should happen if a comparator argument was to be re-added. One possibility might be to create two overloads of this routine, one that takes a comparator and passes it through, the second of which doesn't take one and passes through a local one like the one I created (?).

-Brad

mppf · July 7, 2022, 2:53pm

Maybe this is evidence it is ready for the rite of passage to become a standard module!

Our POI rules will prefer something in the Sort module if it applies for a generic function within the Sort module. In this case I think the way the sort module uses canResolve is probably messing that up (it is probably looking first for key and that allows the one from POI to be used because the default comparator does not have a key).

I'm sure that the Sort code could improve its use of canResolve to insist on just one of compare key or keyPart methods.

ghbrown · July 7, 2022, 3:31pm

Thanks as always for the detailed and educational response, I learned a lot.

(As somebody who seldom uses OOP) I would have thought that the const-ness meant that neither fields nor methods could be modified, added, subtracted, etc.

It doesn't make me uneasy (though perhaps it should?), but I am curious what happens across scopes and modules when they get extended like this. My intuition is that it's useful (like my example) and harder to reason about about/more bug prone.

Wow, that's neat.
I'm finding it harder to reason about than the typical pattern of

// module code
proc sort(arr,comparator=defaultComparator,extras = ()) { //actual sort definition };

// code of user writing their own simple argsort
record rec_def {
  proc key(a,extras) {return extras[0][a];};
}
sort(idx,comparator= new rec_def,extras=data);
// idx now arg-sorted

from Python (for example) since I'm not used to such a cunning compiler.

Certainly a step in the right direction in the sense that I better understand it and it uses "developer approved" techniques, rather than riding on the edge of disaster like I was.
For my end goal, I do think removing the input comparator is a loss, since I planned to propose an argsort method to the Sort module. My thinking was that if I could do it by wrapping the existing sort you should get predictable and documented behavior and interface, take advantage of the performance of sort while not duplicating code, distributed sort for free(?), and I wouldn't have to think too hard. My plan was to have argsort exactly mirror the interface of sort, then pick apart, wrap, and reassemble the comparator for the sort call by any means necessary (probably pulling from DefaultComparator where possible).
For example, here's a Chapel/pseudocode version of how I thought I might deal with a user defined key

proc argsort(Data : [?Dom] ?eltType,
             in_comparator : ?rec = defaultComparator) {
  var n : int = Data.size;
  var d : domain(1) = {0..<n}; // this array of indices will be sorted
  var idx : [d] int;

  record sort_comparator_def {
    proc key(a) {
      return in_comparator.key(Data[i]);
    }
  }
sort(idx,comparator= new sort_comparator_def());
}

with the idea being that the user can use the key, compare, and keyPart knowledge and style from sort to operate argsort. Besides wrapping compare, which I don't know how to do, I think the rest of it can be handled with just some logic on in_comparator or its record and Sort's DefaultComparator.

bradcray · July 8, 2022, 1:28am

Hi Gabriel —

First, Michael's mention of canResolve() and Sort's use of it made me realize what is probably happening with your doubly-defined .key() methods: I'm guessing that we make the query "Can we resolve .key()?" and the result comes back "No, we can't [because it's ambiguous]" and from that point it falls back to the .keyPart()-based method and ceases to work.

In a very dynamic interpretation of OOP, I can imagine that interpretation for sure. Since Chapel is a statically typed language, we don't really think of code/method/procedures as being part of the dynamic value of an object that is or is not const, but rather part of its static/type properties. I guess I would also say that extending a type by adding new capabilities to it isn't modifying the existing capabilities, but it is clearly adding capabilities as you say. We allow that, though.

I am curious what happens across scopes and modules when they get extended like this. My intuition is that it's useful (like my example) and harder to reason about about/more bug prone.

We've discussed these concerns in recent years as well, and ended up with the current design, in which you typically can't invoke a tertiary method (again, one defined outside of the module where the type is defined) without a use or import of that module. Although as cases like yours show, when calling a generic procedure, which needs to be able to see the things that it's being instantiated with, it has visibility into the scope of that call, which gives it access to tertiary methods as well.

Returning to the simpler, non-generic case: if I add a tertiary method in my module, it can't affect your code without you opting to have it do so. For example, if you use my module, you'll see it. use is, by design, a "fast and loose" concept in that by default it makes all of a module's contents available to you, so generally isn't the preferred option if you're trying to make safe code. Specifically, I could add new stuff to my module (like new tertiary methods) and surprise you. So use is designed for sketching quickly, or for cases where you're completely OK with getting whatever a module offers. (use also supports only clauses, which let you be more particular about what you get).

In contrast, if you import a module, it's much more locked down—you'd have to explicitly import the type my tertiary method was defined on in order for your code to see it. So it's much safer and less prone to surprises.

For example, given:

module M {  // I define a type R
  record R {
  }
}

module N {  // I extend R to include foo()
  use M;
  var x: int;
  proc R.foo() { writeln("In foo"); }
}

If you had a module that contained use N;, you'd get access to R.foo() whether you intended to or not. If you had one that said use N only x; you wouldn't. If you had one that did import N; or import N.x; you wouldn't be able to call R.foo();. If you did import N.R; you would be able to. So relying on import should generally avoid surprises.

Returning briefly to the generic case, I think we consider it to be safe because it only has access to things that the callsite does, so won't pick up things that you haven't defined, or made available through use/import, within view of the callsite.

I think there is definitely more that could be done to improve the ergonomics of our Sort module, such as passing comparators as first-class functions or lambdas rather than via records. Those features aren't really up-to-snuff yet, which is why we rely on the record-based approach currently, despite it involving more code.

With the record-based approach, I think the challenge to writing it more plainly as you have is that we don't currently have the ability for records to store references to arrays like Data in your original example (which I think is an obvious, and increasingly annoying lack in the language that I expect we'll address in one way or another, hopefully soon). So what that means is that Data would need to be copied into the record in order for its key method to refer to it (which seems expensive/overkill). Or that sort() would somehow need to pass additional arguments like Data to .key() (which is hard to imagine). Or that you could use a scoping trick like you did. FWIW, despite being pretty good at Chapel, I would not have bet money that your approach would have worked as well as it did. And I think it'd be nicer if we had alternatives to relying on that kind of approach (where the ref field within the record or a lambda/first-class function would be my preferred approaches).

My plan was to have argsort exactly mirror the interface of sort, then pick apart, wrap, and reassemble the comparator for the sort call by any means necessary (probably pulling from DefaultComparator where possible).

This is what I was imagining last night as well—would there be some way to either graft the passed in comparator into our own local record, or else switch between a few implementations based on whether a comparator was passed in or not? In both cases, I'm thinking about this too late in the day to have time to experiment with it myself, though (and I think the motivations you list are worthy ones).

I'm inclined to tag @mppf on this as our in-house sort expert to get his thoughts on how to write an argsort that piggybacks on sort (in part because I leave for vacation tomorrow, so won't be able to get back to this for awhile). And I also wonder what the Arkouda argsort ended up doing, though I suspect that they didn't worry about accepting general purpose comparators, knowing that they were just going to be dealing with a certain set of known types (?).

Best wishes,
-Brad

mppf · July 8, 2022, 1:23pm

We do this in the Sort module already with ReverseComparator (note that as a style guide matter we are moving towards record types starting with a lower case though).

I would imagine that you could follow the pattern of that ReverseComparator in order to generate a comparator that can be used for argsort..

github.com

chapel-lang/chapel/blob/a9ff0f1028e9edc7515ffc947fe68a162386201a/modules/packages/Sort.chpl#L3665-L3669


      
          /* Reverse comparator built from another comparator.*/
          record ReverseComparator {
          
            /* Generic comparator defined in initializer.*/
            var comparator;

ghbrown · July 8, 2022, 3:07pm

I'm pretty convinced by this explanation, but there are still compiler errors right? If so, I'm happy to open the relevant issues so they aren't forgotten (@mppf, would you mind commenting on this, since Brad's away for a bit).
My understanding is that there are two separate issues:

you can add procs to variable instances of records
multi-defining tertiary methods doesn't give an ambiguous call error at compile time

This was my exact line of thought. The pattern of including static data alongside other parameters is pretty common in all kinds of numerical methods (but not so much sort, in my experience). I'm very happy it worked, but I would have never known to look for this technique. I'm certainly in no position to say it's a better or worse since I'm more familiar with other langauges, but even if I started life as a Chapel programmer I think I'd find it easier to reason about less implicit ways of passing the data.

The use and import differences for tertiary methods make a lot of sense, thanks for explaining.

I did not mean to neglect your mention of Arkouda in my previous response. It would be nice to have different sort options with possibly a different interface/style for diversity. If such package existed I may well have used it to solve my problem (argsorting some strings), which it probably could have handled easily. However, given that an argsort implementation built on the sort module seems feasible, I'm inclined to see how far it can go.

@mppf Thanks, I hadn't thought of implementing it as another comparator. I'll definitely give that a shot.

mppf · July 8, 2022, 8:17pm

Well the 1st should definitely be a compilation error. I think the 2nd is more of an open question about how canResolve should behave.

ghbrown · July 9, 2022, 1:11am

I pondered the use of an argsort comparator more, and even if I can get it to work I'm curious what people think of the interface. Given an argsort comparator record, in order for the user to access the indices they'd probably have to call it like

sort(indexArrayByUser,comparator=argsortComparator(comparatorOfChoice,Data));

and I'm not sure if the implicit passing would still work so nicely. I'm almost positive we won't be able to get a call like

sort(Data,comparator=argsortComparator(comparatorOfChoice));

because all the sort does is rearrange Data, while the user fundamentally wants indices.
An actual argsort function would look like

var indices = argsort(Data,comparator=comparatorOfChoice);

and it was pretty easy to whip this together

use Sort;

proc argsort(Data : [?Dom] ?eltType,
             comparator : ?rec = defaultComparator) {
  var n : int = Data.size;
  var d : domain(1) = {0..<n}; 
  var idx : [d] int; // array of indices to be sorted
  forall j in d { // there has to be a better way to do this
    idx[j] = j;
  }
  record argsortComparatorDef {
    proc hasKey(a) param {
      // checks if comparator has key method
      return canResolveMethod(comparator,"key",a);
    }
    inline proc key(a) where hasKey(a) {
      // only defined if input comparator has key
      return comparator.key(Data[a]);
    }
    proc hasCompare(a,b) param {
      // checks if comparator has compare method
      return canResolveMethod(comparator,"compare",a,b);
    }
    inline proc compare(a,b) where hasCompare(a,b) {
      // only defined if input comparator has compare
      return comparator.compare(Data[a],Data[b]);
    }
    proc hasKeyPart(a) param {
      // checks if comparator has keyPart method
      return canResolveMethod(comparator,"keyPart",a,0);
    }
    inline proc keyPart(a,i) where hasKeyPart(a) {
      // only defined if input comparator has keyPart
      return comparator.keyPart(Data[a],i);
    }
  }
  var argsortComparator : argsortComparatorDef;
  sort(idx,comparator=argsortComparator);
  return idx;
}


record absComparatorDef {
  proc key(a) {return abs(a);};
}
var absComparator : absComparatorDef;
var b = [1,1,20,4,-5,2,3,111,12,5,100];
var sortIndices = argsort(b);
var absSortIndices = argsort(b,comparator=absComparator);
writeln('b : ', b);
writeln('b[sortIndices] : ', b[sortIndices]);
writeln('b[absSortIndices] : ', b[absSortIndices]);

I followed the general idea of ReverseComparator, but either I missed some of the complication or the argsort case is just simpler. This argsort should have the same interface as sort, and can wrap any viable key, compare (not sure why I though this was impossible), or keyPart method.

What was the reasoning for handling reverse sort by comparator rather than optional argument, etc.? Does that argument still hold for the argsort case?

mppf · July 11, 2022, 12:26pm

var idx: [d] int = d;

Your argsortComparatorDef looks reasonable to me. I tend to avoid having records defined inside of functions because I'm often running into bugs in the compiler with it. The "obvious" alternative would be to define the record outside of proc argsort but that would require having ref fields which we don't yet ( Timeline for ref Fields in Classes/Records? · Issue #8481 · chapel-lang/chapel · GitHub ). But anyway there is an alternative if you run into problems with the inner record. You can define the record outside of proc argsort which would just be record argsortComparatorDef and then declare the methods inside like proc argsortComparatorDef.hasKey(a) param { ...}. Anyway that's just something to be aware of in case you do run into some kind of compiler bug as you do more with this code.

It just makes the implementation of the sort algorithms simpler since they just have to worry about the comparator and not additional information. IMO it is also nice for all of the sort ordering to be encapsulated in the comparator.

ghbrown · July 11, 2022, 3:00pm

Thanks for the tip on the array initialization and forewarning about compiler issues.

I think that makes sense, but why not stash the ReverseComparator() call in sort() contingent on some reverse optional variable? In fact, you could still add this option and all current behavior and examples in the docs would still hold/work. I suppose this is technically "additional information" for sort to handle, but the alternative is users having to handle it on their side. Perhaps performance reasons or something else I'm not thinking about?

mppf · July 11, 2022, 3:12pm

You mean to have e.g.

proc sort(Data: [?Dom] ?eltType, comparator:?rec=defaultComparator, reverse: bool = false)

?

Well there are a few problems.

First, we need to get the comparator figured out at compile time (for performance reasons and just because it's a typed language). We could have something like if reverse { sort with reverse comparator } else { sort with passed comparator } but that creates extra code. Or we could make it be param reverse: bool = false.

Second, like I said, I like the ability to use the comparators to specify the entirety of the sort order. That way, other related code (e.g. a binary search) does not have to take in a reverse argument as well - everything is neatly bundled up in the comparator.

But anyway, we can certainly consider adding an optional reverse argument to the sort call, and if you still think that is worthwhile I recommend taking it to an issue.

ghbrown · July 11, 2022, 3:53pm

Yes, that's what I meant.

I certainly don't think it's worth it to pursue any option requiring duplication of sort source code.

proc sort(Data: [?Dom] ?eltType, comparator:?rec=defaultComparator, reverse: bool = false) {
  var inner_comparator;
  if reverse {
  inner_comparator = ReverseComparator(in_comparator);
  } else {
  inner_comparator = comparator;
  }
  <rest of sort routine>
}

which perhaps can be done with parameter? It may well be more trouble than it's worth and your binary search example is quite compelling, but if reverse and argsort options could be added to the sort interface without significant complication it could be nice too.

Perhaps one could have other interfaces that include these variables and perform some comparator manipulation before call the existing version of sort which is left unchanged? If it's possible to unify sort, reverse sort, and argsort in such a way I'm happy to help (though maybe the output type of any sort function depending on param argsort could make this impossible or annoying?). And if not, I'm happy to prepare argsort as a separate function with the discussed interface if that would be of value.

bradcray · July 19, 2022, 7:28pm

@ghbrown: I'm just dropping by to say that I'm back from vacation now, appreciate @mppf picking up the conversation about sort design rationale and futures. It seems you have some dangling/implied questions there about potential futures which I don't feel expert enough in the Sort module to be able to comment on other than to say that I find the notion of an optional reverse boolean argument appealing from a user standpoint, from the perspective of making a common case simple. I'm ignoring any potential implementation/maintenance/performance costs in saying that, though. I agree with Michael's suggestion to kick off a "feature request" GitHub issue for that if you'd like to champion it (and that could be a more permanent place to discuss implications further as well).

Rewinding further, I'd like to be sure we capture a GitHub issue for the compiler permitting tertiary methods to be defined on variables rather than types. If you don't have the time/interest in filing that, let me know and I'll add it to my queue.

Like Michael, it's not immediately obvious to me what should happen with the ambiguity between the two tertiary methods. I'm fairly certain that if you were to simply insert a call to .key(), you'd see the ambiguity, and that the issue is that the Sort code doesn't simply call it, but asks whether it can call it where the ambiguity results in a "no" answer, causing it to fall back to its other approach. This relates to a common discussion topic with our reflection routines which is what they should do about errors. For example, imagine the following code sketch:

proc foo() {
  compilerError("Calling foo() is erroneous");
}

writeln(canResolve("foo"));

Should the behavior here be:

prints true: You can call foo() since it's a routine that exists, it just isn't very useful to do so
prints false: You can't call foo() because any call that does so will generate an error, so it's essentially not callable/useful
prints the compiler error: Even asking the question about whether you can call foo() will generate the compiler error even though you haven't actually called it

Your case seems similar in that it's correctly saying "I can't call that", but maybe not giving back as much information as desired. Maybe the approach would be to have the sort module do something before falling back like seeing whether there are any methods named '.key' or seeing how many candidates there are for the call and issuing an error if there are more than one. Or maybe canResolve() needs to be redesigned in some way. Anyway, I would be fine with an issue being opened for this to capture the surprise and confusion, as that's completely reasonable. I'm just not sure what the best path for resolving it would be offhand.

Hope you've been well,
-Brad

ghbrown · July 19, 2022, 7:32pm

I found an issue with my argsort implementation where it would fail to compile when trying to sort non-int(64) data types, complaining about not having a comparator with methods that apply to int(64).
I was following the ReverseComparator style of comparator construction, but it since the reverse case constructs a comparator to work on an array of the same type it won't work for argsort where the sorted array is of type int(64) and the user array can be anything.

I've fixed it by playing some trick with the hasX method that diverge from the style of ReverseComparator. I did try a few other approaches that I would consider "cleaner", but they didn't work out.

Here's the updated code:

module Argsort{
  use Sort;
  import Reflection.canResolveMethod;

  proc argsort(Data : [?Dom] ?eltType,
               comparator : ?rec = defaultComparator) {
    var n : int = Data.size;
    var d : domain(1) = {0..<n}; 
    var idx: [d] int = d; // array of indices to be sorted

    record argsortComparatorDef {
      /* 
         hasX() methods must be defined with no arguments and use
         eltType internally since user comparator is written for
         arrays of arbitrary type, while argsortComparator operates on
         strictly integer arrays; this differs from the style of
         ReverseComparator which uses the same arguments for hasX()
         and the comparison function X()
      */
      proc hasKey() param {
        // checks if comparator has key method
        var a : eltType;
        return canResolveMethod(comparator,"key",a);
      }
      inline proc key(a) where hasKey() {
        // only defined if input comparator has key
        return comparator.key(Data[a]);
      }
      proc hasCompare() param {
        // checks if comparator has compare method
        var a,b : eltType;
        return canResolveMethod(comparator,"compare",a,b);
      }
      inline proc compare(a,b) where hasCompare() {
        // only defined if input comparator has compare
        return comparator.compare(Data[a],Data[b]);
      }
      proc hasKeyPart() param {
        // checks if comparator has keyPart method
        var a : eltType;
        return canResolveMethod(comparator,"keyPart",a,0);
      }
      inline proc keyPart(a,i) where hasKeyPart() {
        // only defined if input comparator has keyPart
        return comparator.keyPart(Data[a],i);
      }
    }
    var argsortComparator : argsortComparatorDef;
    sort(idx,comparator=argsortComparator);
    return idx;
  }
}

I think the main improvements are now on the domain of the sorted index array. It seems a bit hard-coded now, and it might be nice to tie it to the domain of the user's Data array somehow.

ghbrown · July 19, 2022, 8:21pm

@bradcray, glad to hear you are back safely! I was preparing the previous message before I saw yours, so let me respond to that here.

Having a friendlier interface for sort would be nice for users so they didn't have to wrangle reversing comparators on "their side" of the sort function. I think actual action on this point is low priority, but perhaps collecting opinions somewhere would be a useful first step to see where people stand.
If possible, I'd like to get argsort squared away first, then discuss the sort, reverse sort, argsort interface. @mppf, would you like me to prepare an issue for my implementation of argsort to discuss features, integration, etc., with a PR to come soon after? There is already #5724, but it touches on quite a few areas in addition to argsort.

This just slipped off my to do list, I'll submit an issue shortly.

Now that I understand the code and canResolveMethod the behavior I saw makes perfect sense, but before that it was an opaque mystery. Perhaps the right course of action depends on how frequent it is for users to be editing or adding methods like this to code that uses canResolveMethod. If it's not a frequent pattern, perhaps safeguarding the sort module is better, while if it's common then perhaps it's better to address the issue at the canResolveMethod level. I don't have any insight on this, but I'm happy to open an issue if needed. Would it be more appropriate for sort or more generally about canResolveMethod?

bradcray · July 19, 2022, 9:26pm

I don't have any recollection of users hitting this either in the specific context of Sort nor the more general context of canResolveMethod, but since we're still a relatively small project, I don't know that I'd draw any strong conclusions from that.

"Needed" is probably overstating it (e.g., I'm not volunteering to file this one for you if you don't ), but again, I wouldn't object to having you file an issue to capture the confusion and think it could have value by serving as a place for us to track other people hitting similar issues in the future. I'd probably take the approach of capturing your observed behavior and the confusion around it, then explaining why the behavior was occurring. In terms of titling it, I'd probably try to span both aspects, so maybe something like "Confusing sort behavior when comparator defines two .key methods due to canResolveMethod behavior." And I'd try to mention Reflection somewhere in the issue to improve the chances of finding it with a search. If you choose not to proceed with this, that's completely fine as well. I don't have any illusions that filing issues is effort-free.

-Brad

ghbrown · July 19, 2022, 9:43pm

I'm happy to file it, thanks for the content and title suggestions.

Topic		Replies	Views
Sortperm Users	15	48	February 20, 2026
Performance Considerations Users	19	411	April 21, 2021
Design input for model state Users	25	803	May 20, 2021
Creating a New Range Method Users	6	360	March 23, 2021
Sort Module Users	4	265	February 25, 2021

Records, consts, and implicit data passing

Related topics