Design input for model state

This post is just to move the discussion to Discourse. I'll try out the suggestions in that issue and report back.

Thanks, I think we are making progress with your suggestions.

The trick we missed was that types can be defined by the return value of functions. As long as things get resolved within init(), this will work. great!

A couple of questions:

  1. I can define a map as map(string, shared AbstractDA) but not as map(string, AbstractDA). Why is this? the latter complains cannot default-initialize a variable with generic type. The former is what is done in Arkouda as well.

  2. I tried to write a function that returns a domain like in Arkouda

 proc get_domain(x: int = 1, y: int = 1, z: int = 1) {

            return {0..#x, 0..#y, 0..#z};
     }

However, if I use any sort of conditional statements in this function, e.g,

 proc get_domain(x: int = 1, y: int = 1, z: int = 1) {

           if z == 0 then
                 return {0..#x, 0..#y}
            return {0..#x, 0..#y, 0..#z};
     }

the compiler complains that it cannot resolve the return type. This does not seem to happen in Arkouda where the exact same thing is done: arkouda/SymArrayDmap.chpl at d6ae98690c9211161ed76b7a5a6f8363b1de295a · mhmerrill/arkouda · GitHub

Any suggestions on what I might be doing incorrectly?

TIA,
Joy

You cannot have a single function which returns domains of different dimensionality. The compiler must know the return type at compile-time. Your variable z is doing the comparison against zero at run-time. Arkouda returns domains of the same dimensionality.

The following works

proc getDomain(x : int = 1, y : int = 1, param z : int = 1) where z == 0
{
    return {0 .. #x, 0 .. #y };
}

proc getDomain(x : int = 1, y : int = 1, param z : int = 1) where z != 0
{
    return {0 .. #x, 0 .. #y, 0 .. #z };
}

proc getDomain(x : int = 1, y : int = 1, const z : int = 1)
{
    return {0 .. #x, 0 .. #y, 0 .. #z };
}

var iz = 40;

writeln(getDomain(10, 20, 0));
writeln(getDomain(10, 20, 30));
writeln(getDomain(10, 20, iz));

iz = 0;
writeln(getDomain(10, 20, iz));

Because a param is a compile time constant, if the compiler can detect that zero is explicitly given as the third argument to getDomain, then, and only then will the compiler know to use the 1st version of the getDomain routine and return a 2-dimension domain. If you are using a run-time constant, then the compiler will always decide to use the 3rd version.

Note that in the last writeln, the result is a 3D-domain

{0..9, 0..19, 0..-1}

which may not be what you want. Personally I would write that 3rd and final version of that routine as

proc getDomain(x : int = 1, y : int = 1, const z : int = 1)
{
    return {0 .. #x, 0 .. #y, if z == 0 then 0 .. 0 else 0 .. #z };
}

because at least that always returns a legal domain. But again, that might not be what you want.

In Chapel at least, a const intent is a run-time concept while param intent is a compile-time concept and type resolution is something that has to be done at compile time.

1 Like

This is very helpful! Thanks @damianmoz. I'll see how to this can be used in our design, but it seems we will be able to meet our design requirements regarding creating variable dimension arrays.

@damianmoz : Thank you for picking up the question about the varying return types.

@JMM: FWIW, we've had users who have had some success writing rank-independent codes in Chapel, the most sophisticated of which was a rank-independent AMR code written by a summer intern from UW Applied Math several years ago (who used to quip that he'd debug the 1D version, use 3D versions for his science, and then run 23D versions for fun). I'm almost certain that there's more we could do to improve the ergonomics of rank-independent programming in Chapel, but he at least proved it was tractable. Let us know if there are things that would make your life easier or your code cleaner and we can talk about which are possible vs. not.

That inetern presented his work in a short end-of-summer internal talk as well as a SIAM Minisymposium (search Chapel: Archived Presentations for "SIAM CSE 2011"). Chapel has evolved a lot since that summer, but the core principles should still apply (and we've at least kept his code limping along since then, though one might write aspects of it differently today, starting from scratch).

Turning to your other question:

Given a class C in Chapel, there are currently four management types that can be applied to that class:

  • shared C: think of this as a reference-counted class; simple, but potentially high-overhead if it is copied around a lot
  • owned C: indicates that only one variable owns the class at a time; reasonably lightweight, but requires being attentive to ownership (easy for some programs; harder for others)
  • borrowed C: usually a reference to an owned C or shared C where you don't want to impact its ownership; also lightweight, but depends on other class types.
  • unmanaged C: most similar to a C++ class pointer—requires an explicit delete to free or else it will be leaked; lightweight, but potentially error-prone.

Anyway, the class type C says "I'm some form of C" yet without specifying which one, so that's what the mention of "generic" is in your error message. The representations of these types differs at execution time which is why the map type must decide upon a flavor before being declared (so the compiler can determine how to represent it in memory).

For convenience, new C() is equivalent to new owned C(), though this could also be a source of confusion, by implying that C is a complete type specification in and of itself.

In Arkouda, interactions with the class pointers occur rarely enough that I think shared made sense for them from a simplicity perspective; but it would probably not be too hard to convert their code over to using owned or unmanaged instead if one wanted, and we've had other users rely on those options instead.

-Brad

The other question was way beyond my capabilities so I figured I would tackle the easy one.

I should get hold of that Adaptive Mesh Refinement (AMR) code stuff. I went looking but it seems to be missing stuff. Although maybe I am not looking correctly.

I don't think the AMR code is missing stuff, in the sense that (AFAIK) it's working as intended. The files are distributed across a large file hierarchy which can make it a bit challenging to browse. It could be missing stuff from the "doesn't do everything one might want a full-blown AMR framework to do" perspective, but I think it should exercise the basic pattern.

-Brad

About the above issue.
We used a similar approach as the one used in the Arkouda Project by creating an abstract AbstractDataArray class to handle the generic nature of the DataArray class, but we seem to encounter this problem with ranks.
The model state takes quantities(strings as keys and DataArray as value) that can have different ranks.
The current design of the AbstractDataArray class is something like this

class AbstractDataArray {
        param rank: int;
        param stridable: bool;
        proc toDataArray(type eltType) {
            return try! this :borrowed DataArray(eltType, rank = rank, stridable = stridable);
        } 
 }

The caveat in this is that we are restricted to have the same rank for all DataArrays as the DataArray inherits this class.
How can we accommodate DataArrays of different ranks in the same Map variable?

Thank you,
Abhishek

Hi @abhishekingit

Given:

class C {
  param rank: int;
}

each type C(1), C(2), C(3), etc. is a different type because distinct instantiations of param and type fields result in distinct types in Chapel. So if you needed a map to classes of distinct rank, your choices would be to either make that field non-param:

class C {
  const rank: int;
}

(which would let you reason about the rank in the abstract base class, but not to use it in situations that require a param value, like declaring an array's dimensionality), or you could push it into the less-abstract sub-class):

class C {
  proc getRank(): int {
    halt("Cannot call getRank() on an abstract C class");
    return 0;
  }
}

class D {
  param rank: int;
  override proc getRank(): int {
    return rank;
  }
}

And then to get from the abstract parent class to the concrete child class, you'd have to cast, as with other static aspects of the class (like the array's element type).

-Brad

1 Like

Just a clarification:

you'd have to cast, as weith other static aspects...

This was unclear (not sure what this sentence meant).

Thanks,
Joy

@JMM: Sorry to be vague / unclear.

Let me pop up a level by sketching out a simple, standalone example of the pattern I was describing in my first response, which is similar to what Arkouda does in principle—though the specifics as to what is done where, and how the code is organized, definitely varies. Here, I've tried to focus on clarity (a decent level anyway) and simple patterns / concepts:

use Map;

// This enum is designed to distinguish between 'int', 'real', and 'bool' types
enum myTypes { i, r, b };

// This helper function converts from a type to one of the above enum values
proc typeToEnum(type t) {
  select t {
    when int do
      return myTypes.i;
    when real do
      return myTypes.r;
    when bool do
      return myTypes.b;
    otherwise
      compilerError("Unexpected type in typeToEnum()");
  }
}

// This is an abstract class that isn't good for much other than
// supporting the ability to mix various sub-class instantiations
// within a data structure.
//
class Abstract {
  // This stores information sufficient for determining a concrete
  // subclass's static type
  var concType: myTypes;

  // This is its initializer
  proc init(type t) {
    this.concType = typeToEnum(t);
  }

  // This is a dynamically dispatched method
  proc printMe() {
    writeln("I am a generic Abstract class");
  }
}

// This is a concrete class that contains everything important through
// generic fields
//
class Concrete: Abstract {
  type t;    // the type of elements stored
  var x: t;  // an example element of that type

  // The class's initializer
  proc init(x) {
    super.init(x.type);
    this.t = x.type;
    this.x = x;
  }

  // The class's override of the dynamically dispatched method
  override proc printMe() {
    writeln("I am a Concrete class with type ", t:string, " and value ", x);
  }

  // This is a non-dynamically dispatched method, because its return
  // type will vary across classes in the hierarchy, so it can't be
  // inherited.
  //
  proc getValue() {
    return x;
  }
}


// Declare a map of abstract classes
var m: map(string, shared Abstract);

// Insert a few concrete classes into the map
m.add("A", new shared Concrete(1));
m.add("B", new shared Concrete(2.3));

// Use the dynamically dispatched method
m.getValue("A").printMe();
m.getValue("B").printMe();

// look up a class from 'm' and capture it.
const a = m.getValue("A");

// print its static type ('Abstract')
writeln("a's type is: ", a.type:string);

// let's do something that requires changing to its static type:
foo(a);

// now let's do the same for "B":
foo(m.getValue("B"));

// This version of foo() takes an abstract class and converts it to
// a concrete instance:
//
proc foo(a: Abstract) {
  // Let's cast it to the appropriate static sub-type and do something with it:
  select a.concType {
    when myTypes.i {
      const c = a:Concrete(int);
      foo(c);
    }
    when myTypes.r {
      const c = a:Concrete(real);
      foo(c);
    }
    when myTypes.b {
      const c = a:Concrete(bool);
      foo(c);
    }
    otherwise {
      halt("Unexpected type");
    }
  }
}

// This version of foo() takes a concrete class and does something
// taking advantage of its concrete-ness
//
proc foo(c: Concrete(?)) {
  writeln("In foo, c's type is: ", c.type:string);
  var val = c.getValue();
  writeln("In foo, c's value is: ", val, " : ", val.type:string);
}

So when I said "And then to get from the abstract parent class to the concrete child class, you'd have to cast, as with other static aspects of the class (like the array's element type).", I was imagining extending the pattern above by adding a param field to the child / Concrete class, where if I continued with the current pattern, I might store its execution time value in the parent class so that I'd know what to cast it to:

use Map;

// This enum is designed to distinguish between 'int', 'real', and 'bool' types
enum myTypes { i, r, b };

// This helper function converts from a type to one of the above enum values
proc typeToEnum(type t) {
  select t {
    when int do
      return myTypes.i;
    when real do
      return myTypes.r;
    when bool do
      return myTypes.b;
    otherwise
      compilerError("Unexpected type in typeToEnum()");
  }
}

// This is an abstract class that isn't good for much other than
// supporting the ability to mix various sub-class instantiations
// within a data structure.
//
class Abstract {
  // This stores information sufficient for determining a concrete
  // subclass's static type
  var concType: myTypes;
  const rtRank: int;

  // This is its initializer
  proc init(type t, r: int) {
    this.concType = typeToEnum(t);
    this.rtRank = r;
  }

  // This is a dynamically dispatched method
  proc printMe() {
    writeln("I am a generic Abstract class");
  }
}

// This is a concrete class that contains everything important through
// generic fields
//
class Concrete: Abstract {
  type t;    // the type of elements stored
  param r: int; 
  var x: t;  // an example element of that type

  // The class's initializer
  proc init(param r, x) {
    super.init(x.type, r);
    this.t = x.type;
    this.r = r;
    this.x = x;
  }

  // The class's override of the dynamically dispatched method
  override proc printMe() {
    writeln("I am a Concrete class with type ", t:string, ", value ", x,
            ", and param ", r);
  }

  // This is a non-dynamically dispatched method, because its return
  // type will vary across classes in the hierarchy, so it can't be
  // inherited.
  //
  proc getValue() {
    return x;
  }
}


// Declare a map of abstract classes
var m: map(string, shared Abstract);

// Insert a few concrete classes into the map
m.add("A", new shared Concrete(2, 1));
m.add("B", new shared Concrete(3, 2.3));

// Use the dynamically dispatched method
m.getValue("A").printMe();
m.getValue("B").printMe();

// look up a class from 'm' and capture it.
const a = m.getValue("A");

// print its static type ('Abstract')
writeln("a's type is: ", a.type:string);

// let's do something that requires changing to its static type:
foo(a);

// now let's do the same for "B":
foo(m.getValue("B"));

// This version of foo() takes an abstract class and converts it to
// a concrete instance:
//
proc foo(a: Abstract) {
  // Let's cast it to the appropriate static sub-type and do something with it:
  select a.concType {
    when myTypes.i {
      select a.rtRank {
        when 1 {
          const c = a:Concrete(int, 1);
          foo(c);
        }
        when 2 {
          const c = a:Concrete(int, 2);
          foo(c);
        }
        when 3 {
          const c = a:Concrete(int, 3);
          foo(c);
        }
        otherwise {
          halt("unexpected rank in foo");
        }
      }
    }
    when myTypes.r {
      select a.rtRank {
        when 1 {
          const c = a:Concrete(real, 1);
          foo(c);
        }
        when 2 {
          const c = a:Concrete(real, 2);
          foo(c);
        }
        when 3 {
          const c = a:Concrete(real, 3);
          foo(c);
        }
        otherwise {
          halt("unexpected rank in foo");
        }
      }
    }
    when myTypes.b {
      select a.rtRank {
        when 1 {
          const c = a:Concrete(bool, 1);
          foo(c);
        }
        when 2 {
          const c = a:Concrete(bool, 2);
          foo(c);
        }
        when 3 {
          const c = a:Concrete(bool, 3);
          foo(c);
        }
        otherwise {
          halt("unexpected rank in foo");
        }
      }
    }
    otherwise {
      halt("Unexpected type");
    }
  }
}

// This version of foo() takes a concrete class and does something
// taking advantage of its concrete-ness
//
proc foo(c: Concrete(?)) {
  writeln("In foo, c's type is: ", c.type:string);
  var val = c.getValue();
  writeln("In foo, c's value is: ", val, " : ", val.type:string);
  param p = c.r;
  writeln("And we can extract 'c.r' as a 'param' as well: ", p);
}

Does that help explain what I meant?

Now, some notes:

  • The case-by-case handling I'm doing here can obviously become annoying as the variety of types increases, however it's arguably necessary for performance and no different than what other statically typed languages would require. In C/C++, the C pre-processor can take some of this pain away, and if we find patterns that get old quickly, we should look into what we can do to clean it up. As an example, I believe I could rewrite the select over the param int value above as a param for loop in Chapel to tighten it up; and maybe we should have similar capabilities for the type options as well?

  • Here, I've done the breakdown of static cases fairly manually and explicitly in standalone functions. Clever OOP programmers are likely to come up with patterns that do this as methods instead, or to perhaps even avoid the need to do as much static casting using chains of dynamic dispatch (for example, while Arkouda was wrestling with this, @cassella and @mppf filed the following toy showing an alternate approach: arkouda/double-dispatch.chpl at master · mhmerrill/arkouda · GitHub ).

I'm hoping this turns some of my vague comments into something more concrete that you can work from based on your own needs and preferred coding conventions. But please keep asking questions if things are still unclear.

-Brad

1 Like

@JMM / @abhishekingit:

Now that I'm out of meetings, I was able to verify that my param/rank checks could've been done more neatly where rather than writing a case-by-case analysis like:

      select a.rtRank {
        when 1 {
          const c = a:Concrete(int, 1);
          foo(c);
        }
        when 2 {
          const c = a:Concrete(int, 2);
          foo(c);
        }
        when 3 {
          const c = a:Concrete(int, 3);
          foo(c);
        }
        otherwise {
          halt("unexpected rank in foo");
        }
      }

I could've / should've written it more simply as:

      var foundIt = false;
      for param r in 1..3 {
        if a.rtRank == r {
          const c = a:Concrete(bool, r);
          foo(c);
          foundIt = true;
        }
      }

where a param for-loop in Chapel is one in which the loop is completely unrolled so that each iteration has its own param (compile-time known) value of the index. So you can almost think of this as a macro expansion of the loop.

Getting back to my comment about "and maybe we should have similar capabilities for the type options as well?", we've talked for awhile about having loops over types as well, in which case maybe I could write the outer select like:

for type t in (int, real, bool) {
  ...
  for param r in 1..3 {
    ...
  }
}

If this would be a game-changer for you (or anyone reading this), I don't think it would be hard to implement, and that we've just been lacking the right motivation. But it could also be unnecessary depending on your code patterns or whether you take other approaches like the chained dynamic dispatch approach.

-Brad

1 Like

@bradcray
Wow, This works wonders for us.
It'll fit our design specs.
Also, this way of downcasting to the 'LessAbstract' subtype is much preferable.
Now I finally understand what you meant by writing "rank generic code, instantiating it for some distinct values, and then choosing between different instantiations based on the rank" on some forum for a Rank-independent code query.
I agree using the for loop for rank checks is immaculate. I guess something like this will work for our Units library.
Thank you for all the help really appreciate it! @bradcray

1 Like

@abhishekingit : Great, glad it was useful!

Just to make sure nothing is missed: in many contexts, when I talk about rank-neutral programming in Chapel, sometimes I'm talking about other features of Chapel, like the fact that this loop:

forall idx in D {
  A[idx] = ...;
}

can work for a 1D, 2D, 3D, 4D, or nD domain D and array A since Chapel supports tuple-based array indexing. Contrast this with languages that require a distinct index per dimension, requiring you to type different symbols for an array access depending on the number of dimensions:

1D: A[i]
2D: A[i][j] or A(i,j)
3D: A[i][j][k] or A(i,j,k)
etc.

Of course, these features can mix with what we've been talking about here, because it means if you have a concrete child class like this:

class Concrete: Abstract {
  param rank: int;
  type eltType;
  var D: domain(rank);
  var A: [D] eltType;
}

you can write methods that don't care what rank and eltType are used:

// Doesn't care what rank A is:
proc Concrete.increment() {
  forall a in A do
    a += 1;
}

Let us know if you have additional questions, of course,
-Brad

PS — (and if you're interested in introducing yourselves and/or your project, I'm sure people would be curious: Introduce Yourself!)

1 Like

Thanks a lot, @bradcray! This is a lot of information/examples for us to chew on, and it will really help us get off the starting block.

Once we have some code written and the overall design is in some decent shape, we look forward to introducing our project!

Joy

1 Like

Hi @bradcray, third member of the project here. Apologies about the mention, I'm kinda running into a thing, not sure if it's an issue.

So, Chapel will evaluate 10**-1 as 0 which does make sense since it's inferring both numbers as int. From a user's perspective, I would expect it to return 0.1. Also noticed this happens on any number with a negative exponent. Coercing the base to real beforehand makes it work as expected. So, 10.0**-1 gives 0.1 as expected.

However, I do feel the compiler should infer the base as real when the exponent is negative.

Where is there a precedent for this?

10**-1 = (1/10) and that operation should yield 0.

I am a user and for me, if all the operands are an integral type, the result should be integral.

Hmm, yeah, you are right. I was idiotically comparing it to Python. Makes sense now

@AsianIntel : It sounds like this is resolved, but just to confirm, Damian's right: Since Chapel is statically typed, we have to choose a type for operators based on the static types of variables rather than their dynamic values, and generally have chosen to make operators closed on their argument types (so t1 op t1 will generate a t1 value rather than a value of some other t2 type).

Given that, we'd typically suggest making one of the two arguments real if you want a floating point result as Damian suggests (which will cause the other integral argument to be coerced to real, calling the real op real overload).

The following is a little esoteric and I'd be hard-pressed to recommend it strongly (as I fear it will add more confusion than benefit), but just to be complete:

If someone really wanted this behavior, you could get it by defining your own ** overloads. For example, in the following program, I create my own ** overloads that always return real for int**int operations. In order for these to work, I needed to create two overloads: one that takes values known at compile-time (param values in Chapel) and another that takes values known only at execution time. If I don't overload both of these, I run the risk that some of my ** calls will call my routines and others will call the ones built into the language:

  operator **(param x: int, param y: int) {
    return x:real ** y:real;
  }
  operator **(x: int, y: int) {
    return x:real ** y:real;
  }

  writeln(10**-1);  // prints 0.1

The reason I can't particularly recommend this approach is that I think overloading provided operators is likely to be confusing to people reading your code who are familiar with the language and know how they are "typically" implemented. That is, if they didn't notice that you'd provided your own overloads, they'd be surprised that you were getting floating point values back from integer arguments. Also, I don't feel confident that the two overloads above cover all the cases that the built-in operators support (and, checking quickly, it seems they don't), where the result could be that some argument combinations call the built-in operators and others call yours, which could be confusing for you.

Taking that abuse a bit further, we could create three overloads where two of them take a compile-time-known exponent and return a real if it's < 0 or an int if it's > 0. Note that even though Chapel is statically typed, this "different return values on different input values" is possible since we're making the decision about what type to return based on compile-time (param) values.

  operator **(param x: int, param y: int) {
    if y < 0 then
      return x:real ** y:real;
    else
      return x ** y;
  }
  operator **(x: int, param y: int) {
    if y < 0 then
      return x:real ** y:real;
    else
      return x ** y;
  }
  operator **(x: int, y: int) {
    return x ** y;
  }

  writeln(10**-1);  // prints 0.1

In addition to the reservations I had about the previous approach, this one also has the asymmetry that you'll get a different result for 10**-1 than you would for var negone = -1; 10**negone; since the latter would call into the non-param version and return an int again.

All that said, if you weren't married to using an operator to express this, I think it would be very reasonable to create your own pow() or recip() routine (or whatever you want to call it) and have it take one of the above approaches to give you the behavior you wanted, e.g.:

proc pow(x: int, y: int) {
  ...
}

pow(10, -1);

or:

proc pow(x: int, param y: int) {
  if y < 0 then
    return x:real ** y: real;
  else
    return x ** y;
}

pow(10, -1);

or even:

proc recip(x: int) {
  return 1.0 / x;  // or x:real ** -1;
}
recip(10);

Of all of these approaches, I'd definitely suggest either Damian's original approach or creating your own routine for the case(s) you want to behave a certain way.

Hope this is helpful,
-Brad

My limited experience is that the exponentiation operator is overused in languages that have it I avoid it like the plague.

I appreciate that it is always a good idea to have the program read as an expression of the algorithm so there are times where exponentiation aids code clarity But, I have never seen where exponentiation by a small integral positive or negative power is needed. Indeed, it is most likely the cause of performance and accuracy concerns.

Can you give us some examples of the expressions in which you are using exponentiation so that we can have a look at whether there are better ways to write them.

My 2c.