Ref members of class/record

Came across the message today:

"References cannot be members of classes or records yet."

Is there a status for this feature? A GitHub issue?

It would be nice for constructing large custom data structures (trees).

The GitHub issue is Timeline for ref Fields in Classes/Records? · Issue #8481 · chapel-lang/chapel · GitHub, which is pretty old and hasn't been updated in a while. I think you should comment with your use case there.

-Jade

Hi Jared —

I'd really like ref fields in classes and records as well—of the language features that I'd consider to be "missing" today, I think it's priority #1 on my list.

To try and avoid having you be blocked by this feature, though, I wanted to note that for large custom data structures like trees, the typical approach in Chapel would be to use class fields to have one object point to another. I think even if there were ref fields, this would be a preferable way to write such data structures because classes provide options that can lead to better discipline for memory management (classes can be owned, shared, or borrowed in addition to unmanaged) and nilability (class variables can be declared such that nil is or is not a legal value). In contrast, refs are a bit more restricted while also not providing memory management guarantees beyond what the compiler can prove statically.

An example of using classes in this way that quickly comes to mind is the "binary trees" benchmark from the computer language benchmark game, where a fairly straightforward version is available here: binary-trees Chapel #4 program (Benchmarks Game)

Note that this uses the "least safe" class-based option in that the class fields are nilable (to support leaf nodes with no children) and unmanaged (to meet the benchmark's requirements about how memory is freed, I believe? In production codes, owned would probably be a nicer choice).

If we had ref fields, I think it would be challenging to write this benchmark's tree pattern to use them because Chapel doesn't have a notion of nil/NULL refs which would be necessary to represent the leaf nodes. There might be ways to work around this, such as creating a sentinel nulltreeNode object that all leaf nodes referred to, but then that leads to a challenge about what its children would point to… (in the sense that I'm not confident we'd be able to create an object whose ref fields referred to itself). Classes seem more naturally designed for this kind of pattern in Chapel to me.

If there are other patterns or classes that would help show how they could be used for such data structures, please let us know.

-Brad
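
As an illustration of the class-based approach described above, here is a minimal sketch of a binary tree whose children are `owned` and nilable. It is not the Benchmarks Game program; the names `TreeNode`, `buildTree`, and `countNodes` are hypothetical and chosen only for this example.

```chapel
// Hypothetical sketch: a binary tree whose children are owned and nilable.
class TreeNode {
  var value: int;
  var left, right: owned TreeNode?;   // nil marks a missing child
}

// Build a complete tree of the given depth; each parent owns its children.
proc buildTree(depth: int): owned TreeNode {
  if depth == 0 then
    return new owned TreeNode(depth);
  return new owned TreeNode(depth, buildTree(depth-1), buildTree(depth-1));
}

// Count nodes by walking borrows of the owned children.
proc countNodes(node: borrowed TreeNode?): int {
  if node == nil then return 0;
  return 1 + countNodes(node!.left) + countNodes(node!.right);
}

proc main() {
  const root = buildTree(3);
  writeln(countNodes(root.borrow()));
}
```

With `owned` fields, the whole tree should be freed when `root` goes out of scope, and the `?` on the field type is what lets leaf nodes have no children.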

Thank you for your reply. This was good to help me think about what I want.

I think a good example of this could be a k-d tree. It’s a binary tree where nodes represent different dimensional values in a k-dimensional input set (a point cloud or something). If this point cloud were very large, it’s likely going to be generated outside the KDTree class (or should be), or read in from a file. But at the end of the day, the resultant data remains a multidimensional array. Moreover, a lightweight tree doesn’t necessarily consume the data, but rather refers to it (tree nodes could store indices of the points, rather than copying point values).

I see a few workarounds to the ref-in-a-class issue:

  1. Wrap the array in a dedicated Data class. The Data class then has the same issue with copying, but at least it could have a dedicated method to generate or read the data, though that puts the burden of data generation on the implementation.
  2. Move the memory from the outside array into the class directly rather than copying it (MemMove?). If the dataset is still needed outside the class, though, this would be an undesirable side effect.
  3. Drop the classes and have a purely functional interface to the tree. This means more bookkeeping for the user, though, and can lead to a messier API.
  4. Use a c_ptr.
  5. As is probably appropriate in 90% of cases, copy and be done with it haha.

What do you think?
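
To make the index-based idea and workaround 3 above concrete, here is a minimal sketch in which tree nodes store only point indices and the point array is passed to each query by `const ref`. All names (`KDNode`, `splitValue`, `pointIdx`, `splitDim`) are hypothetical, and no tree construction or search is shown.

```chapel
config const n = 8, k = 2;

// A node stores only the index of its splitting point, never the coordinates.
class KDNode {
  var pointIdx: int;
  var splitDim: int;
  var left, right: owned KDNode?;
}

// Look up a node's splitting coordinate in the caller's array.
// The array is passed by const ref, so it is not copied.
proc splitValue(node: borrowed KDNode, const ref points: [?D] real): real {
  return points[node.pointIdx, node.splitDim];
}

proc main() {
  // The point cloud lives outside the tree and outlives it.
  var points: [1..n, 1..k] real;
  forall (i, j) in points.domain do points[i, j] = i + j / 10.0;

  const root = new KDNode(pointIdx = 3, splitDim = 1);
  writeln(splitValue(root, points));
}
```

The trade-off is the one noted in the list: the caller has to hand the same array to every call, since the tree itself holds no reference to it.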

Hi Jared —

When you say "has the same issue with copying" is your concern that class assignments would result in deep copies in Chapel? They don't, so for example, code like the following:

class Data {
  var buff: [1..n, 1..n, 1..n] real;

  proc init(filename: string) { … }

  proc init(data: [1..n, 1..n, 1..n] real) { … }
}

var myData = new shared Data(filename="infile.dat");  // or maybe we'd want to use `owned` and have the following references to `myData` `.borrow()` from it.
var bradsData = myData;

class Node {
  var d: Data;
}

var nd = new Node(d = myData);

only creates one n^3 array, created by the new Data(…) expression. The other class variables (bradsData, nd.d) simply point to that same object, so they do not create their own n^3 arrays. In this sense, classes are referential by nature. In contrast, Chapel records use deep-copy semantics by default and would result in three n^3 arrays if we were to replace class with record above (and remove the shared keyword).

In cases where users do want a deep copy of a class, one way to do that is to create an explicit copy method, e.g., proc Data.copy() { return new Data(data = this.buff); }, and then invoke it explicitly (e.g., var jaredsData = bradsData.copy();).

If I've misunderstood what you meant by "the same issue with copying", please help me understand what you meant—this was my best guess.

Thanks,
-Brad

PS — Here's an executable example of the code sketched out above that demonstrates the aliasing by assigning to the original class and seeing the change reflected in the others:
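
A minimal sketch of that aliasing behavior, together with the explicit copy method mentioned earlier, might look like the following. It is not the linked original: it uses a small 1D array and relies on the compiler-generated initializer, and the variable names `alias` and `clone` are hypothetical.

```chapel
config const n = 2;

class Data {
  var buff: [1..n] real;

  // Hypothetical deep-copy method: the new object gets its own array,
  // initialized from a copy of this one's elements.
  proc copy() {
    return new shared Data(buff = this.buff);
  }
}

proc main() {
  var myData = new shared Data();
  var alias = myData;            // same object, no array copy
  var clone = myData.copy();     // separate object with its own array

  myData.buff[1] = 42.0;
  writeln(alias.buff[1]);        // reads the element written through myData
  writeln(clone.buff[1]);        // reads the clone's own, unmodified element
}
```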

Hi Brad,

Sorry for the confusion. When I say, “has the same issue with copying”, it’s that the Data class would still need to handle copying/moving of the input array to initialize buff I think? I tried to make an example below. If we read from a file within the Data class we get around that though. But for a Node library it’d be hard to anticipate all the ways a user might want to load data in to initialize buff.

class Data {
  // pretty sure I'd have to do this with the domain?
  var buffDom: domain = {1..0}; 
  var buff: [buffDom] real;

  proc init(ref buff: [?D] real, in copy: bool=false) {
    buffDom = D;
    if copy {
      this.buff = buff; // creates a deep copy?
    }
    else {
      // use swap to perform a move? might not be the best way, buff has to be filled first. ideally the formal argument would be left empty after
      this.buff = 0.0;
      this.buff <=> buff; 
    }
    
  }
}

var n = 100000000;
var dataBuff: [1..n] real = [...];
var myData = new shared Data(dataBuff);
var bradsData = myData;

class Node {
  var d: Data;
}

var nd = new Node(d = myData);

It’s good to know, though, that deep copies are not performed when passing myData to bradsData or nd.

Design-wise, I’m getting more unsatisfied with having a reference member field because it violates the encapsulation idea, could be a bit of a headache, all for the sake of avoiding a copy. Maybe providing the user a move and copy option would be better.

The good news is that we do have an option for that today, though I don’t know that it’s been as discussed as the ref field idea. It’s called the MemMove library, if I’m correctly understanding what you’re thinking of.

Lydia

Hello Lydia, thank you. This looks like a good way to provide that type of initialization.

[edited to fix dumb name mix-up on my part. Apparently I was awake enough to code but not to interact with humans; and then a second time to fix the broken ATO link]

Hi Jared —

Thanks for clarifying what you meant and apologies for the late response. I didn't get much time online yesterday.

I believe that what you want can be accomplished without using MemMove, though that's a good tool to keep in mind as well. Specifically, when arrays that are at the end of their lives are used to initialize other arrays, the memory from the initializing array can be used ("stolen") by the new array.

The keys to achieving this, IIRC, are (a) using an `in` intent on the class initializer to indicate that it should get its own copy of the array rather than referring to an existing array and (b) making sure that the initializing array is obviously dead by making it a variable with fixed scope (i.e., not a module-level symbol) and ensuring there are no references to it after passing it into the new expression. Here's an example that demonstrates this, printing the address of the initial element of the array as a check that we're using the same array throughout [ATO]:

use CTypes;

config const n = 2;

class Data {
  var buff: [1..n, 1..n, 1..n] real;

  proc init(in data: [1..n, 1..n, 1..n] real) {
    this.buff = data;
  }
}

class Node {
  var d: shared Data;
}

proc main() {
  var origData: [1..n, 1..n, 1..n] real = [(i,j,k) in {1..n, 1..n, 1..n}] i*100 + j + k/10.0;
  writeln("origData lives at ", c_ptrTo(origData[1, 1, 1]));

  var myData = new shared Data(data=origData);  // or maybe we'd want to use `owned` and have the following references to `myData` `.borrow()` from it.
  writeln("myData's buffer lives at ", c_ptrTo(myData.buff[1, 1, 1]));

  var bradsData = myData;
  writeln("bradsData's buffer lives at ", c_ptrTo(bradsData.buff[1, 1, 1]));

  var nd = new Node(d = myData);
  writeln("nd's buffer lives at ", c_ptrTo(nd.d.buff[1, 1, 1]));
}

I haven't tried a variation that uses different sized arrays for different instances of Data, but I imagine that can be made to work without MemMove as well. I'll largely be offline the next few weeks but can wrestle with that more in the new year if it doesn't fall out easily/obviously.

-Brad

This needs no immediate response, but I wanted to acknowledge the reply. Happy holidays, all.

Thank you Brad,

That's good to know about how to get the copy elided. I think I'll go with that, as it gives the user the choice without having to pass some `doCopy=true` to the initializer.

I do think there's still a use case for ref class members. Coming from doing data analysis at work, it'd be handy for pipelines where the data will outlive the operations performed on it. For instance: load data, create a handy class to operate on it or query it, throw that class object away once done, do something else with the same data, etc. Maybe that's a good case for functional programming over classes, or for moving the data back out of the class once those operations are complete. But for more complex derived data structures, a record or class with a ref member could be convenient.

Best,

Jared
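
One way the pipeline shape described above might be approximated today, without ref fields, is to keep the data in an `owned` class and give the short-lived helper a `borrowed` field, as suggested earlier in the thread. This is only a sketch under that assumption; `Stats` and `mean` are hypothetical names, and it requires the array to live inside a class rather than in a plain variable.

```chapel
config const n = 4;

class Data {
  var buff: [1..n] real;
}

// Hypothetical helper that refers to, but does not own, the data.
class Stats {
  var d: borrowed Data;

  proc mean() {
    return (+ reduce d.buff) / d.buff.size;
  }
}

proc main() {
  var data = new owned Data();
  data.buff = 1.0;

  {
    // The helper lives only in this block and borrows the data.
    var stats = new Stats(d = data.borrow());
    writeln(stats.mean());
  } // stats is destroyed here; data remains

  // The same data can then be used for something else.
  data.buff *= 2.0;
  writeln(+ reduce data.buff);
}
```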

Glad it was helpful Jared!

I completely agree—I find myself wanting the feature with some regularity, as do other users. I.e., none of this discussion negates my original, immediate reaction:

Happy 2026!
-Brad