So I want to explain why references are no good. Correct any mistakes please!
The problem, in summary, is that Chapel lies to the user and uses tricks to maintain the lies, and those tricks cannot be sustained in the presence of parametric polymorphism.
Consider
var x : int = 1;
ref r : int = x;
If I understand correctly, x refers to an object with an address I will write &x, with an int stored in the object. We say x has type int. On the other hand, r is also a variable, with address &r, but it actually contains a pointer to an int, a type I will write int*. I will write *p to refer to the value stored at address p. Now let's examine four scenarios.
R-context:
1. An int value is expected, and x is written. Chapel copies x.
2. An int pointer is expected, and x is written. Chapel copies &x.
3. An int value is expected, and r is written. Chapel copies *r.
4. An int pointer is expected, and r is written. Chapel copies r.
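The four R-context rules can be modelled in C++, whose pointer notation the post already borrows. This is a minimal sketch under the post's own assumption that a var is a plain int object and a ref secretly stores an int*; the rule helper functions are hypothetical stand-ins for what the compiler would emit, not Chapel's actual implementation.

```cpp
#include <cassert>

// Hypothetical model: a Chapel `var` is an int object, a Chapel `ref` is an int*.
// Each helper stands for the code the compiler emits in one R-context case.
int  rule1_value_from_var(int& x)   { return x; }   // int expected, x written: copy x
int* rule2_pointer_from_var(int& x) { return &x; }  // int* expected, x written: copy &x
int  rule3_value_from_ref(int* r)   { return *r; }  // int expected, r written: copy *r
int* rule4_pointer_from_ref(int* r) { return r; }   // int* expected, r written: copy r

void demo_r_context() {
    int  x = 1;    // var x : int = 1;
    int* r = &x;   // ref r : int = x;  (r secretly holds &x)

    assert(rule1_value_from_var(x) == 1);
    assert(rule3_value_from_ref(r) == 1);     // same value as x
    assert(rule2_pointer_from_var(x) == &x);
    assert(rule4_pointer_from_ref(r) == &x);  // same address as &x
}
```

The user sees x and r behave identically only because rules 1/3 and 2/4 insert or remove exactly one level of indirection.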
There are two more cases, where an expression of type int is used instead of x. If an int is expected, this may or may not require a temporary; but if an int pointer is expected (if that can happen at all), storage has to be allocated somehow and its address copied. For example
ref rtemp : int = 1 + 2;
if allowed, would require that. Before proceeding, there is a related issue: overloading. For a set of functions with a parameter of type int and one of the intents in, out or inout, I assume this means copy the value in the first case and copy a pointer in the other two. The problem is that before we can say Chapel is expecting some type, we have to resolve overloads. But I'm going to skip that for the moment.
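The assumed intent semantics (copy the value for in, copy a pointer for out and inout) look like this when the pointers are written out explicitly. This is a sketch of the post's assumption only, not a statement of what the Chapel specification prescribes; the function names are hypothetical.

```cpp
#include <cassert>

// Assumed model: `in` copies the value; `out` and `inout` pass a pointer.
int in_intent(int v) {       // in: receives a copy
    v += 1;                  // modifies only the local copy
    return v;
}

void out_intent(int* p) {    // out: callee only writes through p
    *p = 42;
}

void inout_intent(int* p) {  // inout: callee reads and writes through p
    *p = *p * 2;
}

void demo_intents() {
    int a = 5;
    assert(in_intent(a) == 6);
    assert(a == 5);          // caller's a is unchanged
    out_intent(&a);
    assert(a == 42);
    inout_intent(&a);
    assert(a == 84);
}
```

Written this way, the caller must decide between a and &a itself, which is exactly the decision the compiler has to make silently once overloads are resolved.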
Now there is another scenario where a pointer is always required: assignment. On the LHS of an assignment, a pointer is required.
L-context:
5. If x is written, the pointer is &x.
6. If r is written, the pointer is just r.
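Rules 5 and 6 can be sketched with a single store operation that always takes a location. The helper assign_through is a hypothetical stand-in for the compiler's store; the model again assumes a ref is held as an int*.

```cpp
#include <cassert>

// A store always needs a location (a pointer) on its left-hand side.
void assign_through(int* location, int value) { *location = value; }

void demo_l_context() {
    int  x = 1;
    int* r = &x;              // ref r : int = x;

    assign_through(&x, 10);   // x = 10;  rule 5: the pointer is &x
    assert(x == 10 && *r == 10);

    assign_through(r, 20);    // r = 20;  rule 6: the pointer is just r
    assert(x == 20);          // the store is shared, so x sees the write
}
```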
So we now have six cases where the compiler has to cheat to maintain the lie it told the user. I want to spell out the fiction: the user expects x and r to have identical behaviour, except that the underlying store of a variable declared as a reference may be shared with another variable or reference. Cheats 1 through 6 maintain that requirement.
parametric polymorphism
Can we sustain this fiction if, instead of int, we have some unknown type T? The answer is yes, provided T is restricted to non-pointer value types. We need to consider classes, though: a value of class type is actually a pointer. So if c0 is of class type C, then there are two possible semantics (I don't know which Chapel implements):
semantic 1
var c1 = c0; // copies pointer
ref c2 = c0; // copies pointer
ref c3 = c1; // copies pointer
ref c4 = c2; // copies pointer
So now our rules 1-4 above are broken by classes, because an assignment to c1 will not change the object that c2, c3 and c4 refer to.
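Semantic 1 can be modelled by treating a class value as a C* and letting ref simply copy that pointer. This is a sketch under that assumption; the struct C and its id field are hypothetical, and the objects live on the stack only to keep the example short.

```cpp
#include <cassert>

struct C { int id; };   // hypothetical class type

void demo_semantic1() {
    C a{0}, b{1};
    C* c0 = &a;
    C* c1 = c0;   // var c1 = c0;  copies pointer
    C* c2 = c0;   // ref c2 = c0;  copies pointer
    C* c3 = c1;   // ref c3 = c1;  copies pointer
    C* c4 = c2;   // ref c4 = c2;  copies pointer

    c1 = &b;      // assign a new object to c1
    // A true reference c3 would now see b, but under semantic 1 it still sees a.
    assert(c1->id == 1);
    assert(c2->id == 0 && c3->id == 0 && c4->id == 0);
}
```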
We can sustain the rules, however. For example:
semantic 2
var c1 = c0; // copies pointer
ref c2 = c1; // sets c2 to &c1, double indirection now
So now c2 is a pointer to a pointer underneath, and if we assign a new value to c1, the object referred to by c2 will also change. Since semantic #1 breaks the usual rules, with parametric polymorphism we have to add another caveat: T cannot be a class type either. Semantic #2 is already covered by the first caveat, provided we allow that a class type is a non-pointer type.
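Semantic 2 can be sketched by binding c2 to c1 and storing &c1, giving a C**, as the comment in the snippet and the paragraph describe. The struct C is hypothetical, as before.

```cpp
#include <cassert>

struct C { int id; };   // hypothetical class type

void demo_semantic2() {
    C a{0}, b{1};
    C*  c0 = &a;
    C*  c1 = c0;   // var c1 = c0;  copies pointer
    C** c2 = &c1;  // c2 bound to c1: holds &c1, double indirection

    c1 = &b;                  // assign a new object to c1
    assert((*c2)->id == 1);   // c2 tracks c1, as a reference should
}
```

The price is that every use of c2 now needs the extra dereference, which the compiler can only insert if it knows c2 is a reference.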
The reason for the exclusion is that it is not possible to maintain the cheats unless we can distinguish pointer to T from T. In other words the compiler still has to know a reference variable is a pointer to T, whilst a variable is an actual T. It cannot do this if T = pointer to U for some type U. If semantic #1 is used, then there are two cases for T: either it is a non-class non-pointer or a class type, and we have to know which because the semantics are different.
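Here is a sketch of why T = pointer to U defeats the cheats. The function pointer_for is a hypothetical rule that decides between cheat 2 (take the address) and cheat 4 (copy the pointer) by asking whether the operand already has pointer type. It works for an int variable and an int reference, but for an ordinary variable whose value happens to be an int*, it silently picks the wrong rule.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical cheat for a pointer-expected context:
// if the operand already has pointer type, assume it is a reference (rule 4);
// otherwise assume it is a plain variable and take its address (rule 2).
template <class T>
auto pointer_for(T& operand) {
    if constexpr (std::is_pointer_v<T>) {
        return operand;    // treated as a reference: copy the pointer
    } else {
        return &operand;   // treated as a variable: copy its address
    }
}

void demo_ambiguity() {
    int  x = 1;
    int* r = &x;   // meant as a reference to x
    int* p = &x;   // meant as an ordinary variable whose VALUE is a pointer (T = int*)

    assert(pointer_for(x) == &x);   // correct
    assert(pointer_for(r) == &x);   // correct
    // For p the caller wanted &p (an int**), but the rule cannot tell p from r:
    static_assert(std::is_same_v<decltype(pointer_for(p)), int*>);
    assert(pointer_for(p) == &x);   // p itself is never addressed
}
```

Requires C++17 for if constexpr and std::is_pointer_v.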
further problems
So it may seem that we can retain references using the rules above, together with constraints on type variables that exclude any type which would hide something the compiler needs to know to maintain the lie. Unfortunately, there are more problems. The key one here is that functions can only have one argument. Multiple arguments are total nonsense. The type of a function is given by D -> C, where D is the domain and C is the codomain. If you want multiple arguments, there are two ways to get them. One is to pass a tuple-typed argument and pattern match it inside the function to extract the individual components. The other is to use higher-order functions (currying), which is what users of Haskell and OCaml often do.
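The higher-order method can be sketched in C++ with a lambda: a "two-argument" add becomes a function of one int that returns a function of one int. The name add_curried is illustrative only.

```cpp
#include <cassert>

// Curried encoding: int -> (int -> int). Every function still takes one argument.
auto add_curried(int a) {
    return [a](int b) { return a + b; };   // captures a, waits for b
}

void demo_currying() {
    assert(add_curried(2)(3) == 5);
    auto add2 = add_curried(2);   // partial application
    assert(add2(10) == 12);
}
```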
Why can't a function have multiple arguments? Because it is impossible to do even basic algebra. For example, the rule for composition of the functions
f: A -> B
g: C -> D
is that the composite g . f requires B = C; the composite then exists and is given by
(g . f) (x:A):D { return g(f(x)); }
If a function has two arguments, composition is impossible. If you want to combine functions, you now need a complex set of rules indexed by the number of arguments, instead of one simple rule. In C++, heavy-duty template metaprogramming is required to obtain some generality (and this wasn't possible until recently).
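For one-argument functions, the single composition rule fits in a few lines of C++. This is a sketch; compose is our own name, and the B = C condition is only checked when the composite is actually called, not when it is formed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// compose(g, f) : A -> D, for f : A -> B and g : B -> D.
template <class G, class F>
auto compose(G g, F f) {
    return [g, f](auto x) { return g(f(x)); };
}

void demo_compose() {
    auto length = [](const std::string& s) { return s.size(); };   // string -> size_t
    auto twice  = [](std::size_t n) { return 2 * n; };             // size_t -> size_t
    auto h = compose(twice, length);                               // string -> size_t
    assert(h(std::string("abc")) == 6u);
}
```

Nothing here depends on how many "arguments" the user thinks a function has, because each function takes exactly one.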
So you basically have to go with passing a tuple as an argument. But now tuples are first-class types themselves, and you can have a variable of tuple type:
var x : int * string = (42, "hello");
... f(42, "hello") ...
... f(x) ...
Note that I use the correct notation for tuple types. Both calls to f will work. It's no longer possible to have in, out and inout parameters (because, trivially, there is only really one parameter!).
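The tuple method, with both call forms, can be sketched in C++ using std::tuple<int, std::string> in place of int * string and structured bindings as the pattern match. The body of f is a hypothetical example.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// The only parameter is a tuple; its components are extracted by pattern matching.
std::string f(const std::tuple<int, std::string>& arg) {
    const auto& [n, s] = arg;
    return s + ":" + std::to_string(n);
}

void demo_tuple_argument() {
    std::tuple<int, std::string> x{42, "hello"};   // var x : int * string = (42, "hello");
    assert(f({42, "hello"}) == "hello:42");        // tuple built at the call site
    assert(f(x) == "hello:42");                    // tuple passed from a variable
}
```

Both calls bind the same single parameter, so there is nowhere to attach a per-component intent.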
All the components of a tuple value will be non-references or classes, because only a variable can be a reference. If you want inout and out semantics, they are easy to get by explicitly using read-write and write-only pointers, but you cannot do this if you only have references. The problem, essentially, is that you now have TWO stages of binding: first constructing the tuple, and then passing it to the function. So the magic rules that maintain the fiction fall apart, because they are only one-step rules.
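A sketch of the explicit-pointer approach with the two binding stages marked. C++ has no write-only pointer type, so both pointer components are plain int* and "out" is only a naming convention here; scale_into is a hypothetical function.

```cpp
#include <cassert>
#include <tuple>

// One tuple parameter; inout/out behaviour comes from pointer-typed components.
void scale_into(std::tuple<int*, int*, int> args) {
    auto [inout, out, factor] = args;   // stage 2: the tuple is bound to the function
    *inout = *inout * factor;           // read-write through inout
    *out = factor;                      // write-only by convention
}

void demo_pointer_components() {
    int a = 3, b = 0;
    auto args = std::make_tuple(&a, &b, 4);   // stage 1: build the tuple, taking addresses
    scale_into(args);
    assert(a == 12 && b == 4);
}
```

Because the addresses are taken explicitly in stage 1, stage 2 needs no hidden rule to recover them.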
All these complications and issues simply evaporate if you put pointer types into the language. It is best not to lie to the user, but you simply cannot lie to the compiler. The point is that you already have pointer types in the type system. References can be maintained ONLY as syntactic sugar, since in that case they might confuse the user, but never the compiler.