I want to run a core type system calculus by people here. It's implemented and backed by algebra, the first bit is stock standard but i introduce some novel constructions bit later. It will take a few posts to present.
The collection of all basic types and functions forms a category I'll call TYPE
, by basic I'm talking shared values and excluding, for the moment, substructural type like borrowed or uniq classes. I'm also only considering structural types, most languages have at least some nominally typed things, for example a C struct is a nominally types version of a tuple.
There are two basic constructions: product and coproduct formation, these are families of data functors TYPE^N -> N for finite natural numbers N. A functor is a structure preserving map, structure is given by function composition, so the rule is F(f * g) = F (f) * F(g).
Products are also called tuples, in set theory products are your usual cartesian product with elements being tuples. In Chapel and Felix and Ocaml its .. a tuple The standard notation is like int * string * real
using a non-associative infix operator. You can also use tuple(int,string,real)
. Note Chapels notation is not acceptable (int, string,real)
is an element of TYPE^3, that is, its a tuple of types, not a type which is a tuple.
Coproducts are also called sums or variants. They represent alternatives. Chapel enums are variants but they're deficient because they lack arguments. In Felix I use the standard notation like int + long + real
. I think Felix is the only programming language to have structural sums.
The two operations are required to be distributive, that is
X * ( A + B ) === X * A + X * B
which means these types are isomorphic (NOT equal). Product and sum formation is NOT associative. We all know for example (1,(2,3)) is isomorphic to ((1,2),3) but the types are not equal. In sets, a sum is just a discriminated union.
There is an empty sum, the set of no elements, it is unique, and it is usually called void and notated 0.
There is a canonical empty product, which is a type with only one value if you like, it is the empty tuple () and the type is called unit. In Chapel its called nothing. It is notated 1.
So far, everything is very basic. I'm going to add some syntactic sugar now which I will remove almost immediately. I will write 2 = 1 + 1, and 3 = 1 + 1 + 1. I call these unitsums (because they're sums of units). Note that because they're not associative, we have that 3 + 2 is isomorphic to 5 but not equal. Similarly 3 * 2 is isomorphic to 6 but not equal.
I'm going to add sugar now. If you have a type T * T * T I'm going to allow you to write T ^ 3. Note Chapel gets this wrong and it must be fixed. Now this is just syntactic sugar. For sums, T + T + T will be written 3 \* T
. Note, I cannot use * for this because a repeated sum is isomorphic to a pair, (3,T) but not equal.
Now, I'm going to throw out the sugar and replace it by a real, actual, type constructor. Exponentiation is usually written in two ways: either C ^ D, or D->C. The latter is obviously a function type. The former is logically equivalent but I'm going to reserve that notation for arrays. Chapel uses [D] C for arrays which is acceptable. It also uses ** for exponentiation which is also acceptable but ugly. The literature almost universally uses ^.
Now, we can describe types consisting of products, sums, and arrays, using the usual mathematical index notation, remembering that the operator are not associative, so parentheses matter. However moving the parens around produces an isomorphic type.
Now, clearly the domain of an array can't be any old type. I'm now going to define a constraint which specifies what it can be. Let 0 and 1 be compact linear types, then every sum, product, or exponential of a compact linear type is compact linear.
Any compact linear type can be an array index. Let me show an key example:
real ^ (3 * 5) === (real^5)^3
What does this say? It says a matrix which accepts a pair of values in range 0..<3 and 0..<5 as an index, is isomorphic but not equal to an array of 3 arrays of length 5. Note again, 3 and 5 are types NOT integers. So values of these types are used for indexing and bounds checking is not only not required, it isn't even possible. The indices cannot be out of range.
Another formula is:
real ^ 3 * real ^ 5 === real ^ (3 + 5)
The says a tuple of two arrays length 3 and 5 respectively is isomorphic to a single array indexed by a conditional which selects either the first or second array and then choses the index.
The type system is very powerful in describing shapes. Now I am going to talk representations. First we introduce a function size on compact linear types, which returns a natural number. Unsurprisingly, size(5)=5, and in fact, if you evaluate the formula as if it we numbers not types, that's your size!
There's a hidden piece of magic. The types, mapped by size to natural numbers, is associative! Now on to the encoding. For products, a type like 5 * 3 * 2
is represented by a number in the range 0 to 29 and we use the well known variadic base representation. A projection to find the second component of the product is just given by x / 2 % 5. Its the same as bits, except right shift is replaced by division and mask is replaced by modulus. The format is compact because all values in the range 0 to size - 1 are used up, and linear because .. well its an integer
The encoding of sums is similar, except you have to add and multiple to construct, and reverse the operations to extract.
Now, suppose you have a matrix and you want to perform an operation on rows. You have to write two loops to do it. But there is a better way: you reshape the matrix into and array of arrays, which is a type cast only, the storage layout doesn't change, and now you only need one loop.
All Chapel programmers will understand the idea, in Chapel, you just define ranges for the array domain, and Chapel interprets them as iterators. In C, you just use formulas like mx + b
. But with my calculus, you don't need to do any of this! In Chapel as I understand it, a range specification for a domain is a value of the range type and is associated with an iterator. But in my calculus, the index is a type. Every different shape is a different type. So the compiler can generate the ranges automatically. All you need to do is apply the desired isomomorphisms.
The calculations are simple and can also be done at run time (with the same code actually!). In particular, given a kind I will call COMPACTLINEAR
which can be applied as a constraint to a type variable, you can do calculations in which you do not even know the type fully. In other words, you can perform calculations without knowing the rank of a matrix. The compiler (or run time!) fixes the variables when known and turns the symbolic calculations into actual numbers.
The calculus is thus amenable to parametric polymorphism with the constraint of compact linearity imposed on type variables. NOT all computations you want to do can be done with isomorphisms. But I will bet that a lot of the ones that can be parallelised can be. The really nice property here is that you can take an array shape and linearise it, split the array up into equal pieces irrespective of the shape, farm them out to different machines, and then take the computations and divide the index sets up in a way that would be completely unnatural to a programmer, and it will still work.