Design input for units library

Hey Chapel community,

We are looking to create a units (units of measurement) library in Chapel. The aim is to avoid string allocations and leave the type checking to the compiler. We're looking for input on the best way to design such a library.

Two ways that we can see are:

class unit {
        param length: int;
        param mass: int;
        param time: int;
        param electric_current: int;
        param temperature: int;
        param substance: int;
        param luminous_intensity: int;
}

class length: unit {
        var _value: real;
        var coefficient: real;
        var constant: real;
}

class meter: length {}

Or the other way would be

// Same unit class as above
class length: unit {
        type length_kind;
}

class meter {} 
// Then the variable would be of type length(meter)

Also, what would be a decent way of designing derived units (e.g., area, volume, velocity)?

CC: @JMM @abhishekingit

I think it'd be great to have a units library but I'm not quite sure how it would be implemented. For one thing, we don't have user-defined implicit conversions available today for user types. I'm curious if you'd view that as a requirement.

Regarding your examples, I'd suggest you use record for these types, since storing an int/real/etc. on the heap seems like more overhead than we want if we can avoid it. But records don't support inheritance, so if you wanted to bundle up something like the class unit above, it could be a field.

I think it'd help the discussion if you showed an example computation that the units library is to support & describe how you imagine that computation would be handled.

-michael

Hello!

Based purely on the names of the fields, it seems like you probably
wouldn't want all the fields in use at a particular time? At least, my
guess is that a single unit instance is likely a length or a mass,
rather than both a length and a mass? But an example like Michael
suggested would probably help clarify. I do think the concept of a
units library would be great to have and like the named units you're
including so far.

Lydia

Right, apologies, I left out a lot of information in the initial post. The fields in the unit class are meant to denote the powers of the seven basic dimensions.

For example, any quantity of type area would have the param length equal to 2 and the rest to zero. Similarly, velocity would have length equal to 1 and time equal to -1. (I hope that makes sense :stuck_out_tongue:)

The three fields in the length class come from:
value in SI unit = value * coefficient + constant
For example, for 10 centimeters it would be value in m = 10 * 0.01 + 0, and for 10 degrees centigrade it would be value in K = 10 * 1 + 273.15.
I think this type of design would allow us to easily convert values between non-base SI units (like from centimeter to kilometer or nanometer).

The above was just a sort of proof of concept that we tried. What we are first aiming for would be:

var addition: length = new meter(10) + new centimeter(10); // value would return 10.1
var addition2: length = new centimeter(10) + new meter(10); // value would return 1010
var subtraction: length = new meter(10) - new centimeter(10); // value would return 9.9
var subtraction2: length = new centimeter(1000) - new meter(1); // value would return 900

// Something to achieve for later on
var area /* unclear about the type for this */ = new meter(10) * new centimeter(10);

In short, what we expect the library to handle initially:

  1. Addition
  2. Subtraction
  3. Conversion

Please don't use the name "unit". In type theory, this is the type Chapel calls "nothing", the type of an empty tuple. More generally a unit is a significant term in algebra and should not be usurped. Dimension might be a better name?

Conversion from non-MKS units can be done with an ordinary function, e.g. feet_to_metres, and is not an issue. It should always require an explicit function call; it should never be implicit.

Now for the real problem. This has been done in OCaml using phantom types. I do not think Chapel has a strong enough type system for any possible user implementation. The method used requires parametric polymorphism. One needs versions of all math functions so that, for example, x + y is type checked to ensure the units of each operand are the same, whereas x * y adds the exponents of each unit. Evaluating a polynomial of arbitrary length is extremely challenging: the coefficients have to cancel out the units of the powers. E.g. if x is in metres then x^2 is square metres, and d*x^2 requires d to be 1/m^2, i.e. per square metre, to cancel the square so the polynomial is dimensionless.

From memory, the method is based on numbers represented like SSSSZ being 4. In OCaml the variant

type num = S of num | Z

is used. The advantage of this is that recursion can be used for construction so S x means to add 1. Pattern matching can be used for subtraction:

match x with | S y -> y | Z -> assert false

Now we have a type where we can add or subtract one, to add or subtract more than one we just use a recursive function.

So far, this is only a useless type. To use it we do

type 'a dimreal = Metres of float * 'a
type squared = S(S(Z))
Now we can write

Metres (42.0, squared)

or something like that. This uses a real value, not a phantom so I haven't quite got it correct. The only other way to do this is to build it into the compiler.

Regarding expressing the units, other than new meter(10) to indicate 10 meters, we could write it with a type cast, as in 10:meters, or with a free function that returns some type, e.g. meters(10) (with the caveat that the type and the function would probably need different names). I find the type cast approach appealing because it matches the typical way of saying the value and then the unit (e.g. in conversation I would say "10 meters"). The main thing to be cautious of there is to check that the operator precedence table will be compatible with this idea. The Expressions page of the Chapel 1.25 documentation shows the operator precedence, and we can see that : has higher precedence than all the arithmetic operations, including **. There might be an issue with left- vs right-associativity, though I'm not aware of one offhand.

I agree that there are probably challenges with the type system in Chapel today for doing this - especially for implicit unit conversions.

But, I would expect it would be possible (and probably not require any language changes at all) to type check the units that a programmer has written. I also would imagine it is not that hard to handle the math operations between the types.

The implicit conversions I am talking about are a case like this:

proc f(in value: meters) { }
var cm = 10: centimeters;
f(cm);

Here you might expect that on the call to f, the centimeter value is converted to meters. That's going to be hard to implement. However, it's not that hard to arrange for the program not to type check (in other words, not to compile) unless the programmer in the above case adds an explicit conversion, e.g. f(cm: meters).

In contrast, we can implement the various operators (like =, ==, +, * etc) to handle doing conversions as needed. I don't know of a problem with this. Even for having * potentially generate a new type, that is not really a big deal, as long as we can encode the units into some param values.

Saying that, I am imagining that in the earlier example,

var addition: length = new meter(10) + new centimeter(10);

it would be necessary for length to be a generic type, and the particular unit would be inferred as an instantiation of that generic type. If the user wanted to have a particular unit, they would need

var addition: meter = new meter(10) + new centimeter(10);

(or something).

Thanks @mppf for the inputs. Our initial aim is to get an implementation that is a little more restrictive (and therefore probably easier to implement).

We don't aim to support implicit type conversions as of now, and in our current use cases we simply convert to the required units explicitly as and when needed.

The way some implementations support arithmetic is to convert the LHS/RHS to "base units", which would be (say) SI units, and then perform arithmetic.

@AsianIntel I'm also wondering whether we need explicit notions of length etc.; it might be simpler for each param to have an associated _value, coeff, etc., which gets populated only if the corresponding unit is non-zero.

Then all objects would be of type unit and if you want to pretty print you could always write a function/method to do that. I agree it is less elegant but it will get us off the block faster.

@mppf any thoughts on the above?

That sounds like a good starting point. So, a way to summarize the goal of the effort is this: have a library where the programmer uses explicit units but where the program won't compile if there is an error in the units.

Always doing arithmetic in SI units (and then converting again if necessary) seems reasonable to me as well.

I didn't quite follow the part about _value, coeff, etc., but maybe it wasn't meant for me anyway. For me, it's easier to talk in terms of code samples / sketches in code.

Regarding this --

It seems like something we should maybe talk about more? Personally, to talk about a numeric value with an attached unit (say, meters), what makes sense to me is to talk about meters as the type. But that might be unworkable when we get to more complex units (e.g. m/s). Having the type literally be unit seems odd to me because it's not a unit itself; it's a number with a unit. Is a type name of unit common in other implementations? Also, I'm not sure if you're talking about some kind of record/class collecting information about the unit used, which is then combined with a number to form something else? (If so, that would be more of an implementation detail and not something that needs as much API design attention.)

Let's suppose we have only metres and seconds, only + - * and /, and we're only using real. Excuse my incorrect Chapel too, please; I'll just use pseudo code. The dimensions are encoded in a pair of compile time integers, and the formulas for + and - require equality of the dimensions, while * does addition and / does subtraction of them. Now you write:

var dist = (2.0, (1,0)); var time = (4.0, (0,1));
var speed = dist / time;

and we expect the result to be (0.5, (1,-1)), which is metres/second. And if we write

dist + time

we expect a compile time error because the dimensions have to be equal for addition.

It is clear this cannot be done with a class "dimension" whose values are the pairs shown above, at best that would allow a run time check which is completely unacceptable because a formula in a loop would be pointlessly checked every iteration. So instead the pairs shown have to be types. And I mean, each and every combination of integers must be a distinct type. So metres, seconds, metres * metres, metres/seconds etc etc are all distinct types.

In C++ we could make these types:

template<int d, int t> struct dim;
using metres = dim<1,0>;
using seconds = dim<0,1>;
using m_per_s = dim<1,-1>;

and now, you could use template metaprogramming to do the required maths. Now you can overload + - * and / to do both real arithmetic at run time and type calculations at compile time. For example an overloaded + is easy, the types of the dimensioned real numbers just have to agree. For * the type of the result is the product of the reals and the sum of the dimensions.

So the point is, you need parametric polymorphism to represent the dimensions, and you need to be able to do the equivalent of template metaprogramming to do the type calculations. After that, it is not so hard, just messy, to overload EVERY mathematical formula in the whole system to have dimensions.

BTW: "dimensions" is a bad name too because that applies to array extents.

I suspect Chapel cannot do the dimension stuff, so the question would be whether the compiler type system can be extended to allow it to be user encoded. As it happens my language Felix has compile time "integer" kinds which support addition. Here's a working example:

fun join[T, N:UNITSUM, M:UNITSUM] 
(x:array[T, N]) (y:array[T, M]):array[T, N `+ M] = {...

This says that when you join two arrays of length N and M you get an array of length N + M. The operator `+ shown is a type operator encoded in the compiler; the UNITSUM constraint is a kinding constraint that ensures the types N and M can be added. A unit sum is a sum of units; for example, bool = 2 is a sum of two units. It's basically a compile time integer.

So to make the MKS units stuff work in Chapel you need at least an advanced type system for which there is an integer kind which allows something like integers to act as types, you need to be able to do simple maths with these types at compile time, and you need a kinding system to constrain the types you want to add to the integral kind. You also need overloading and type assertions to catch errors.

Chapel's arrays' sizes aren't part of their static types, but tuples' sizes are, so here's a Chapel routine that takes in two tuples and returns their concatenation, a tuple whose size is the sum of the two input sizes. Note that TIO is several versions behind due to health issues on the part of the maintainer, so for this code to work with a recent Chapel release, the indexing on the tuples would need to change from 1-based to 0-based.

I'm reading this thread for the first time only very quickly tonight on my way out the door and don't know Felix, so may be missing some obvious difference between your assertion of what Chapel can and can't do and reality.

-Brad

Note my example from Felix happens to use array sizes just because I know I happen to have an example of compile time type calculation of the kind needed to implement the MKS units idea. Which is a good idea to be able to implement.

I hope I understand but the Chapel routine shown is adding the sizes at run time?
That's not the requirement though. Clearly, the compiler is adding the tuple sizes at compile time. But that's not enough. The user has to be able to add types.

Chapel has the isTuple thing, which is a kinding constraint. So the compiler has capabilities: the problem is making them accessible to the programmer to write meta-programs that calculate types at compile time. In my example code from Felix, the user wrote the calculation array[T, N `+ M], which is computed at compile time. Note the inputs N and M are type variables. The computation is only resolved on monomorphisation. However, the computation is kind checked polymorphically (before monomorphisation).

For the OP's issue, addition and subtraction is a start, but you also need types like 4 and 7 as well as 0 = void, 1 = unit (aka nothing), 2 = bool.

The resolution of the computation is performed when top level monomorphic constructions instantiate generics based on dependencies. However the formulas have to be already kind checked (in the same way run time code can't evaluate until after it is type checked).

No, it's adding them at compile-time, which is why the compilerWarning() is there—this is a warning that is printed at compile- rather than execution-time, showing that the value is known to the compiler.

Chapel has a notion of param values which are constants whose values are known at compile-time. Functions that return param values or types are evaluated at compile-time. The + overload that takes two param ints and returns a param int is such an example. The sizes of tuples in Chapel (and the dimensions, but not sizes, of arrays) are param values, so adding their values invokes the compile-time implementation of + and calculates the sum at compile-time.

If the result of the sum were stored in a var or const (a variable or constant not known until execution time), that would be fine, but then it could not be used to declare a new tuple because the size of a tuple must be a param value. If a non-param value (like the sum of two integers read from the console or of two array sizes) is stored into a param, the compiler will complain that the value is not known at compile-time.

-Brad

Yes, but it's evaluating constant integers. My computation is evaluating type variables. For the dimension problem you need a type like metres<N> and to be able to define the type of a calculation like A * B, where A: metres<N> and B: metres<M>, as metres<N + M>. You can obviously do the computation inside the compiler easily, if you write some C++. The problem is to allow the user to specify the type calculations. So you need to embed a user-programmable type calculator inside the compiler.

In any case, if Chapel can do it, I'd love to see a metres calculator. If you can then do metres and seconds, then the whole 7 MKS dimensions could be done too. The test would be to, say, evaluate m*x + b, where m, x and b have some powers of metres and seconds, and the calculation would barf if the dimensions didn't agree.

Yes, but it's evaluating constant integers. My computation is evaluating type variables.

Again, I've only read this conversation quickly, am not expert on unit libraries, and don't know Felix, so was only trying to point out that Chapel has the ability to do some degree of computation at compile-time. I'm not expert enough in unit libraries to say whether what Chapel has is sufficient for what is needed to reproduce an OCaml-/Felix-like approach or not. If the answer turns out to be "it doesn't", that wouldn't necessarily surprise me, as our primary goal in its support for compile-time computation was to support things we anticipated needing for scalable parallel computations. But then I'd also be curious for ideas for ways in which Chapel could be extended in a natural way to provide that capability.

Here's an example of me riffing on various Chapel functions that compute types, and on types, at compile-time en route to signing off for the evening.

-Brad


Maybe you can do it. Here is a partial implementation in C++. It works. It barfs if I make a type error.

#include <iostream>
template<int m, int s> struct MKS { double x; };

template<int m, int s>
MKS<m,s> operator +(MKS<m,s> a, MKS<m,s> b) { 
  return MKS<m,s>{a.x+b.x}; 
}

template<int m1, int s1, int m2, int s2>
MKS<m1+m2,s1+s2> operator *(MKS<m1,s1> a, MKS<m2,s2> b) { 
  return MKS<m1+m2,s1+s2>{a.x*b.x}; 
}

int main() {
  MKS<1,0> x0{2.0}; // metres
  MKS<1,-1> v{0.7}; // metres per second
  MKS<0,1> t{21.0}; // seconds
  MKS<1,0> d = x0 + v * t;
  ::std::cout << "Distance " << d.x << ::std::endl;
  return 0;
}

Easy to do subtraction and division and extend the system to 7 parameters.

The main problem in C++ is that generic formulas (i.e. ones for which the template parameters are not constants) cannot be type checked; they're checked only on instantiation. So, for example, if you lifted the computation out of the mainline into a function and made it calculate x0 + v * t with type variables for the dimensions, it wouldn't be type checked polymorphically; only the instances would be checked. In Felix, all polymorphic stuff is type checked so that every valid instantiation is guaranteed to be type correct.

I don't know what Chapel can do in this regard.

Thanks @skaller and @bradcray for this discussion. Our current aim is to get something to work for us, and if it turns out that we can't do compile time checking, that's fine. Unit libraries have been developed and used extensively in Python, where the notion of compile time checking itself is shaky.

The resulting library may be less robust to programmer errors, but our priority is to write something we can use in the coming months. We will follow this discussion, and if it turns out there are ways to do compile time checks, we would definitely aim to use them.


@mppf @skaller I agree that the name "unit" does not make much sense.

In the pint library, the resulting object is called a Quantity which has some associated unit, which seems more acceptable. Does that make more sense to you?

Ah, that makes more sense to me thanks, and actually makes the original post by @AsianIntel also make more sense to me than it had before. My intuition had been to go the route of:

enum unitType { meters, seconds };

record ValueWithUnits {
  var val: real;
  param units: unitType;

  proc writeThis(s) {
    s.writeln(val, " ", units);  // should write "0.0 meters", say
  }

  operator +(x: ValueWithUnits(?ux), y: ValueWithUnits(?uy)) {
    if ux == uy then
      return new ValueWithUnits(val = x.val + y.val, units = ux);
    else
      compilerError("unit mismatch in + operator: " + ux:string + " != " + uy:string);
  }
  }

  // etc.
}

which would support type-safe addition and subtraction, but I wasn't sure how to handle division and the like. But seeing the notion of having an int param field per unit type and using the value to store the exponent on the unit makes sense to me. Again, not an area I've worked in before. And re-reading the thread with this understanding, I'm realizing that this was explained by @AsianIntel here and that I was reading too quickly last night and reacting to elements in the tail of the thread that seemed potentially off-base to me. Sorry about that.

Like Michael, I think a record rather than a class makes sense for this type since it is designed to provide value-like semantics rather than referential semantics. You can also see that my intuition was also not to shy away from unit-based names in my identifiers because I think of the typical audience for Chapel as being the scientific/applied programmer rather than the PL expert, so I think it makes sense to meet them where they are.

they're checked only on instantiation

I think that, like traditional C++, we'd also only get errors upon post-resolution instantiations of operators on such types today, though we're pursuing an interface concept that, like C++ concepts, would aim to do the type checking more proactively. But it would still require an explicit instantiation at some point to get checked.

-Brad

The only reason I don't like unit is that in algebra a unit in a structure has a special meaning. In a group, the unit is the identity element. In category theory, a unit object is any object 1, such that for any object X there is a unique map X->1. In programming the canonical unit is the empty tuple. Unit types have a single value. Etc etc. Since programming, and particularly programming languages, are based on algebra .. or should be .. usurping such a fundamental name would make discussion difficult. It's bad enough C++ has abused important names like functor which in C++ has nothing to do with functors at all. At least vector has some justification. If you called it SIUnit it would be fine. Quantity is not quite there, 2 is a dimensionless quantity. Having said that, it also implies quantities can be dimensioned .. arrghh :slight_smile:

My main concern here is that Chapel should have a kind of polymorphism where polymorphic functions can be type checked polymorphically. C++ cannot do this, and indeed, with dependent name lookup .. looking up a name at the point of instantiation of a generic, it is actually unsound, which is unforgivable. Instantiations should never depend on context.

I'm actually partially responsible for this in C++. Sigh. We tried to make it sane by making the lookup find the context in which an argument type was defined, the idea being the result would be the same for any instantiation, but unfortunately that didn't cover the case where a dependent name really was introduced in the instantiation context. In particular it could never cover the case where the argument was a primitive like int since that doesn't have a context of definition.