Integer Promotion Weirdness

With generic literals, the following (using Brad's special version of chpl, which supports real(16) types)

param b = bias(real(16)); // will have uint(16) type
param i = b + 70_000; // 70_000 > max(uint(16))

should at best generate an error, or at worst emit a warning and generate

param i = b + 70_000:(b.type);

Sorry, I forgot that.

@damianmoz: A few quick questions:

  • When you refer to porting code from Fortran/C/C++, what language are you porting bias() from, and what does its prototype / function header look like as written in that language?

  • Are you advocating for numeric literals (like 42) and param values to be treated differently in Chapel?

Thanks,
-Brad

I am not porting bias() from Fortran or C or C++. Strictly speaking, I should really say either of

param b = x.bias;
param b = real(w).bias;

The bias is a characteristic of a w-bit floating point number. But I have implemented it in both Fortran and C/C++. In the former, I had a smart pre-processor, so bias() looks a bit like the exponent(x) routine of Fortran except that it returns the bias on a floating point number's exponent. In C, it was a hairy bit of #defines based on sizeof(x) and static const declarations. For the Chapel code, refer to the transmute() doc in #28188. One of the things which attracted me to Chapel was that any grubbiness or complexity in inserting such characteristics into my code disappeared with the param concept.

A param in Chapel is a far more powerful concept than a generic literal, so they already are different. It is one of Chapel's features, like constexpr in C and C++ or a more flexible static const. Once you declare a param to be some literal value, the literal crosses the assignment within the declaration and is given a type and a name. This is not the case for the literal 0.5 below:

param t = 1.0:real(32);
param x = 0.5;
param y = t + 0.5;

That 0.5 as used in x is different from the one used in y.

Michael's smart solution, which I suspect was a lot of work, solved the problem for real(w) types even with the decision to default a literal param to the largest supported size of the underlying generic type. I have to go back to fundamentals to explain a way to solve the problem for integral types. Maybe he can do the same for int(w) types, but otherwise the cleanest solution is to provide a compiler flag to allow true generic literal handling, which solves both today's issues and future issues like literals for use with real(128) and int(128) expressions. An alternative is a set of rules like those C uses for

const int k = 10;
auto j = k + 1ULL;

Here, under C's usual arithmetic conversions, j actually takes the wider unsigned long long type rather than the type of k. The promotion rule C applies when k is a short is also not generic, so this is an incomplete solution.

Otherwise, I am stuck with the fallback of always qualifying an integer literal with its type, with all its ugliness and poor readability. I thought that approach was a thing of the past. It is a really ugly thing to be teaching students.

By the way, the approach I suggested is nothing novel. C uses exactly the same approach with the constant INFINITY and has for decades. Other languages have used the same concept for literals although none with the parallel productivity of Chapel.

Hi Damian —

A param in Chapel is a far more powerful concept than a generic literal so they already are different.

Let me ask my second question a different way. Given:

param x = 0.5;
proc y() param do return 0.5;

In a given expression context, would you expect:

  • 0.5
  • x
  • y()

to ever behave differently, or to always behave the same?

-Brad

Let's say

0.5_333_333_333_333
param x = 0.5_333_333_333_333;
proc y() param do return 0.5_333_333_333_333;

Whenever I see an identifier, I assume it is typed; even in C, I try to discipline myself to write

#define _B ((uint32_t) 127)

That is, I avoid the facility that languages such as Pascal (or its derivatives) and C provide for defining symbolic names for un-typed literals, as in

const tom = 9876; bob = 5432;
#define TOM (2*4938)
#define BOB (4*1358)

So, I would always assume that x and y() could behave differently to 0.5_333_333_333_333, which I believe should inherit its type (and precision) from the dominant type of all the typed variables in the expression, assuming there are no parentheses to change precedence.

With sincere apologies for being picky, because 0.5 in your original question is exact for real(w) where w == 8 and above, they will behave the same because

0.5:real(8) == 0.5:real(16) == 0.5:real(32) == .... == 0.5:real(256);

But that only works for literals which are exact powers of the radix 2. Probably also 0.5:real(4) for what it is worth.

Given

const r = 1:uint(32) .. 20;

step.chpl:3: warning: the idxType of this range literal 1..20 with the low bound of the type uint(32) and the high bound of the type int(64) is currently uint(32). In a future release it will be switched to int(64). To switch to this new typing and turn off this warning, compile with -snewRangeLiteralType.

My rules about generic literals are obviously at odds with this proposal.

If I tell the compiler that one edge is uint(32), I would assume that the compiler believes me. But instead, I will have to tell it twice. Hmmm. I am assuming that if it changes the type that I explicitly demand, it will not do it behind my back.

If the compiler thinks I am stupid for wanting uint(32), it is more than welcome to tell me and I will weigh up its advice and may indeed follow it. But to over-rule a programmer's request for which there is likely a valid reason seems less than helpful. My 2c.

Next question: In your proposal, what would happen to the call foo(42) given these definitions:

proc foo(x: int(32)) { … }
proc foo(x: int(64)) { … }
proc foo(x: int(128)) { … }

-Brad

If you run with my idea that a lone naked literal defaults when evaluated to the smallest type that can capture the data, it is like calling

foo(42:int(8))

which means it calls the first.

If the default for a lone naked literal stays at int(64), then it is the second. There is a perfectly valid argument that

param x = 42;

returns an int(64) because 64-bit integer operations are optimal on 64-bit hardware.

But writing code that adds 42 to a 32-bit unsigned integer only to find it yields a 64-bit signed integer is frustrating and is a recipe for bugs.

param t = 127:uint(32);
param xp = t + 42;

That value xp will never overflow. It is 169. I need to say

param xp = t + 42:(t.type);

to get the desired result. Ugly.

Even in the const case with an unknown t, there are only 42 out of 4 billion cases where xp overflows. Having the compiler try to protect a programmer from a roughly one-in-100-million chance of overflow by returning

t + 42

as anything other than the type of t is baffling. And there is no warning by default.

Are there any papers on the logic behind this promotion strategy?

Thanks for listening.

Indeed, at some point during development of the fixes (which resolved the real problems for you), we had test failures because sin(1) started to return real(32) rather than real(64).

Perhaps it would be more consistent to choose the “smallest type that fits”, but this will lead to casts in such a sin case. In other words, it trades casts in your situation for casts in another situation. It does not remove the need for casts.

At some point in the development of the branch that fixed real behavior, I had the rule that specifying a non-default type would cause it to be preferred. So that 127:uint(32) + 42 would be a uint(32). This made some sense to me in concept but led to test failures and I wasn’t able to end up at an internally consistent solution. The main complaint I have about it is that the “default size” integer (int(64)) behaves differently than the others. What if you want to request a default-sized integer? You wouldn’t be able to.

In theory, we could have the type system separate an “untyped” or “default” literal from a typed one, so that 1:int(64) and 1 have different types. Even if we got it to work, it would be a pretty big development effort and a breaking change. For one thing, we would need to have a way to write “default size int” as a different type from “int(64)” and use that in various standard libraries. But, at least we could keep the current behavior for foo(42) and sin(1).

Lastly, the easiest solution of all would be to add integer decorators like C has (e.g. 42i64). However that wouldn’t be much use to you if the code you are considering here is generic.

There are several ways to write this that are less ugly, IMO.

First, the parentheses you have are unnecessary

param xp = t + 42:t.type;

You could store the type into a separate variable to simplify if you have multiple such expressions:

type tt = t.type;
param xp = t + 42:tt;

You can specify the type of the param variable (although there are cases where this can result in a compilation error, e.g. if t is close to the maximum value of its type):

param xp: t.type = t + 42;

Lastly, if you are writing a param-returning function, you can cast on the return (or specify the return type if you are not worried about integer overflow/underflow)

param xp = t + 42;
return xp: t.type;

I agree it does not remove the need for casts.

Once you use the smallest type that fits for a generic literal, sin(1) is obvious. It is very easy to explain. Please remember that the smallest type that fits is a rule of last resort, and it is imperfect for extreme/strange cases. The type of a literal should use context if that context exists. I will note that, mathematically, the value sin(1) has infinite precision and cannot be evaluated exactly on current hardware, so I would regard it as an error in my own code. Only sin(1:uint(w)) for some w has finite precision.

I cannot explain why adding 42 to a 32-bit integer suddenly delivers a 64-bit integer without saying that Chapel has no such thing as a true literal.

The digit 1 is a literal. The typed value 1:int(64) is an anonymous param.

And yes, it is a breaking change, but it is already broken for anybody trying to port existing code. Although it is not a breaking change if you have a compiler option like

--[no]default-literal-size

where --default-literal-size is the default.

I would argue that Chapel is already broken because the expressions

param t = <some 32-bit literal>
const s = <some 32-bit literal where t == s>
....
param x = t + 42;
const y = s + 42;

do not deliver the same overall result even though t == s. Overall result means more than just the numeric value. How do I explain that to novice programmers?

Getting back to readability:

param x = t + 42:t.type;

is still ugly. I still cannot see why adding 42 to a 32-bit integer should deliver anything other than a 32-bit integer. In the expression

t + 42

The only type-related information provided by the programmer is the type of t, so the compiler should not presume anything else.

I am not a big fan of any Chapel code that uses a declaration with a naked int or real.

I apologize for raising something that means work. But the Chapel concept of default sizes for those types, and the handling of literals, is going to cause more grief for int(128) and real(128), so I would argue that the issue needs to be addressed sooner rather than later, irrespective of the grief it is already causing me and others porting existing code.

The concept of smallest type that fits is not mine.

P.S. If an integer literal has a type beyond the supported sizes, say int(256), then when it comes to using it in contextual expressions with objects of type int(w), you use a precedence of w % 256 for everything and then rely on the rules to promote the literal as appropriate. Then again, I have not worked on serious compiler internals for 24 years, so I could be wrong.

Thanks again for your time Michael.

What is the practical difference between

param fred = 1234:uint(32);
inline proc fred param do return 1234:uint(32);

Just trying to better understand param.

At one stage on GitHub, the following concept for an array initialization was mentioned:

const x = [1.0, 2.0, 3.0]:([1..] real(32));

That is for those cases where I need a 1-based array and do not want to count the number of elements. Extrapolating, I could also do

const x = [1.0, 2.0, 3.0]:([1:uint(32)..] real(32));

because I want the index in the following to be an unsigned integer to interface to an API without using an extra variable (and line of code).

for i in a.domain {
    // the index 'i' must be an unsigned integer
    ........
}

Will that continue to work into the future? When the compiler counts the number of elements to yield an upper bound, what type will it assign to that count? In my opinion, it should look at the context and use that. But the new rules seem to imply that the context will not be used?

With a param and a const declaration of the same expression potentially leading to different net results, subsequent calls to overloaded or generic routines can be wrong, leaving code containing very subtle accuracy bugs, i.e. close enough to look OK in some scenarios but disastrous in others, even giving the false impression of a more accurate (say, converged) result than was really achieved. This leaves time-bomb style bugs that are hard to find, because the cause may be a long way from where the manifestation of the error can be detected.

One could mandate a programming discipline of explicitly typing every literal, but that complicates generic programming and destroys the expressiveness, clarity and readability of Chapel code, a massive Chapel selling point. And mandating a discipline does not necessarily mean people follow it, especially those newly adopting Chapel.

Thanks in advance

I've opened "Should numeric literals be type-agnostic and different from `param`s in this way?" (Issue #28509 · chapel-lang/chapel · GitHub) to capture my understanding of this thread. Please feel free to add comments there if I have miscaptured anything, Michael and Damian.

Damian, I'd missed your last few posts due to the holidays, but answering one of them quickly on my way to dinner:

What is the practical difference between

param fred = 1234:uint(32);
inline proc fred param do return 1234:uint(32);

To my knowledge, there should be no practical difference between these two implementations, nor much of a reason to prefer one of them over the other. The only case I can come up with offhand is that if these declarations were members of a class/record rather than standalone code, the former would make the class/record type generic (since it is parameterized by a param that might take on different values in different instances), whereas the latter wouldn't (since it is simply "code").

I'll try to get back to the other question before letting much more time pass,
-Brad

Hi Damian —

Unfortunately, this doesn't work today because Chapel doesn't support domains defined by unbounded ranges at present, like {1..}. And since array types are defined using domains, by extension [1..] real isn't a supported array type at present. As you correctly say, this is something we'd like to support in the future when an array's size can be inferred from context. I too frequently want this feature.

However, by bounding the range, you can get the effect you want today, as seen in this example:

const x = [1.0, 2.0, 3.0]: [1:uint(32)..3:uint(32)] real(32);

for i in x.domain do
  writeln("x[", i, ": ", i.type:string, "] = ", x[i], ": ", x[i].type:string);

produces:

x[1: uint(32)] = 1.0: real(32)
x[2: uint(32)] = 2.0: real(32)
x[3: uint(32)] = 3.0: real(32)

I wouldn't anticipate anything about this as being likely to break going forward. The hope would be to only make it more convenient to write by permitting one bound of the array's size, permitting it to be inferred.

-Brad

You had mentioned in the past that you wanted to support unbounded ranges in the future so I figured that this was a valid example for the long term. I could define

const x = [1.0, 2.0, 3.0]: real(32);

but this zero-indexed array is painful if I want to drive it with an integer variable whose physical interpretation is a range of 1..3, or more precisely an unsigned 32-bit integer running from 1 through 3. The declaration

const x = [1.0, 2.0, 3.0]: [1..3] real(32);

is still not useful because I would still need to cast the driving variable to uint(64) to use it as an index. So, as you say, we need to do

const x = [1.0, 2.0, 3.0]: [1:uint(32)..3:uint(32)] real(32);

This is ugly. In the context of integer promotion, a declaration like

const x = [1.0, 2.0, 3.0]: [1:uint(32)..3] real(32);

is desirable; the literal 3 (not being given a type by the programmer) should defer to the programmer's explicit typing and accept/inherit the contextual type of the lower bound rather than assume that the compiler knows better.
