Integer Promotion Weirdness

Consider the following:

The compile-time routine bias() returns an unsigned integer. It is passed a real(?w) type

        const z = 25:uint(32);
        const y = z + 1;
        const b = bias(real(32));
        const c = 2 * b + 1;
        const d = 2 * bias(real(32)) + 1;

        writeln(z.type:string);
        writeln(y.type:string);
        writeln(b.type:string);
        writeln(c.type:string);
        writeln(d.type:string);

Look at z and y. Adding 1 to an unsigned integer results in an unsigned integer. Expected.

Saving its result in b and then doubling it and adding 1 yields an unsigned integer. Expected.

But doing that computation in a single expression yields a signed integer of double the size of the original. How does this happen?

I thought maybe somebody was trying to be smarter than they should be so I changed those const identifiers to param identifiers. But no, it is worse. It is a mess. I am totally lost.

Hi Damian —

I suspect we'll need you to share a definition of bias() in order to help with this. Taking a guess at how you may have defined it with Chapel 2.0, I'm seeing only consistent/expected results (see link below), so suspect I've guessed at what you're doing incorrectly.

Run – Attempt This Online

-Brad

Sorry. I figured it did not matter because I am doing exactly the same computations.

proc bias(type T : real(32)) param do return 127:uint(32);
proc bias(type T : real(64)) param do return 1023:uint(64);
proc bias(x : real(?w)) param do return bias(real(w));

This came out of the earlier post "Generic Floating Point Definition Style".

Yes. It is Chapel 2.0.

Hi Damian -

I agree that this is an odd case. I can explain why it is happening.

I found that passing both of the flags --warn-param-implicit-numeric-conversions --warn-implicit-numeric-conversions does give a warning in this case (although I'm not sure I can justify today why both of these flags are required to warn here).

Anyway, at the root of the issue is that Chapel's implicit conversions rules are intended to make literals more flexible. Since literals are param (and params should behave like literals) that means that the flexibility applies here to the expression bias(real(32)) since that returns param.

In particular, if you have someNonParamValue + someParamValue then it will try to prefer the type of someNonParamValue. E.g., for var myInt32: int(32); myInt32 + 1, the type of 1 is int(64) but the result of that expression will be int(32) because the param value 1 can be represented as int(32) and the type of the param is considered less important here than the type of the non-param.

In contrast, when we have someParamValue + someOtherParamValue the two expressions being added are considered equally for the result type. E.g. with param myParamInt32: int(32) = 0; param myParamInt64: int(64) = 0; myParamInt32 + myParamInt64; the last expression will have type int(64). That's the same as a non-param case like var myInt32: int(32); var myInt64: int(64); myInt32 + myInt64;. In both cases, we choose the larger type. (We have to do that in the non-param case, and we do it in the param case to be consistent with the non-param case & keep the rules sensible).

What to do about it? All I can think of to improve the situation at this point is to potentially bump up the priority of opting out of implicit numeric conversions and/or improving the warning for implicit numeric conversions.


Thanks. I still do not really understand the rules. And it breaks lots of my old code.

Damian —

Quick question: Do your uses of bias() require it to be a param? If not, you could just remove the param return intent and get the behavior you want, I think.

-Brad

The bias() is known at compile time, so it should reflect reality and be a param.

Besides, lots of things (for which I want no run-time overhead) depend on the bias(), so it has to be a param.

What are the rules that seem to yield an int(64) from an expression which contains only identifiers and literals that I would consider int(32)? Or is it that an integral literal is given a type of int(64) irrespective of context? Is it also that a non-imaginary floating-point literal is given a type of real(64)?

Yes, it's that the literal 1 is an int(64) regardless of context; it can implicitly convert into int(32), but that doesn't help with something like 1:int(32) + 2 (which results in int(64)). Of course, you can cast the other value or the result of the whole expression. And, this is very different from C, where integer literals are typically 32-bit ints.

Similarly, 1.25 has type real(64) regardless of context.

It's just that these literals are param, and params have more flexibility to convert into smaller numbers (real(32), int(8), etc) than regular variables.

It's important that 1 have a type so that we (and the compiler) know what var x = 1 means (i.e. that x will be an int(64)).

Some ideas dragged out of my past, rewritten/translated into a Chapel context. My memory may be failing, or I could have mistranslated, or even have lost my marbles.

If the programmer has not thought enough about the structure of a mixed mode
expression, the last thing a compiler should do is provide a (likely) broken
way out of their laziness.

Assumptions of mixed mode arithmetic:

  • Rules/Definitions/Mandates are to be orthogonal and easy to remember
    -- they should be kept to a minimum
  • The compiler can question the programmer's intelligence at any time
    -- with lots of warnings (but that is all)
  • It is not the compiler's role to be a numerical analyst
    -- there are tools to help with that, e.g. Herbie
  • The compiler will have the following compiler switches:
    ---- width of integral type of last-resort, e.g. --itolr-width=32
    ---- width of floating-point type of last-resort, e.g. --ftolr-width=64
    -- these can be mandated within code by a pragma(t)
    -- the active type is that at the compiler command line or within user code
    ---- when the active type is within user code, a command line value is an error
    -- if the active last-resort type != that within an include file => ERROR

There are the concepts of:

  • explicit and implicit types
    -- the width of an implicit type is NEVER known
    ---- this will handle larger and larger reals and integral types
  • a raw expression is one which
    • is made up of a mix of identifiers and literals
    • has no parentheses (i.e. precedence is determined by operators only)
  • a label, name and identifier are the same (in this context)

Definitions/Mandates

  • a param, const or var has an explicit type
  • a proc which returns a value has an explicit type
    ---- even if that type is given to it (implicitly) at the RETURN statement
  • a literal has an implicit type
    ---- real(w) if it contains a binary or decimal point or exponent
    ---- int(w) if it contains neither binary nor decimal point nor exponent
    ---- w is an unknown quantity
    ---- a literal NEVER has an explicit type
  • an anonymous param is a literal which has been coerced to an explicit type
    ---- e.g. 1.2345678987654321:real(32)
    ---- it is treated as a param (which has an explicit type)
  • an explicit real(p) type dominates an explicit real(q) if p > q
  • an explicit int(p) type dominates an explicit int(q) if p > q
  • an explicit uint(p) type dominates an explicit uint(q) if p > q
  • an explicit real(f) type dominates an explicit int(g) for any f and g
  • an explicit int(f) type dominates an explicit uint(g) for any f and g
  • an explicit type dominates an implicit type irrespective of bit-width
  • a thing is a literal|param|const|var|proc (the last four have a label)

There is only one rule:

The type of the result of an expression of things of
different numeric types is the dominant numeric type

This has a simplification:

The type of the result of an expression of things of
the same numeric type T is the numeric type T

Note:
-- the compiler is free to complain loudly if it objects to the above
-- the compiler cannot produce code that over-rides any of the above

Note that in the event that a raw expression is assigned to a pre-typed identifier, the type of that identifier is NOT the dominant type of the expression.

In evaluating a raw expression (no parentheses), the dominant type may change left to right throughout an expression:

  • an identifier is coerced to the explicit type dominant at the point
    in the expression where it occurs
  • literals take (or assume or are coerced to) the dominant explicit type
    at the point of their appearance in the expression evaluation. It is a
    compile time error if the dominant explicit type (at some point in the
    expression) is integral where there appears a floating point literal,
    i.e. something which has an implicit type.

Should an (un-typed/un-coerced) expression be made up of

a) integral literals only, it is evaluated as if
- t.b.a.
b) floating point literals only, it is evaluated as if
- t.b.a.
c) a mix of floating point and integral literals, it is evaluated as if
- t.b.a.

t.b.a. = o.t.i. (open to interpretation) = o.f.d. (open for discussion)

There is a mandate that in the extreme says that a real(16) dominates an int(128). Anybody who exploits that, or appears to do so, is not very smart. In this event, the compiler should be complaining bitterly. Then again, anybody using a real(16) will by definition be paying a lot of attention to accuracy so it is impossible that such a problem will arise in practice. If any confusion exists, then attention is drawn to the second sentence of this paragraph.

My 2c.

The definitions/mandates have implications for handling infinity and the various NaNs. But nothing dramatic.

There are deliberately contentious points in some of my words above, but that is more to stimulate discussion than to be controversial.

Sorry Damian, I've been meaning to catch back up on this discussion but
drew the short straw on being in charge of testing and we had a lot of
noise over the weekend. Still a bit underwater myself but hoping to get
back on top of things by the end of the week.

Thanks,
Lydia

No rush. I noticed the problem in the performance testing I was doing for our ChapelCon 24 and I simply put some words to paper (in a digital/virtual sense) while they were fresh in my mind.

I revisited this as it is about to cause me even more grief than it already has. I still do not really understand the reasoning behind having what looks like a generic integer literal always carry an explicit non-generic type, i.e.

a literal 1 is an int(64) regardless of context;

It complicates generic programming because generic integral expressions of identifiers and literals are ugly and hard to read: literals always need to be explicitly and generically cast to avoid the Chapel compiler doing naughty things to a programmer. By an integral expression I mean one comprised only of integer symbols or literals, not Newton's (and Leibniz's) continuous analog of a sum.

Even with regards to

It's important that 1 have a type so that
we (and the compiler) know what var x = 1
means (i.e. that x will be an int(64)).

You only need to choose a type at assignment time, i.e. across the assignment operator, not within an expression, and even then it only needs an explicit type when the value of the expression is not known at compile time.

I looked at:

And, this is very different from C, where
integer literals are typically 32-bit ints

That is not my interpretation, and I did reread the C standard a few times. If, say, x is a short integer, then the expression x + 1 in C is treated as a short integer expression for all intents and purposes, or at least by the compilers and static analysis tools I use. So the integer literal is really contextual in C.

For truly generic code, a literal within an expression should inherit its type from any programmer-specified identifiers within the same expression, otherwise the compiler is making a decision contradicting the programmer.

The problem got fixed for real(w) types long ago by Michael, although he uses more sophistication for the task than I thought would be needed. The integral types problem is complicated by the signed and unsigned issue. I know what a fix might be, but it might not integrate optimally within the compiler and there may be issues I have not considered.

Looking at the flexibility to convert into smaller numbers: while I find that whole behavior odd, I thought it was consts that did it better, e.g.

param bp = bias(real(32)); // bp has type uint(32)
const hp = bp + 1; // hp has type int(64)
const bc = bias(real(32)); // bc has type uint(32)
const hc = bc + 1; // hc has type uint(32)

1 in C really is an int. Here is a C++ program showing that 1 + myshort is also an int in C++. I’m showing a C++ program just because C++ has a convenient way of printing the type of a variable. I am confident that C has the same “integer promotion” rules.

#include <iostream>
#include <typeinfo>

int main() {
  short ss = -1000;
  unsigned short us = 1000;
  int si = -1000000;
  unsigned int ui = 1000000;
  long sl = -10000000000;
  unsigned long ul = 10000000000;
  std::cout << "signed short: " << typeid(ss).name() << "\n";
  std::cout << "unsigned short: " << typeid(us).name() << "\n";
  std::cout << "signed int: " << typeid(si).name() << "\n";
  std::cout << "unsigned int: " << typeid(ui).name() << "\n";
  std::cout << "signed long: " << typeid(sl).name() << "\n";
  std::cout << "unsigned long: " << typeid(ul).name() << "\n";


  auto ssp1 = ss + 1;
  std::cout << "signed short + 1: " << typeid(ssp1).name() << "\n";

  short test_short = sl; 
  std::cout << "test_short is " << test_short << "\n";
}

If you run this program g++ -Wall a.cc && ./a.out, you will see

...
signed int: i
...
signed short + 1: i

Indicating that mySignedShort + 1 is an int.

The reason you don’t notice this in C++ is that a C or C++ compiler is perfectly happy to convert a larger type into a smaller one (losing information in the process). That is why the last line prints out test_short is 7168 (on my system anyway).

Going to your example:

param bp = bias(real(32)); // bp has type uint(32)
const hp = bp + 1; // hp has type int(64)
const bc = bias(real(32)); // bc has type uint(32)
const hc = bc + 1; // hc has type uint(32)

Here it’s again the param-only computation. bp is uint(32) and param. 1 is int(64) and param. So, the compiler sees bp + 1 and infers the type to be int(64) to match the general “integer promotion” rules for consts, since both are param. Since you are storing it in a const you are forcing the compiler to choose its type. As we have discussed earlier, such a param can go into a smaller type if it fits, so something like const zz:uint(32) = bp + 1 compiles without error (at least in my example; technically if bp + 1 doesn’t fit into a uint(32) it wouldn’t).

I am sure there are cases when the system doesn’t do what you want. Unfortunately with this kind of thing, changes to the rules are likely to have unexpected knock-on effects. (And I say that based on experience.)

If you pass the C expression

ss + 1

as a parameter to a C routine expecting a short, it is treated as a short expression. It only loses information if the expression evaluates to something larger than a short, which my (hopefully) thorough testing and judicious use of asserts ensures never happens. But C is not the topic of conversation.

I would agree.

That said, an integer literal to somebody reading algebra is generic, just as a floating-point literal is generic, and I am trying to write generic code that is both readable and reflects how the algebra is written. Because the existing rule to treat an integer literal as an int(64) is not generic, to support those who want generic code Chapel needs a compiler option that treats an integer literal as generic, effectively int(0), although you do need to capture the value of that literal temporarily during the compilation process as, say, int(256), i.e. beyond the accuracy of the hardware, until it needs to be evaluated. That way int(0) never interferes with type promotion rules and remains generic. This approach also needs none of the internal shortening rules, which must be a nightmare. The extra handling occurs where the literal does not know what type to inherit at some point, e.g. where it is the only element in the RHS of an assignment, or where there are only literals in a parenthesised expression, or in a range with no other programmer-defined identifier.

param b = 1023;
const x = 1;
const fred = .... blah-blah.... +(2*5)*x ... blah-blah ...
const r = 1..100;
var t : [1..10, 1..10] real(32);

Then you evaluate the integer literal (or the literal expression) to appropriate accuracy. For the assignment case, that size could be deduced as the minimum size needed for the value in question. For the parenthesised expression, it would be the precision used to temporarily capture int(0) literals. For a range, it should default to be the best type to use for indexing operations. For backward compatibility, it should be int(64) although a compilation option should allow something which is truly generic.

This allows for a truly generic expression such as

 i + 2

Treating literal integers as int(0) should also work for when Chapel needs to support 128-bit integers. The underlying logic should also handle floating-point literals, both real(16) and real(128) and beyond.

Whether having truly generic literals breaks anything else I do not know. But the way Chapel currently maps a generic integer literal to a non-generic int(64) certainly breaks pretty well every piece of non-64-bit Fortran or C or C++ or Java code I am, or anyone else is, trying to port to Chapel, unless that old code only worked with 64-bit data. And nearly 50% of existing HPC code is Fortran, and some sources estimate that another 20% of C or C++ HPC code has been translated from, or written like, Fortran. The number of people in the boat who are affected by that non-general implied type is non-trivial. From a personal perspective, it is costing us huge amounts of time and results in frustrated programmers and ugly code, because every integer literal needs to be rewritten to look like

1:int(w)

which trashes one of Chapel's features, its readability and clarity of expression. And frustrated programmers are very much less productive programmers.

This approach also means that either of

param x = t + 1;
const x = t + 1;

will always yield the same result for some param t, which is not the case now. That is a real nightmare.

Thanks in advance.