External Issue: Compiler Built-in Mathematical routines - Long Term - Not Urgent

chapelu · November 16, 2022, 2:20am

21043, "damianmoz", "Compiler Built-in Mathematical routines - Long Term - Not Urgent", "2022-11-16T02:18:46Z"


opened 02:18AM - 16 Nov 22 UTC


This issue is raised not to create more work in the short term but to surface the issues involved so that they are considered in decisions made going forward.

Modern C compilers try to treat several fundamental mathematical routines as 'built-in'. These are the routines with the following functionality (each shown with its 64-bit name from the draft C23 standard):

	FMA - fused multiply and add (fma)
	ABS - absolute value of a floating point number (fabs)

	square root (sqrt)

	truncate towards zero (trunc)
	round to nearest with ties away from zero (round)
	round to nearest with ties to even (roundeven)
	round towards positive infinity (ceil)
	round towards negative infinity (floor)
	round according to the current rounding direction (rint)

	minimum of two floating point numbers of one or more flavours (fmin)
	maximum of two floating point numbers of one or more flavours (fmax)

	transfer the sign of one floating point number to another (copysign)
	test the sign bit of a floating point number (signbit)

These compilers implement such routines with a subroutine call using either a special primitive as would likely be the case with ABS and FMA, or the far simpler expedient of using a header file containing an inline C routine with (hopefully minimalist) embedded assembler, something really feasible only with more recent versions of the C language standard.

There are other routines that arguably could also be in that list:

	split a floating point number into an exponent and a signed factor

	ramp function (fdim) or some other sort of Heaviside function

	scale a floating point number by the radix raised to an integral power

	round to nearest with ties to odd

	inverse square root (rsqrt)

	other flavours of the minimum of two floating point numbers
	other flavours of the maximum of two floating point numbers

A flavour of the first of these is the C routine frexp, a routine that in the opinion of some does not fit modern needs, not least because it reflects floating point numbers of the 1970s!! The functionality of the last three is recommended by the latest IEEE 754 standard and appears in drafts of the next C standard.
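For readers unfamiliar with the existing C flavours of the first three entries, the following sketch shows what `frexp`, `scalbn` and `fdim` return. The `split_t` type and the `split_double` and `check_split_and_scale` helpers are hypothetical names introduced here; only the `<math.h>` calls are standard.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical wrapper that returns both parts of frexp's split. */
typedef struct {
    double fraction;  /* signed factor, 0.5 <= |fraction| < 1 for finite nonzero x */
    int exponent;     /* power of two such that x == fraction * 2^exponent */
} split_t;

split_t split_double(double x) {
    split_t s;
    s.fraction = frexp(x, &s.exponent);
    return s;
}

/* Hypothetical self-check showing the three routines side by side. */
void check_split_and_scale(void) {
    split_t s = split_double(-12.0);                  /* -12 == -0.75 * 2^4 */
    assert(s.fraction == -0.75 && s.exponent == 4);
    assert(scalbn(s.fraction, s.exponent) == -12.0);  /* scaling undoes the split */
    assert(fdim(5.0, 3.0) == 2.0);                    /* ramp: x - y when x > y */
    assert(fdim(3.0, 5.0) == 0.0);                    /* ramp: +0 otherwise */
}
```

Note that `frexp` hands back the exponent through a pointer argument, which is part of what makes it awkward to inline cleanly.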

That supplementary list is not exhaustive and deliberately does not include the routines that work with floating point exceptions and other aspects of the floating point environment. They are a whole new ball game, especially in the context of LLVM.

Long term, does Chapel simply try to leverage what the C standard provides (that is, whatever C17 or C23 standardizes), or does it exploit its own more powerful (and arguably simpler) features and handle built-ins itself? Sometimes the quality of the routine that you get in a C library is sub-optimal and it would be good to avoid this. For example, the glibc version of the scaling routine noted above is arguably nearly 3 times slower than it needs to be.
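As an illustration of how little work the scaling operation needs in the common case, here is a rough sketch of a fast path. It is not the glibc implementation, nor a proposal for Chapel's; the name `scale_pow2` is hypothetical. It assumes IEEE 754 binary64 doubles and restricts the power to the range where 2^n is itself a normal number, so the scale factor can be built directly from its bit pattern.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fast path: returns x * 2^n for -1022 <= n <= 1023.
   Assumes double is IEEE 754 binary64. Powers outside that range
   would need extra handling that this sketch deliberately omits. */
double scale_pow2(double x, int n) {
    assert(n >= -1022 && n <= 1023);
    /* Biased exponent in bits 52..62, zero mantissa: exactly 2^n. */
    uint64_t bits = (uint64_t)(n + 1023) << 52;
    double factor;
    memcpy(&factor, &bits, sizeof factor);  /* well-defined type pun */
    return x * factor;                      /* a single rounded multiply */
}
```

Because the only floating point operation is one multiply, overflow, underflow and rounding of the result are handled by the hardware. No claim is made here about the actual speed-up over any particular library.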

There is at least one bigger issue here. Chapel is yet to address fused multiply/additions, something that in my humble opinion only the Rust language has done rigorously and consistently and thoroughly. So that needs to be considered here. Some ideas on this are discussed in #11335.

Food for thought!! And discussion. I am not sure whether this needs multiple issues. Its content will overlap (to some extent) with other issues, but the focus here is how to provide the aforementioned functionality such that any subroutine call overhead is avoided and optimal performance is achieved (at the expense of code).
