Fused Multiply Accumulate - FMA

damianmoz · March 31, 2021, 2:28am

This needs some significant discussion, which can come later. For now, I am only suggesting an interim solution. By the way, Discourse says this post is similar (I guess in context) to several others, none of which mentions anything about fused multiply-add. Yes, they are related to floating point, but that is a long bow to draw. If this belongs in another item, or as a GitHub issue, feel free to move it (but please let me know, because when a post is moved not even the original author is notified).

At the moment, an FMA can be achieved by calling the (C) maths library routine fma or fmaf, as in

// Single precision: forward to the C maths library routine fmaf.
inline proc _fma(x : ?Real, y : Real, z : Real) where Real == real(32)
{
    extern proc fmaf(x : real(32), y : real(32), z : real(32)) : real(32);

    return fmaf(x, y, z);
}

// Double precision: forward to the C maths library routine fma.
inline proc _fma(x : ?Real, y : Real, z : Real) where Real == real(64)
{
    extern proc fma(x : real(64), y : real(64), z : real(64)) : real(64);

    return fma(x, y, z);
}

Calling those routines directly is slow.

Until there is a discussion about how Chapel should support FMA in its arithmetic without calling those routines, addressing the performance issues in the above is moot.

In the meantime, can we get a compiler flag that forces the back-end to do what -mfma achieves in many C compilers, that is, map any explicit call to what appears to be the maths library routine fma (or fmaf) to a macro that directly produces a fused multiply-add machine instruction? This may need to be totally separate from --fast because it has to work when --ieee-float is invoked with --fast.

By way of giving people something to exercise the little grey cells in some quiet moments of reflection, one might consider the following which comes from the way Rust approaches this topic.

Code such as

(x * y)

should ALWAYS be done as a conventional multiplication, with the result rounded to the precision of x if its precision matches or exceeds that of y, or to that of y otherwise.

Similarly, code such as

(x * y + z)

or even

z += x * y;

is always evaluated (using the above proc definitions) as the respective equivalents of

_fma(x, y, z)

or

z = _fma(x, y, z);

On the other hand, how code like the following would be interpreted is anybody's guess

(x * y + p * q)

That said, more precise code like

(x * y + (p * q))

would of course follow those rules just mentioned.

And of course, the compiler can be forced not to emit FMA instructions.

The applications are manifold.

But for now, only the compiler flag is remotely mission critical. The rest needs feedback and discussion.

mppf · April 5, 2021, 3:14pm

We are working towards moving to the LLVM backend by default. When we do that, we will have the ability to have finer-grained control over the things that --no-ieee-float / --ieee-float control today.

In addition, I think it's interesting to consider having the compiler always emit an LLVM FMA intrinsic for certain patterns. I do not expect that this would be particularly hard to implement (it would amount to an adjustment to how we code-generate such a nested call, or it could be expressed as an optimization in the AST within the Chapel compiler). However, I do not know if that would interfere with other LLVM optimizations.

damianmoz · April 6, 2021, 6:51am

> We are working towards moving to the LLVM backend by default. When we do
> that, we will have the ability to have finer-grained control over the
> things that --no-ieee-float / --ieee-float control today.

I simply put down some words so that the topic and content are on the TODO
list.

No rush. I can use an explicit call for now. But removing that explicit
call will do wonders for the readability (and hence maintainability) of
code.

> I think it's interesting to consider having the compiler always emit an
> LLVM FMA intrinsic for certain patterns.

The trick is being able to guarantee for certain what happens, something
Rust achieves but C/C++ do not.

> However I do not know if that would interfere with other LLVM
> optimizations.

Always an interesting topic.

Thanks - Damian

