Instruction execution sequence under optimization

Considering the code


Will the code between the braces, i.e. the block containing B(), be compiled such that it is guaranteed to complete between the end of A() and the start of C()?

It would depend on what the code was doing. If you are wondering about the ordering of some math instructions as applied to some registers, I don't think we're guaranteeing that today. But, the optimization process is supposed to produce a program that behaves the same as an unoptimized program (at least if the program was correct). There are some guarantees related to memory operations among different tasks and fences.

IIRC there might be some problems with the LLVM optimizations that we use related to movement of floating point computations and floating point exceptions. But I don't think we are really supporting floating point exceptions in Chapel at this point anyway. (I have been hoping that we can avoid adding them and instead handle the issues in a different way).

If somebody has written


If somebody has gone to the trouble of writing a program with a block, why can they not expect the compiler to do what they ask? That means that before the block starts, every action up to and including A() must be complete; then the block is executed, including its last statement B(); only after that do the statements that include C() and D() begin.

If Chapel honours parentheses for ordering within an expression, why cannot it honour braces within a sequence of code?

Sorry, hit send too soon.

How were you looking to handle floating point exceptions? With the default being non-stop processing, they are nothing like a C++ exception. Querying the status should be trivial and should not impact optimization. Nor can I see any issues in clearing the status. Raising a flag is potentially a little more complex, but not overly so.

TLDR: The Chapel MCM (Memory Consistency Model in the language spec) should answer this for you.

Long version: It depends on what you mean by "is guaranteed". A given task must observe the effects of its own memory operations (i.e., results of stores) to have occurred in program order, according to the MCM. But between tasks, the only guarantees are that the results of operations on atomic/sync/single variables done by a task are guaranteed to be seen by all other tasks to have occurred in the initiating task's program order, and that the effects of sequences of non-atomic/sync/single memory operations done by a task are guaranteed to be seen in program order with respect to atomic/sync/single operations done by that same task. But there are no guarantees about how the effects of individual non-atomic/sync/single operations within a sequence of same done by another task are seen.

So, a task that executes the sequence you showed is certainly guaranteed to see all of its own variable references occur in program order. Other tasks are not; they will only see any atomic/sync/single variable references occur in the initiating task's program order. Is that sufficient to answer your question about "is guaranteed to be completed"?

I am only interested in individual tasks. So, I think you have answered my question. Thanks. For more ...

If I query a floating point exception register (which is task-unique), I need to know that any floating point operations mentioned before this query have completed. They do not necessarily have to have been written to memory. Those floating point operations may influence that exception register as a side effect. I do not want the optimiser to delude itself that something like

z = x * y;
e = _754.testexception(_754.InvalidException); // the argument is some bit mask
t = z * p;

can be compiled as if I had written those three lines as

e = _754testexception(_754Invalid);
z = x * y;
const t = z * p;

because that is plain wrong. I am happy for the optimizer to tell me that it thinks I am crazy (or worse) and should reorganize my code to make it faster. But any decision about reordering my code should be left up to me. That should hold whether the proc _754testexception() is inline or external. I propose to make that routine inline, and have it in turn inline a C routine which contains inline assembler.

With a bit of luck, the optimiser should realise that querying the exception register will not affect what is in floating point registers and leave them where they are ready (in those registers) for subsequent use. The same will hold for clearing bits in the exception register, although that operation is normally slow, even if done inline in assembler.

For this I'll have to defer to folks in the group with more insight into our code generation.

But certainly optimizer authors have long known (generously, let's say for half a century :wink: ) that they aren't free to reorder FP mode or status related instructions with respect to FP instructions that depend upon or set those, respectively. So it would just be a question of whether Chapel needs to say anything extra to its target compilers or LLVM to let them know not to do this, or if they can figure it out themselves. I'd guess the latter, but I'm an optimist.

LLVM has long had issues with tracking dependencies between
floating-point flags and operations. This is a known LLVM bug and is
being worked on; it is not something Chapel introduces. Here is one
example of a bug that has been filed along these lines.

There are other LLVM bugs with much more detailed information in them, but I don't have them handy. A more thorough search should prove interesting.

Thanks for the reminder David.