[Chapel Merge] Improve automatic local access optimization

Branch: refs/heads/master
Revision: bd1c1a6
Author: e-kayrakli
Log Message:

Merge pull request #16422 from e-kayrakli/auto-local-access-aligned-dom2

Improve automatic local access optimization

This PR improves the coverage of the automatic local access optimization in two
different ways:

  1. The dynamic check now passes for arrays whose domains are not the same as
    the loop domain but are well-aligned with it.

    This will allow the following case to be optimized:

    var d = newBlockDom(...);
    var a: [d] int;
    
    var dInner = d.expand(-1);
    
    forall i in dInner { ... a[i] ... }
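
    As a rough illustration of what a passing dynamic check buys here, the
    sketch below spells the access out by hand with `localAccess`. It assumes
    a concrete 1D size (`n`, chosen only for this example) and is not the
    code the compiler generates.

```chapel
use BlockDist;

config const n = 8;            // illustrative size, not from the PR

var d = newBlockDom({1..n});   // Block-distributed domain, as in the PR text
var a: [d] int;

var dInner = d.expand(-1);     // a different domain with the same distribution

// What the user writes: plain indexing.
forall i in dInner do
  a[i] = i;

// Hand-written counterpart of the optimized access: each iteration runs on
// the locale that owns index i, so the element can be reached locally.
forall i in dInner do
  a.localAccess[i] += 1;

writeln(a);
```

    The second loop is only safe because `dInner` and `d` are distributed the
    same way; that is the property the dynamic check has to establish at
    run time.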
    

    1.a. This case also covers iterating over a DR (DefaultRectangular) domain
    that is fully included in the local subdomain of a distributed array.

    This will allow the following case to be optimized:

    var d = newBlockDom(...);
    var a: [d] int;
    coforall l in Locales do on l {
      var localDom = a.localSubdomain();
      forall i in localDom { ... a[i] ... }
    }
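
    The containment condition in 1.a can be sketched at the user level as
    follows. `fitsInLocalPart` is a hypothetical helper written for this
    example (1D, unstrided domains only); it is not the check implemented in
    chpl__dynamicAutoLocalAccess, which may differ in detail.

```chapel
use BlockDist;

// Hypothetical helper: does loopDom fall entirely inside the part of arr
// owned by the current locale? Handles only 1D, unstrided domains.
proc fitsInLocalPart(arr: [], loopDom: domain): bool {
  const sub = arr.localSubdomain();
  if loopDom.size == 0 then return true;   // nothing to access
  return sub.low <= loopDom.low && loopDom.high <= sub.high;
}

var d = newBlockDom({1..16});              // illustrative bounds
var a: [d] int;

coforall l in Locales do on l {
  const localDom = a.localSubdomain();     // DR domain owned by this locale
  assert(fitsInLocalPart(a, localDom));
  forall i in localDom do
    a.localAccess[i] = here.id;            // safe: every i is local here
}
```

    A loop domain that reaches past the local subdomain would fail such a
    check, and the loop would have to keep using regular accesses.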
    
  2. We now analyze foralls that have an arbitrary call expression as the
    iterand.

    This will allow similar cases to the ones above (and more) to be optimized
    even if we have a domain-generating call as the iterand:

    var d = newBlockDom(...);
    var a: [d] int;
    
    forall i in d.expand(-1) { ... a[i] ... }
    
    
    var d = newBlockDom(...);
    var a: [d] int;
    coforall l in Locales do on l {
      forall i in a.localSubdomain() { ... a[i] ... }
    }
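
    One way to picture how a call iterand becomes analyzable is to hoist the
    call into a temporary and iterate over that instead. The sketch below is a
    source-level analogy only: the name `iterCallTmp` is borrowed from the
    implementation notes further down, the bounds are illustrative, and the
    compiler performs this on its internal representation rather than on
    user code.

```chapel
use BlockDist;

var d = newBlockDom({1..8});   // illustrative bounds
var a: [d] int;

// As written: the iterand is a domain-generating call.
forall i in d.expand(-1) do
  a[i] += 1;

// Source-level analogy of normalizing the iterand early: the call is
// evaluated once into a temporary, and the loop iterates over the temporary.
{
  const iterCallTmp = d.expand(-1);
  forall i in iterCallTmp do
    a[i] += 1;
}
```

    With the domain bound to a name, the checks that apply to a plain domain
    iterand can be applied to the temporary, and the call is not evaluated a
    second time for the check.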
    

Note that all the new cases covered by this PR are covered dynamically. This
means that I expect this PR to increase loop cloning noticeably.

Limitation:

We don’t do any optimization for:

forall i in zip((...someTuple)) { ... a[i] ...}

I don’t think there is anything fundamental preventing that, but I am leaving it
as a future step because there is little motivation for it and the
implementation is non-trivial.

I suspect we can pass tuples to some of the module helpers, which could pluck
the first item from the tuple. And then, the compiler can grab the argument to
PRIM_TUPLE_EXPAND and pass that to those helpers.

Implementation Details

  • Adds iterCall and iterCallTmp to ForallOptimizationInfo. The former is
    not used by the optimization; it is only kept in case we need it later.
  • Adds earlyNormalizeForallIterand to
    compiler/optimizations/preNormalizeOptimizations. This function is
    implemented in normalize.cpp and it is just a version of
    insertCallTempsWithStmt that also returns the added call temp.
  • Moves doPreNormalizeOptimizations after insertModuleInit. We need module
    initializers to be able to normalize foralls’ iterands if they are call
    expressions.
  • Adds new checks to chpl__dynamicAutoLocalAccess to cover the cases
    mentioned above
  • Trivial: adds the missing override keyword to isDefaultRectangular()
  • Trivial: changes the reporting flag to also add the --no- version
  • Trivial: renames --no-auto-local-access-dynamic to
    --no-dynamic-auto-local-access (I think this reads better, but I can change
    it back)
  • Trivial: A few cosmetic changes in how we report the optimization in the
    compiler (I hope to revise this even more in a separate PR, but it is not a
    must)
  • Adds and updates a bunch of tests

[Reviewed by @ronawho and @vasslitvinov]

Testing:

  • [x] full standard
  • [x] full gasnet

Modified Files:
A test/optimizations/autoLocalAccess/differentButAlignedDoms.chpl
A test/optimizations/autoLocalAccess/differentButAlignedDoms.good
A test/optimizations/autoLocalAccess/flags-full.good
A test/optimizations/autoLocalAccess/flags-none.good
A test/optimizations/autoLocalAccess/flags-staticonly.good
A test/optimizations/autoLocalAccess/flags.chpl
A test/optimizations/autoLocalAccess/flags.compopts
A test/optimizations/autoLocalAccess/flags.prediff
A test/optimizations/autoLocalAccess/preventMultiCall.chpl
A test/optimizations/autoLocalAccess/preventMultiCall.good
A test/optimizations/autoLocalAccess/preventMultiCallIter.chpl
A test/optimizations/autoLocalAccess/preventMultiCallIter.good
A test/optimizations/autoLocalAccess/zipper/differentButAlignedDoms.chpl
A test/optimizations/autoLocalAccess/zipper/differentButAlignedDoms.good
A test/optimizations/autoLocalAccess/zipper/preventMultiCall.chpl
A test/optimizations/autoLocalAccess/zipper/preventMultiCall.good
A test/optimizations/autoLocalAccess/zipper/preventMultiCallIter.chpl
A test/optimizations/autoLocalAccess/zipper/preventMultiCallIter.good
M compiler/include/ForallStmt.h
M compiler/include/driver.h
M compiler/include/preNormalizeOptimizations.h
M compiler/main/driver.cpp
M compiler/optimizations/preNormalizeOptimizations.cpp
M compiler/passes/normalize.cpp
M man/chpl.rst
M modules/internal/ChapelAutoLocalAccess.chpl
M modules/internal/DefaultRectangular.chpl
M test/compflags/bradc/help/userhelp.good
M test/optimizations/autoLocalAccess/allDynamicsFailStatic.good
M test/optimizations/autoLocalAccess/commaDecl.good
M test/optimizations/autoLocalAccess/copyInitDeclaration.good
M test/optimizations/autoLocalAccess/dotDomDeclaration.good
M test/optimizations/autoLocalAccess/dynamicCheckInGenericFunction.good
M test/optimizations/autoLocalAccess/dynamicChecks.good
M test/optimizations/autoLocalAccess/elemAsIndex.good
M test/optimizations/autoLocalAccess/functionArgs.good
M test/optimizations/autoLocalAccess/interveningForallOrOn.chpl
M test/optimizations/autoLocalAccess/interveningForallOrOn.good
M test/optimizations/autoLocalAccess/multipleAccessDynamic.good
M test/optimizations/autoLocalAccess/multipleAccessStatic.good
M test/optimizations/autoLocalAccess/nonDomainIter.good
M test/optimizations/autoLocalAccess/oneStaticFailOtherDynamicSuccess.good
M test/optimizations/autoLocalAccess/regularCommaDeclaration.good
M test/optimizations/autoLocalAccess/regularDeclaration.good
M test/optimizations/autoLocalAccess/regularDeclaration2D.good
M test/optimizations/autoLocalAccess/staticSuccessDynamicFail.good
M test/optimizations/autoLocalAccess/unalignedSameDist.good
M test/optimizations/autoLocalAccess/withInitializerCall.good
M test/optimizations/autoLocalAccess/zipper/allDynamicsFailStatic.good
M test/optimizations/autoLocalAccess/zipper/commaDecl.good
M test/optimizations/autoLocalAccess/zipper/copyInitDeclaration.good
M test/optimizations/autoLocalAccess/zipper/dotDomDeclaration.good
M test/optimizations/autoLocalAccess/zipper/dynamicCheckInGenericFunction.good
M test/optimizations/autoLocalAccess/zipper/dynamicChecks.good
M test/optimizations/autoLocalAccess/zipper/elemAsIndex.good
M test/optimizations/autoLocalAccess/zipper/functionArgs.good
M test/optimizations/autoLocalAccess/zipper/interveningForallOrOn.chpl
M test/optimizations/autoLocalAccess/zipper/interveningForallOrOn.good
M test/optimizations/autoLocalAccess/zipper/multipleAccessDynamic.good
M test/optimizations/autoLocalAccess/zipper/multipleAccessStatic.good
M test/optimizations/autoLocalAccess/zipper/nonDomainIter.good
M test/optimizations/autoLocalAccess/zipper/oneStaticFailOtherDynamicSuccess.good
M test/optimizations/autoLocalAccess/zipper/regularCommaDeclaration.good
M test/optimizations/autoLocalAccess/zipper/regularDeclaration.good
M test/optimizations/autoLocalAccess/zipper/regularDeclaration2D.good
M test/optimizations/autoLocalAccess/zipper/staticSuccessDynamicFail.good
M test/optimizations/autoLocalAccess/zipper/withInitializerCall.good
M util/chpl-completion.bash

Compare: https://github.com/chapel-lang/chapel/compare/475cd47c6bec...bd1c1a66e417