Branch: refs/heads/master
Revision: bd1c1a6
Author: e-kayrakli
Log Message:
Merge pull request #16422 from e-kayrakli/auto-local-access-aligned-dom2
Improve automatic local access optimization
This PR improves the coverage of the automatic local access optimization in two
different ways:
1. We start to allow the dynamic check to pass for arrays whose domains are not
   the same as the loop domain, but well-aligned. This will allow the following
   case to be optimized:

       var d = newBlockDom(...);
       var a: [d] int;
       var dInner = d.expand(-1);
       forall i in dInner { ... a[i] ... }

   1.a. This case also includes where we iterate over a DR domain that is fully
   included in the local subdomain of a distributed array. This will allow the
   following case to be optimized:

       var d = newBlockDom(...);
       var a: [d] int;
       coforall l in Locales do on l {
         var localDom = a.localSubdomain();
         forall i in localDom { ... a[i] ... }
       }

2. We start to analyze foralls that have an arbitrary call expression as the
   iterand. This will allow similar cases to the ones above (and more) to be
   optimized even if we have a domain-generating call as the iterand:

       var d = newBlockDom(...);
       var a: [d] int;
       var dInner = d.expand(-1);
       forall i in d.expand(-1) { ... a[i] ... }

       var d = newBlockDom(...);
       var a: [d] int;
       coforall l in Locales do on l {
         forall i in a.localSubdomain() { ... a[i] ... }
       }

Note that all the new cases covered by this PR are covered dynamically. This
means that I expect this PR to increase loop cloning noticeably.
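
For readers who want to try the new cases, the snippets above can be combined into one small program. This is only a sketch: it assumes a Chapel release contemporary with this commit, in which the BlockDist module provides newBlockDom, and the names n, dInner and the loop bodies are illustrative rather than taken from the added tests. It does not verify that the accesses are actually converted to local accesses; that depends on the compiler and on the outcome of the dynamic check.

```chapel
// Illustrative program, not one of the tests added by this PR.
// Assumes BlockDist provides newBlockDom, as in the snippets above.
use BlockDist;

config const n = 16;

// Block-distributed domain and an array declared over it.
var d = newBlockDom({1..n});
var a: [d] int;

// Case 1: the loop domain differs from a's domain but is aligned with it.
var dInner = d.expand(-1);
forall i in dInner do
  a[i] = i;

// Case 2: the iterand is a domain-generating call rather than a variable.
forall i in d.expand(-1) do
  a[i] += 1;

// Cases 1.a and 2 together: each locale iterates over its local subdomain,
// which is a non-distributed domain contained in a's distributed domain.
coforall l in Locales do on l {
  forall i in a.localSubdomain() do
    a[i] *= 2;
}

writeln(a);
```

Compiling the same program with --no-dynamic-auto-local-access (the flag name as renamed in this PR) should presumably turn off the dynamically checked variants, which gives a baseline for comparison.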
Limitation:
We don’t do any optimization for:

    forall i in zip((...someTuple)) { ... a[i] ... }

I don’t think there is anything fundamental preventing that, but I am leaving that
as a future step because of a lack of motivation and a non-trivial implementation.
I suspect we can pass tuples to some of the module helpers, which could pluck
the first item from the tuple. And then, the compiler can grab the argument to
PRIM_TUPLE_EXPAND and pass that to those helpers.
Implementation Details
- Adds iterCall and iterCallTmp to ForallOptimizationInfo. The former is not
  used by the optimization; it is only kept in case we need it later.
- Adds earlyNormalizeForallIterand to
  compiler/optimizations/preNormalizeOptimizations. This function is
  implemented in normalize.cpp and it is just a version of
  insertCallTempsWithStmt that also returns the added call temp.
- Moves doPreNormalizeOptimizations after insertModuleInit. We need module
  initializers to be able to normalize foralls’ iterands if they are call
  expressions.
- Adds new checks to chpl__dynamicAutoLocalAccess to cover the cases
  mentioned above.
- Trivial: adds the missing override keyword to isDefaultRectangular().
- Trivial: changes the reporting flag to also add the --no- version.
- Trivial: renames --no-auto-local-access-dynamic to
  --no-dynamic-auto-local-access (I think this reads better, but I can change
  it back).
- Trivial: a few cosmetic changes in how we report the optimization in the
  compiler (I hope to revise this even more in a separate PR, but it is not a
  must).
- Adds and updates a bunch of tests.
[Reviewed by @ronawho and @vasslitvinov]
Testing:
- [x] full standard
- [x] full gasnet
Modified Files:
A test/optimizations/autoLocalAccess/differentButAlignedDoms.chpl
A test/optimizations/autoLocalAccess/differentButAlignedDoms.good
A test/optimizations/autoLocalAccess/flags-full.good
A test/optimizations/autoLocalAccess/flags-none.good
A test/optimizations/autoLocalAccess/flags-staticonly.good
A test/optimizations/autoLocalAccess/flags.chpl
A test/optimizations/autoLocalAccess/flags.compopts
A test/optimizations/autoLocalAccess/flags.prediff
A test/optimizations/autoLocalAccess/preventMultiCall.chpl
A test/optimizations/autoLocalAccess/preventMultiCall.good
A test/optimizations/autoLocalAccess/preventMultiCallIter.chpl
A test/optimizations/autoLocalAccess/preventMultiCallIter.good
A test/optimizations/autoLocalAccess/zipper/differentButAlignedDoms.chpl
A test/optimizations/autoLocalAccess/zipper/differentButAlignedDoms.good
A test/optimizations/autoLocalAccess/zipper/preventMultiCall.chpl
A test/optimizations/autoLocalAccess/zipper/preventMultiCall.good
A test/optimizations/autoLocalAccess/zipper/preventMultiCallIter.chpl
A test/optimizations/autoLocalAccess/zipper/preventMultiCallIter.good
M compiler/include/ForallStmt.h
M compiler/include/driver.h
M compiler/include/preNormalizeOptimizations.h
M compiler/main/driver.cpp
M compiler/optimizations/preNormalizeOptimizations.cpp
M compiler/passes/normalize.cpp
M man/chpl.rst
M modules/internal/ChapelAutoLocalAccess.chpl
M modules/internal/DefaultRectangular.chpl
M test/compflags/bradc/help/userhelp.good
M test/optimizations/autoLocalAccess/allDynamicsFailStatic.good
M test/optimizations/autoLocalAccess/commaDecl.good
M test/optimizations/autoLocalAccess/copyInitDeclaration.good
M test/optimizations/autoLocalAccess/dotDomDeclaration.good
M test/optimizations/autoLocalAccess/dynamicCheckInGenericFunction.good
M test/optimizations/autoLocalAccess/dynamicChecks.good
M test/optimizations/autoLocalAccess/elemAsIndex.good
M test/optimizations/autoLocalAccess/functionArgs.good
M test/optimizations/autoLocalAccess/interveningForallOrOn.chpl
M test/optimizations/autoLocalAccess/interveningForallOrOn.good
M test/optimizations/autoLocalAccess/multipleAccessDynamic.good
M test/optimizations/autoLocalAccess/multipleAccessStatic.good
M test/optimizations/autoLocalAccess/nonDomainIter.good
M test/optimizations/autoLocalAccess/oneStaticFailOtherDynamicSuccess.good
M test/optimizations/autoLocalAccess/regularCommaDeclaration.good
M test/optimizations/autoLocalAccess/regularDeclaration.good
M test/optimizations/autoLocalAccess/regularDeclaration2D.good
M test/optimizations/autoLocalAccess/staticSuccessDynamicFail.good
M test/optimizations/autoLocalAccess/unalignedSameDist.good
M test/optimizations/autoLocalAccess/withInitializerCall.good
M test/optimizations/autoLocalAccess/zipper/allDynamicsFailStatic.good
M test/optimizations/autoLocalAccess/zipper/commaDecl.good
M test/optimizations/autoLocalAccess/zipper/copyInitDeclaration.good
M test/optimizations/autoLocalAccess/zipper/dotDomDeclaration.good
M test/optimizations/autoLocalAccess/zipper/dynamicCheckInGenericFunction.good
M test/optimizations/autoLocalAccess/zipper/dynamicChecks.good
M test/optimizations/autoLocalAccess/zipper/elemAsIndex.good
M test/optimizations/autoLocalAccess/zipper/functionArgs.good
M test/optimizations/autoLocalAccess/zipper/interveningForallOrOn.chpl
M test/optimizations/autoLocalAccess/zipper/interveningForallOrOn.good
M test/optimizations/autoLocalAccess/zipper/multipleAccessDynamic.good
M test/optimizations/autoLocalAccess/zipper/multipleAccessStatic.good
M test/optimizations/autoLocalAccess/zipper/nonDomainIter.good
M test/optimizations/autoLocalAccess/zipper/oneStaticFailOtherDynamicSuccess.good
M test/optimizations/autoLocalAccess/zipper/regularCommaDeclaration.good
M test/optimizations/autoLocalAccess/zipper/regularDeclaration.good
M test/optimizations/autoLocalAccess/zipper/regularDeclaration2D.good
M test/optimizations/autoLocalAccess/zipper/staticSuccessDynamicFail.good
M test/optimizations/autoLocalAccess/zipper/withInitializerCall.good
M util/chpl-completion.bash
Compare: https://github.com/chapel-lang/chapel/compare/475cd47c6bec...bd1c1a66e417