Branch: refs/heads/master
Revision: 3f5a5fc
Author: mppf
Log Message:
Merge pull request #16152 from mppf/vec-fixes
Improve LLVM vectorization support
Resolves #11636.
This PR takes several steps to improve LLVM vectorization hinting
support:
- for LLVM vectorization hinting, use llvm.loop.parallel_accesses and
  llvm.access.group instead of the deprecated
  llvm.mem.parallel_loop_access.
- for the RV vectorization hint, use a separate notion of vectorization
  hazards (separate from hazards for the LLVM hint).
- use a new strategy instead of markVectorizationHazards to check for
  patterns that the LLVM loop vectorizer does not support. Previously
  this was trying to guess which patterns would end up with problematic
  allocas for loop-local variables. Now it marks with llvm.access.group
  only variables that are non-stack or are declared outside of any
  order-independent loop; in this way, if the loop-local stack memory
  variables are converted to registers as expected, the loop will be
  parallel as far as LLVM is concerned.
- change the vectorization hinting strategy for iterators, based on
  problems I observed once vectorization was enabled, with vectorization
  being applied to cases where it should not be. Yielding loops in
  follower/standalone iterators are no longer marked order independent.
  Instead, these are expected to be marked by the user or an earlier
  part of compilation. For now there is a pragma to do that. In the
  future we expect that it will be up to users to indicate this - this
  comment discusses possible syntax approaches.
Also, the compiler knows how to infer order independence in the very
common case of a loop of the form "for x in something() do yield x".
Since it might cause performance surprises if an iterator is not marked,
serial, standalone, and follower iterators in the standard/internal/package
modules must either have inferred order independence or use a pragma
to indicate if the yielding for loop is order independent or not.
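The new marking rule described above can be sketched as a small predicate. This is only an illustration, assuming a simplified view of a variable: the VarInfo struct and shouldMarkWithAccessGroup function are hypothetical names, not the compiler's actual data structures or functions.

```cpp
// Hypothetical, simplified description of a variable seen by codegen.
// These names are illustrative only and do not come from the compiler.
struct VarInfo {
  bool isStackLocal;                    // stored in a stack slot (alloca)
  bool declaredInOrderIndependentLoop;  // declared inside some
                                        // order-independent loop
};

// Rule as stated in the description: a variable is marked with
// llvm.access.group if it is non-stack, or if it is declared outside
// of any order-independent loop. Stack variables local to such a loop
// are left unmarked.
bool shouldMarkWithAccessGroup(const VarInfo& v) {
  return !v.isStackLocal || !v.declaredInOrderIndependentLoop;
}
```

Under this sketch, a heap or global variable is always marked, a stack variable declared before the loop is marked, and a stack variable declared inside the loop body is not. If the unmarked loop-local variables are later promoted to registers, no unmarked memory accesses are left in the loop, which is what lets LLVM treat it as parallel.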
Reviewed by @e-kayrakli with input from @slnguyen - thanks!
- [x] full local --verify testing
- [x] full local --llvm testing
- [x] full local --llvm --fast testing
Future work:
- remove CHPL_PRAGMA_IVDEP entirely and --force-vectorization
  hinting for the C backend.
Modified Files:
A test/llvm/parallel_loop_access/different_numbers.compopts
A test/llvm/parallel_loop_access/generation_inside_loop.compopts
A test/llvm/parallel_loop_access/no_parallel_loop_accesses.chpl
A test/llvm/parallel_loop_access/no_parallel_loop_accesses.compopts
A test/llvm/parallel_loop_access/no_parallel_loop_accesses.good
A test/llvm/parallel_loop_access/parallel_loop_accesses1.chpl
A test/llvm/parallel_loop_access/parallel_loop_accesses1.compopts
A test/llvm/parallel_loop_access/parallel_loop_accesses1.good
A test/llvm/parallel_loop_access/parallel_loop_accesses2.chpl
A test/llvm/parallel_loop_access/parallel_loop_accesses2.compopts
A test/llvm/parallel_loop_access/parallel_loop_accesses2.good
A test/llvm/parallel_loop_access/parallel_loop_accesses3.chpl
A test/llvm/parallel_loop_access/parallel_loop_accesses3.compopts
A test/llvm/parallel_loop_access/parallel_loop_accesses3.good
A test/llvm/parallel_loop_access/parallel_loop_accesses3.noexec
A test/llvm/parallel_loop_access/simple_forall.compopts
A test/llvm/parallel_loop_access/simple_loop.compopts
A test/llvm/parallel_loop_access/zippered_forall.compopts
A test/llvm/vectorization/parallel_loop_accesses/PREDIFF
A test/llvm/vectorization/parallel_loop_accesses/SKIPIF
A test/performance/vectorization/vectorizeOnly/iterator-loop-vectorization.chpl
A test/performance/vectorization/vectorizeOnly/iterator-loop-vectorization.compopts
A test/performance/vectorization/vectorizeOnly/iterator-loop-vectorization.good
A test/performance/vectorization/vectorizeOnly/iterator-loop-vectorization.prediff
A test/performance/vectorization/vectorizeYieldingLoopsPragma/vectorZipForLoop.bad
A test/performance/vectorization/vectorizeYieldingLoopsPragma/vectorZipForLoop.future
R test/llvm/parallel_loop_access/COMPOPTS
M compiler/AST/LoopExpr.cpp
M compiler/AST/LoopStmt.cpp
M compiler/AST/foralls.cpp
M compiler/AST/type.cpp
M compiler/codegen/CForLoop.cpp
M compiler/codegen/DoWhileStmt.cpp
M compiler/codegen/LoopStmt.cpp
M compiler/codegen/WhileDoStmt.cpp
M compiler/codegen/expr.cpp
M compiler/codegen/symbol.cpp
M compiler/include/LoopStmt.h
M compiler/include/codegen.h
M compiler/include/flags_list.h
M compiler/include/genret.h
M compiler/include/type.h
M compiler/resolution/lowerIterators.cpp
M compiler/resolution/resolveFunction.cpp
M modules/dists/BlockCycDist.chpl
M modules/dists/BlockDist.chpl
M modules/dists/CyclicDist.chpl
M modules/dists/DimensionalDist2D.chpl
M modules/dists/HashedDist.chpl
M modules/dists/PrivateDist.chpl
M modules/dists/SparseBlockDist.chpl
M modules/dists/StencilDist.chpl
M modules/dists/dims/BlockCycDim.chpl
M modules/dists/dims/BlockDim.chpl
M modules/dists/dims/ReplicatedDim.chpl
M modules/internal/ArrayViewRankChange.chpl
M modules/internal/ArrayViewReindex.chpl
M modules/internal/ArrayViewSlice.chpl
M modules/internal/Bytes.chpl
M modules/internal/BytesStringCommon.chpl
M modules/internal/ChapelArray.chpl
M modules/internal/ChapelError.chpl
M modules/internal/ChapelHashtable.chpl
M modules/internal/ChapelIteratorSupport.chpl
M modules/internal/ChapelLocale.chpl
M modules/internal/ChapelRange.chpl
M modules/internal/ChapelTuple.chpl
M modules/internal/DefaultAssociative.chpl
M modules/internal/DefaultRectangular.chpl
M modules/internal/DefaultSparse.chpl
M modules/internal/String.chpl
M modules/layouts/LayoutCS.chpl
M modules/packages/DistributedBag.chpl
M modules/packages/DistributedDeque.chpl
M modules/packages/EpochManager.chpl
M modules/packages/FunctionalOperations.chpl
M modules/packages/LockFreeQueue.chpl
M modules/packages/LockFreeStack.chpl
M modules/packages/OrderedSet/Treap.chpl
M modules/packages/RangeChunk.chpl
M modules/packages/RecordParser.chpl
M modules/packages/Sort.chpl
M modules/packages/VisualDebug.chpl
M modules/standard/FileSystem.chpl
M modules/standard/Heap.chpl
M modules/standard/IO.chpl
M modules/standard/LinkedLists.chpl
M modules/standard/List.chpl
M modules/standard/Map.chpl
M modules/standard/Random.chpl
M modules/standard/Regexp.chpl
M modules/standard/Set.chpl
M modules/standard/Types.chpl
M test/llvm/parallel_loop_access/different_numbers.chpl
M test/llvm/parallel_loop_access/generation_inside_loop.chpl
M test/llvm/parallel_loop_access/simple_forall.chpl
M test/llvm/parallel_loop_access/simple_loop.chpl
M test/llvm/parallel_loop_access/zippered_forall.chpl
M test/performance/vectorization/hintTests/vec-hint-ok-harder-zips.good
M test/performance/vectorization/hintTests/vec-hint-ok-harder.good
M test/performance/vectorization/hintTests/vec-hint-ok.good
M test/performance/vectorization/hintTests/vec-no-hint.chpl
M test/performance/vectorization/hintTests/vec-no-hint.good
M test/performance/vectorization/vectorPragmas/basicIters.compgood
M test/performance/vectorization/vectorPragmas/cForLoopInParIter.compgood
M test/performance/vectorization/vectorPragmas/doWhileInParIter.chpl
M test/performance/vectorization/vectorPragmas/doWhileInParIter.compgood
M test/performance/vectorization/vectorPragmas/forallInStandalone.compgood
M test/performance/vectorization/vectorPragmas/loopWithoutYield.compgood
M test/performance/vectorization/vectorPragmas/loopsInForallNoVector.compgood
M test/performance/vectorization/vectorPragmas/nestedLoopsInFollower.compgood
M test/performance/vectorization/vectorPragmas/nonInlinableFollower.compgood
M test/performance/vectorization/vectorPragmas/whileDoInParIter.chpl
M test/performance/vectorization/vectorPragmas/whileDoInParIter.compgood
M test/performance/vectorization/vectorPragmas/zipIterInFollower.chpl
M test/performance/vectorization/vectorPragmas/zipIterInFollower.compgood
M test/performance/vectorization/vectorizeOnly/vectorizeOnlyEmitsVectorPragma.good
M test/performance/vectorization/vectorizeYieldingLoopsPragma/vectorDoWhileLoop.good
M test/performance/vectorization/vectorizeYieldingLoopsPragma/vectorForLoop.good
M test/performance/vectorization/vectorizeYieldingLoopsPragma/vectorWhileLoop.good
M test/performance/vectorization/vectorizeYieldingLoopsPragma/vectorZipForLoop.good
Compare: https://github.com/chapel-lang/chapel/compare/5532ddf30ba7...3f5a5fc1586d