Enable SME2 Streaming SVE in ARM #9126
Open
stevesuzuki-arm wants to merge 3 commits into halide:main from
Added:
- Target::SME2 definition
- streaming_vector_bits in Target for SME2
- Auto-detect SME2 and streaming_vector_bits
- sme_streaming() scheduling directive in Func and Pipeline
- DeviceAPI::Host_SMEStreaming in IR "For"
- LowerSMEStreamingTasks pass to extract streaming closure
- Attribute in LoweredFunc for streaming closure
- LLVM Function attribute to control streaming mode
- NoInline to prevent streaming closure from being inlined
- "aarch64_pstate_sm_body" to emit smstart/smstop transition
- Disable gather/scatter in SME streaming mode

Tests:
- Add correctness/sme_streaming
- Run simd_op_check_sve2 in SME streaming mode
- Add test to assert runtime streaming vscale
This PR is ready for review. I will touch on this in the dev meeting if I have a chance.
Enable SME2 Streaming SVE in ARM
This PR adds initial ARM SME2 streaming-mode support to Halide, which allows us to compute with longer-vector-length SVE on targets with SME2. A new `sme_streaming(enable, var)` scheduling directive gives users the option to control which loop is computed in streaming mode.

The change introduces a new `Target::SME2` feature and `Target::streaming_vector_bits`. `natural_vector_size()` now depends on whether the code is in streaming mode, because `streaming_vector_bits` may not be the same as `vector_bits`.

In Halide lowering, a new `LowerSMEStreamingTasks` pass is added. It extracts each streaming-mode loop into an internal closure function, so that we can attach the LLVM function attributes that transition to and from streaming mode:

- `aarch64_pstate_sm_body` to emit the smstart/smstop transition
- `NoInline` to prevent the streaming closure from being inlined into a non-streaming function

In CodeGen, `target_vscale()` depends on whether streaming mode is active, so it can vary even within a Module, although it is constant within a Function boundary. In streaming mode, vector type code-gen and intrinsic selection are performed based on `streaming_vector_bits` (streaming vscale).

In terms of coverage, it is almost the same as the existing SVE2 code-gen; SME2-specific instructions have not been enabled for now.
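
As a rough illustration of why `natural_vector_size()` and `target_vscale()` become mode-dependent, the standalone C++ sketch below models the two vector lengths. `TargetModel`, `active_bits`, `vscale_for`, `natural_lanes`, and `demo` are hypothetical names invented for this illustration and are not part of Halide or of this PR, and the 128-bit and 512-bit lengths are assumed example values. The sketch relies only on the fact that an SVE vector length is a multiple of 128 bits, with vscale being that multiple.

```cpp
#include <cassert>

// Hypothetical stand-in for the two vector-length fields described above.
// This is NOT Halide's Target class.
struct TargetModel {
    int vector_bits;            // non-streaming SVE vector length, in bits
    int streaming_vector_bits;  // streaming SVE vector length, in bits
};

// Pick the vector length that applies in the given mode.
inline int active_bits(const TargetModel &t, bool streaming) {
    return streaming ? t.streaming_vector_bits : t.vector_bits;
}

// vscale is the vector length expressed in units of 128 bits.
inline int vscale_for(const TargetModel &t, bool streaming) {
    return active_bits(t, streaming) / 128;
}

// Number of lanes of an elem_bits-wide element in one natural vector.
inline int natural_lanes(const TargetModel &t, bool streaming, int elem_bits) {
    return active_bits(t, streaming) / elem_bits;
}

// Small self-check using assumed example lengths (128 and 512 bits).
inline void demo() {
    const TargetModel t{128, 512};
    assert(vscale_for(t, /*streaming=*/false) == 1);
    assert(vscale_for(t, /*streaming=*/true) == 4);
    assert(natural_lanes(t, /*streaming=*/false, 32) == 4);
    assert(natural_lanes(t, /*streaming=*/true, 32) == 16);
}
```

With these assumed lengths, a 32-bit element vector holds 4 lanes outside streaming mode and 16 lanes inside it, which is why one Module can contain functions generated for different vscale values.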
Additionally, the following changes are implemented:

- Auto-detect `SME2` and `streaming_vector_bits` on host CPU

Breaking changes

- `halide_error_vscale_invalid()` is changed. We will consider adding a separate API if necessary.

Checklist