[AArch64][SME2] Allow lowering to whilelo.x2 in non-streaming mode #178399
Since llvm#145322 relaxed the SME predicate for the multi-register while instructions, these instructions are allowed in non-streaming mode when SME2 is available. This patch removes the isStreaming() restriction from both performActiveLaneMaskCombine & ReplaceGetActiveLaneMaskResults, allowing the whilelo.x2 intrinsic to be used if SVE or streaming SVE is available.
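To make the change in the guarding condition concrete, here is a minimal standalone sketch that models the old and new conditions as plain boolean functions. `ToySubtarget` and its fields are hypothetical stand-ins, not LLVM's subtarget class; only the four query names are taken from the diff. The flag values in `demoNonStreamingSME2` are an assumed mapping of `-mattr=+sve -mattr=+sme2` without `-force-streaming`.

```cpp
#include <cassert>

// Toy stand-in for the subtarget queries named in the patch. The struct and
// its fields are hypothetical; only the four query names come from the diff.
struct ToySubtarget {
  bool SVE2p1 = false;
  bool SME2 = false;
  bool Streaming = false;
  bool SVEOrStreamingSVE = false;

  bool hasSVE2p1() const { return SVE2p1; }
  bool hasSME2() const { return SME2; }
  bool isStreaming() const { return Streaming; }
  bool isSVEorStreamingSVEAvailable() const { return SVEOrStreamingSVE; }
};

// Condition used before the patch: SME2 only counts in streaming mode.
bool oldCondition(const ToySubtarget &ST) {
  return ST.hasSVE2p1() || (ST.hasSME2() && ST.isStreaming());
}

// Condition used after the patch: SVE or streaming SVE must be available,
// and either SVE2p1 or SME2 must be present.
bool newCondition(const ToySubtarget &ST) {
  return ST.isSVEorStreamingSVEAvailable() &&
         (ST.hasSVE2p1() || ST.hasSME2());
}

// Assumed flag values for SVE plus SME2 in non-streaming mode.
void demoNonStreamingSME2() {
  ToySubtarget ST;
  ST.SME2 = true;
  ST.SVEOrStreamingSVE = true;
  assert(!oldCondition(ST)); // rejected before the patch
  assert(newCondition(ST));  // accepted after the patch
}
```

Under these assumptions, the non-streaming SVE plus SME2 configuration is the case that flips from rejected to accepted, while a streaming SME2 configuration satisfies both conditions.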
@llvm/pr-subscribers-backend-aarch64
Author: Kerry McLaughlin (kmclaughlin-arm)
Full diff: https://github.com/llvm/llvm-project/pull/178399.diff
2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 6c0544005e1dd..3b53b91d9f798 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1546,8 +1546,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::GET_ACTIVE_LANE_MASK, VT, Legal);
}
- if (Subtarget->hasSVE2p1() ||
- (Subtarget->hasSME2() && Subtarget->isStreaming()))
+ if (Subtarget->isSVEorStreamingSVEAvailable() &&
+ (Subtarget->hasSVE2p1() || Subtarget->hasSME2()))
setOperationAction(ISD::GET_ACTIVE_LANE_MASK, MVT::nxv32i1, Custom);
for (auto VT : {MVT::v16i8, MVT::v8i8, MVT::v4i16, MVT::v2i32})
@@ -19384,7 +19384,8 @@ performActiveLaneMaskCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
return While;
if (!N->getValueType(0).isScalableVector() ||
- (!ST->hasSVE2p1() && !(ST->hasSME2() && ST->isStreaming())))
+ !ST->isSVEorStreamingSVEAvailable() ||
+ !(ST->hasSVE2p1() || ST->hasSME2()))
return SDValue();
// Count the number of users which are extract_vectors.
@@ -29251,8 +29252,8 @@ void AArch64TargetLowering::ReplaceExtractSubVectorResults(
void AArch64TargetLowering::ReplaceGetActiveLaneMaskResults(
SDNode *N, SmallVectorImpl<SDValue> &Results, SelectionDAG &DAG) const {
- assert((Subtarget->hasSVE2p1() ||
- (Subtarget->hasSME2() && Subtarget->isStreaming())) &&
+ assert((Subtarget->isSVEorStreamingSVEAvailable() &&
+ (Subtarget->hasSVE2p1() || Subtarget->hasSME2())) &&
"Custom lower of get.active.lane.mask missing required feature.");
assert(N->getValueType(0) == MVT::nxv32i1 &&
diff --git a/llvm/test/CodeGen/AArch64/get-active-lane-mask-extract.ll b/llvm/test/CodeGen/AArch64/get-active-lane-mask-extract.ll
index aa0b934151fef..ce3452d6e21ee 100644
--- a/llvm/test/CodeGen/AArch64/get-active-lane-mask-extract.ll
+++ b/llvm/test/CodeGen/AArch64/get-active-lane-mask-extract.ll
@@ -1,7 +1,7 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
; RUN: llc -enable-subreg-liveness -mattr=+sve < %s | FileCheck %s -check-prefix CHECK-SVE
-; RUN: llc -enable-subreg-liveness -mattr=+sve2p1 < %s | FileCheck %s -check-prefix CHECK-SVE2p1-SME2 -check-prefix CHECK-SVE2p1
-; RUN: llc -enable-subreg-liveness -mattr=+sve -mattr=+sme2 -force-streaming < %s | FileCheck %s -check-prefix CHECK-SVE2p1-SME2 -check-prefix CHECK-SME2
+; RUN: llc -enable-subreg-liveness -mattr=+sve2p1 < %s | FileCheck %s -check-prefix CHECK-SVE2p1-SME2
+; RUN: llc -enable-subreg-liveness -mattr=+sve -mattr=+sme2 < %s | FileCheck %s -check-prefix CHECK-SVE2p1-SME2
target triple = "aarch64-linux"
; Test combining of getActiveLaneMask with a pair of extract_vector operations.
@@ -183,26 +183,15 @@ define void @test_fixed_extract(i64 %i, i64 %n) #0 {
; CHECK-SVE-NEXT: ext z1.b, z1.b, z1.b, #8
; CHECK-SVE-NEXT: b use
;
-; CHECK-SVE2p1-LABEL: test_fixed_extract:
-; CHECK-SVE2p1: // %bb.0:
-; CHECK-SVE2p1-NEXT: whilelo p0.s, x0, x1
-; CHECK-SVE2p1-NEXT: cset w8, mi
-; CHECK-SVE2p1-NEXT: mov z1.s, p0/z, #1 // =0x1
-; CHECK-SVE2p1-NEXT: fmov s0, w8
-; CHECK-SVE2p1-NEXT: mov v0.s[1], v1.s[1]
-; CHECK-SVE2p1-NEXT: ext z1.b, z1.b, z1.b, #8
-; CHECK-SVE2p1-NEXT: b use
-;
-; CHECK-SME2-LABEL: test_fixed_extract:
-; CHECK-SME2: // %bb.0:
-; CHECK-SME2-NEXT: whilelo p0.s, x0, x1
-; CHECK-SME2-NEXT: cset w8, mi
-; CHECK-SME2-NEXT: mov z1.s, p0/z, #1 // =0x1
-; CHECK-SME2-NEXT: fmov s2, w8
-; CHECK-SME2-NEXT: mov z0.s, z1.s[1]
-; CHECK-SME2-NEXT: ext z1.b, z1.b, z1.b, #8
-; CHECK-SME2-NEXT: zip1 z0.s, z2.s, z0.s
-; CHECK-SME2-NEXT: b use
+; CHECK-SVE2p1-SME2-LABEL: test_fixed_extract:
+; CHECK-SVE2p1-SME2: // %bb.0:
+; CHECK-SVE2p1-SME2-NEXT: whilelo p0.s, x0, x1
+; CHECK-SVE2p1-SME2-NEXT: cset w8, mi
+; CHECK-SVE2p1-SME2-NEXT: mov z1.s, p0/z, #1 // =0x1
+; CHECK-SVE2p1-SME2-NEXT: fmov s0, w8
+; CHECK-SVE2p1-SME2-NEXT: mov v0.s[1], v1.s[1]
+; CHECK-SVE2p1-SME2-NEXT: ext z1.b, z1.b, z1.b, #8
+; CHECK-SVE2p1-SME2-NEXT: b use
%r = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 %i, i64 %n)
%v0 = call <2 x i1> @llvm.vector.extract.v2i1.nxv4i1.i64(<vscale x 4 x i1> %r, i64 0)
%v1 = call <2 x i1> @llvm.vector.extract.v2i1.nxv4i1.i64(<vscale x 4 x i1> %r, i64 2)
CarolineConcatto
left a comment
Thank you Kerry,
I think we should still keep the test for sme2 with streaming mode, but the rest looks ok. You've changed all the places that used to require sve2p1 and sme.
- if (Subtarget->hasSVE2p1() ||
- (Subtarget->hasSME2() && Subtarget->isStreaming()))
+ if (Subtarget->isSVEorStreamingSVEAvailable() &&
+ (Subtarget->hasSVE2p1() || Subtarget->hasSME2()))
This is the same condition as HasSVE2p1_or_SME2, but we cannot use that here; we can only copy what it contains.
CarolineConcatto
left a comment
Thank you Kerry,
LGTM!
/cherry-pick 162267e
Failed to cherry-pick: 162267e https://github.com/llvm/llvm-project/actions/runs/21481535853 Please manually backport the fix and push it to your github fork. Once this is done, please create a pull request
…eaming mode (llvm#178399) Backport: llvm@162267e
Backport request: #178672