@kmclaughlin-arm
Contributor

Since #145322 relaxed the SME predicate for the multi-register while
instructions, these instructions are allowed in non-streaming mode
when SME2 is available.

This patch removes the isStreaming() restriction from the
GET_ACTIVE_LANE_MASK custom-lowering setup, performActiveLaneMaskCombine
and ReplaceGetActiveLaneMaskResults, allowing the whilelo.x2 intrinsic to
be used when SVE2p1 or SME2 is present and SVE or streaming SVE is available.
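
To make the effect of the relaxed check concrete, here is a minimal sketch that compares the old and new conditions on a simplified boolean model. The `Features` struct and the body of `sveOrStreamingSVEAvailable()` are assumptions made for illustration; they only approximate the real `AArch64Subtarget` queries and are not LLVM's code.

```cpp
// Simplified model of the subtarget queries used in the patch.
// Assumption: this struct and sveOrStreamingSVEAvailable() are illustrative
// stand-ins, not the real AArch64Subtarget implementation.
struct Features {
  bool SVE;        // +sve
  bool SVE2p1;     // +sve2p1 (modelled here as always paired with SVE)
  bool SME2;       // +sme2
  bool Streaming;  // the function runs in streaming mode
};

// Assumed definition: SVE instructions are usable either because SVE is
// present, or because the function is in streaming mode on an SME2 target.
constexpr bool sveOrStreamingSVEAvailable(Features F) {
  return F.SVE || (F.SME2 && F.Streaming);
}

// Condition before the patch.
constexpr bool oldCondition(Features F) {
  return F.SVE2p1 || (F.SME2 && F.Streaming);
}

// Condition after the patch.
constexpr bool newCondition(Features F) {
  return sveOrStreamingSVEAvailable(F) && (F.SVE2p1 || F.SME2);
}

// +sve +sme2, non-streaming: the case this patch newly enables.
static_assert(!oldCondition({true, false, true, false}), "old: rejected");
static_assert(newCondition({true, false, true, false}), "new: accepted");

// +sme2 in streaming mode: accepted before and after.
static_assert(oldCondition({false, false, true, true}), "old: accepted");
static_assert(newCondition({false, false, true, true}), "new: accepted");

// +sve +sve2p1: accepted before and after.
static_assert(oldCondition({true, true, false, false}), "old: accepted");
static_assert(newCondition({true, true, false, false}), "new: accepted");

// +sme2 only, non-streaming, no SVE: still rejected.
static_assert(!newCondition({false, false, true, false}), "new: rejected");
```

Under this model the only configuration whose result changes is `+sve,+sme2` outside streaming mode, which matches the updated second RUN line in the test.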

@llvmbot
Member

llvmbot commented Jan 28, 2026

@llvm/pr-subscribers-backend-aarch64

Author: Kerry McLaughlin (kmclaughlin-arm)

Full diff: https://github.com/llvm/llvm-project/pull/178399.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+6-5)
  • (modified) llvm/test/CodeGen/AArch64/get-active-lane-mask-extract.ll (+11-22)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 6c0544005e1dd..3b53b91d9f798 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1546,8 +1546,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
       setOperationAction(ISD::GET_ACTIVE_LANE_MASK, VT, Legal);
     }
 
-    if (Subtarget->hasSVE2p1() ||
-        (Subtarget->hasSME2() && Subtarget->isStreaming()))
+    if (Subtarget->isSVEorStreamingSVEAvailable() &&
+        (Subtarget->hasSVE2p1() || Subtarget->hasSME2()))
       setOperationAction(ISD::GET_ACTIVE_LANE_MASK, MVT::nxv32i1, Custom);
 
     for (auto VT : {MVT::v16i8, MVT::v8i8, MVT::v4i16, MVT::v2i32})
@@ -19384,7 +19384,8 @@ performActiveLaneMaskCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
     return While;
 
   if (!N->getValueType(0).isScalableVector() ||
-      (!ST->hasSVE2p1() && !(ST->hasSME2() && ST->isStreaming())))
+      !ST->isSVEorStreamingSVEAvailable() ||
+      !(ST->hasSVE2p1() || ST->hasSME2()))
     return SDValue();
 
   // Count the number of users which are extract_vectors.
@@ -29251,8 +29252,8 @@ void AArch64TargetLowering::ReplaceExtractSubVectorResults(
 
 void AArch64TargetLowering::ReplaceGetActiveLaneMaskResults(
     SDNode *N, SmallVectorImpl<SDValue> &Results, SelectionDAG &DAG) const {
-  assert((Subtarget->hasSVE2p1() ||
-          (Subtarget->hasSME2() && Subtarget->isStreaming())) &&
+  assert((Subtarget->isSVEorStreamingSVEAvailable() &&
+          (Subtarget->hasSVE2p1() || Subtarget->hasSME2())) &&
          "Custom lower of get.active.lane.mask missing required feature.");
 
   assert(N->getValueType(0) == MVT::nxv32i1 &&
diff --git a/llvm/test/CodeGen/AArch64/get-active-lane-mask-extract.ll b/llvm/test/CodeGen/AArch64/get-active-lane-mask-extract.ll
index aa0b934151fef..ce3452d6e21ee 100644
--- a/llvm/test/CodeGen/AArch64/get-active-lane-mask-extract.ll
+++ b/llvm/test/CodeGen/AArch64/get-active-lane-mask-extract.ll
@@ -1,7 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
 ; RUN: llc -enable-subreg-liveness -mattr=+sve    < %s | FileCheck %s -check-prefix CHECK-SVE
-; RUN: llc -enable-subreg-liveness -mattr=+sve2p1 < %s | FileCheck %s -check-prefix CHECK-SVE2p1-SME2 -check-prefix CHECK-SVE2p1
-; RUN: llc -enable-subreg-liveness -mattr=+sve -mattr=+sme2 -force-streaming < %s | FileCheck %s -check-prefix CHECK-SVE2p1-SME2 -check-prefix CHECK-SME2
+; RUN: llc -enable-subreg-liveness -mattr=+sve2p1 < %s | FileCheck %s -check-prefix CHECK-SVE2p1-SME2
+; RUN: llc -enable-subreg-liveness -mattr=+sve -mattr=+sme2 < %s | FileCheck %s -check-prefix CHECK-SVE2p1-SME2
 target triple = "aarch64-linux"
 
 ; Test combining of getActiveLaneMask with a pair of extract_vector operations.
@@ -183,26 +183,15 @@ define void @test_fixed_extract(i64 %i, i64 %n) #0 {
 ; CHECK-SVE-NEXT:    ext z1.b, z1.b, z1.b, #8
 ; CHECK-SVE-NEXT:    b use
 ;
-; CHECK-SVE2p1-LABEL: test_fixed_extract:
-; CHECK-SVE2p1:       // %bb.0:
-; CHECK-SVE2p1-NEXT:    whilelo p0.s, x0, x1
-; CHECK-SVE2p1-NEXT:    cset w8, mi
-; CHECK-SVE2p1-NEXT:    mov z1.s, p0/z, #1 // =0x1
-; CHECK-SVE2p1-NEXT:    fmov s0, w8
-; CHECK-SVE2p1-NEXT:    mov v0.s[1], v1.s[1]
-; CHECK-SVE2p1-NEXT:    ext z1.b, z1.b, z1.b, #8
-; CHECK-SVE2p1-NEXT:    b use
-;
-; CHECK-SME2-LABEL: test_fixed_extract:
-; CHECK-SME2:       // %bb.0:
-; CHECK-SME2-NEXT:    whilelo p0.s, x0, x1
-; CHECK-SME2-NEXT:    cset w8, mi
-; CHECK-SME2-NEXT:    mov z1.s, p0/z, #1 // =0x1
-; CHECK-SME2-NEXT:    fmov s2, w8
-; CHECK-SME2-NEXT:    mov z0.s, z1.s[1]
-; CHECK-SME2-NEXT:    ext z1.b, z1.b, z1.b, #8
-; CHECK-SME2-NEXT:    zip1 z0.s, z2.s, z0.s
-; CHECK-SME2-NEXT:    b use
+; CHECK-SVE2p1-SME2-LABEL: test_fixed_extract:
+; CHECK-SVE2p1-SME2:       // %bb.0:
+; CHECK-SVE2p1-SME2-NEXT:    whilelo p0.s, x0, x1
+; CHECK-SVE2p1-SME2-NEXT:    cset w8, mi
+; CHECK-SVE2p1-SME2-NEXT:    mov z1.s, p0/z, #1 // =0x1
+; CHECK-SVE2p1-SME2-NEXT:    fmov s0, w8
+; CHECK-SVE2p1-SME2-NEXT:    mov v0.s[1], v1.s[1]
+; CHECK-SVE2p1-SME2-NEXT:    ext z1.b, z1.b, z1.b, #8
+; CHECK-SVE2p1-SME2-NEXT:    b use
     %r = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 %i, i64 %n)
     %v0 = call <2 x i1> @llvm.vector.extract.v2i1.nxv4i1.i64(<vscale x 4 x i1> %r, i64 0)
     %v1 = call <2 x i1> @llvm.vector.extract.v2i1.nxv4i1.i64(<vscale x 4 x i1> %r, i64 2)

Contributor

@CarolineConcatto CarolineConcatto left a comment

Thank you Kerry,
I think we should still keep the test for sme2 with streaming mode, but the rest looks OK. You've changed all the places that used to require sve2p1 and sme.

-    if (Subtarget->hasSVE2p1() ||
-        (Subtarget->hasSME2() && Subtarget->isStreaming()))
+    if (Subtarget->isSVEorStreamingSVEAvailable() &&
+        (Subtarget->hasSVE2p1() || Subtarget->hasSME2()))
Contributor

This is the same condition as HasSVE2p1_or_SME2, but we cannot use that here; we can only copy and paste what it contains.
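
Because the condition is copied by hand, the patch spells it three times: in positive form in the constructor and in the assert, and in negated form as the early return in performActiveLaneMaskCombine. The following is a minimal sketch, on an assumed three-flag model (C++14 or later), that checks the negated form is the exact complement of the positive one. The `Features` struct and the function names are hypothetical, and the extra `isScalableVector()` term of the early return is left out.

```cpp
// Hypothetical three-flag model; the real code queries AArch64Subtarget.
struct Features {
  bool Available;  // stands in for isSVEorStreamingSVEAvailable()
  bool SVE2p1;     // stands in for hasSVE2p1()
  bool SME2;       // stands in for hasSME2()
};

// Positive form, as written in the constructor and in the assert.
constexpr bool canUse(Features F) {
  return F.Available && (F.SVE2p1 || F.SME2);
}

// Negated form, as written in the early return of the combine.
constexpr bool mustBail(Features F) {
  return !F.Available || !(F.SVE2p1 || F.SME2);
}

// Walk all 8 flag combinations and confirm that exactly one of the two
// forms holds for each of them.
constexpr bool formsAreComplementary() {
  for (int Bits = 0; Bits < 8; ++Bits) {
    Features F{(Bits & 1) != 0, (Bits & 2) != 0, (Bits & 4) != 0};
    if (canUse(F) == mustBail(F))
      return false;
  }
  return true;
}

static_assert(formsAreComplementary(),
              "negated early return must mirror the positive check");
```

A compile-time check like this is one way to keep hand-copied conditions in step; it is not part of the patch.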

Contributor

@CarolineConcatto CarolineConcatto left a comment

Thank you Kerry,
LGTM!

@kmclaughlin-arm kmclaughlin-arm merged commit 162267e into llvm:main Jan 29, 2026
10 checks passed
@kmclaughlin-arm kmclaughlin-arm added this to the LLVM 22.x Release milestone Jan 29, 2026
@github-project-automation github-project-automation bot moved this to Needs Triage in LLVM Release Status Jan 29, 2026
@github-project-automation github-project-automation bot moved this from Needs Triage to Done in LLVM Release Status Jan 29, 2026
@kmclaughlin-arm
Contributor Author

/cherry-pick 162267e

@llvmbot
Member

llvmbot commented Jan 29, 2026

Failed to cherry-pick: 162267e

https://github.com/llvm/llvm-project/actions/runs/21481535853

Please manually backport the fix and push it to your GitHub fork. Once this is done, please create a pull request.

kmclaughlin-arm added a commit to kmclaughlin-arm/llvm-project that referenced this pull request Jan 29, 2026
@kmclaughlin-arm
Contributor Author

Backport request: #178672
