[Codegen] Spill/Restore FP/BP under option #114791

mahesh-attarde · 2024-11-04T13:13:51Z

With Change #81048
There are more than 70+ Application Runtime fails for X86 Target.
Moving this change under option.

llvmbot · 2024-11-04T13:14:31Z

@llvm/pr-subscribers-backend-x86

Author: Mahesh-Attarde (mahesh-attarde)

Changes

With Change #81048
There are more than 70+ Application Runtime fails for X86 Target.
Moving this change under option.

Patch is 46.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114791.diff

13 Files Affected:

(modified) llvm/lib/CodeGen/PrologEpilogInserter.cpp (+7-1)
(modified) llvm/test/CodeGen/X86/apx/push2-pop2-vector-register.ll (+29-4)
(modified) llvm/test/CodeGen/X86/apx/push2-pop2.ll (+151-24)
(modified) llvm/test/CodeGen/X86/apx/pushp-popp.ll (+16-4)
(modified) llvm/test/CodeGen/X86/avx512-intel-ocl.ll (+191-8)
(modified) llvm/test/CodeGen/X86/clobber_base_ptr.ll (+1-1)
(modified) llvm/test/CodeGen/X86/clobber_frame_ptr.ll (+1-1)
(modified) llvm/test/CodeGen/X86/clobber_frame_ptr2.ll (+2-1)
(modified) llvm/test/CodeGen/X86/clobber_frame_ptr_x32.ll (+1-1)
(modified) llvm/test/CodeGen/X86/i386-baseptr.ll (+2-2)
(modified) llvm/test/CodeGen/X86/inline-asm-function-call-pic.ll (+28-2)
(modified) llvm/test/CodeGen/X86/x86-32-intrcc.ll (+1-3)
(modified) llvm/test/CodeGen/X86/x86-64-baseptr.ll (+182-12)

diff --git a/llvm/lib/CodeGen/PrologEpilogInserter.cpp b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
index ee03eaa8ae527c..da9385ce4c96d4 100644
--- a/llvm/lib/CodeGen/PrologEpilogInserter.cpp
+++ b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
@@ -77,6 +77,11 @@ using MBBVector = SmallVector<MachineBasicBlock *, 4>;
 STATISTIC(NumLeafFuncWithSpills, "Number of leaf functions with CSRs");
 STATISTIC(NumFuncSeen, "Number of functions seen in PEI");
 
+// Experimental Feature enables spilling and reload FP/BP
+static cl::opt<bool>
+    EnableSpillFPBP("enable-spill-fpbp",
+                    cl::desc("Spill clobbered fp register to stack."),
+                    cl::init(false), cl::Hidden);
 
 namespace {
 
@@ -231,7 +236,8 @@ bool PEI::runOnMachineFunction(MachineFunction &MF) {
   // Spill frame pointer and/or base pointer registers if they are clobbered.
   // It is placed before call frame instruction elimination so it will not mess
   // with stack arguments.
-  TFI->spillFPBP(MF);
+  if (EnableSpillFPBP)
+    TFI->spillFPBP(MF);
 
   // Calculate the MaxCallFrameSize value for the function's frame
   // information. Also eliminates call frame pseudo instructions.
diff --git a/llvm/test/CodeGen/X86/apx/push2-pop2-vector-register.ll b/llvm/test/CodeGen/X86/apx/push2-pop2-vector-register.ll
index f20c4c1ae27867..b4b205bd292276 100644
--- a/llvm/test/CodeGen/X86/apx/push2-pop2-vector-register.ll
+++ b/llvm/test/CodeGen/X86/apx/push2-pop2-vector-register.ll
@@ -2,6 +2,7 @@
 ; Check PUSH2/POP2 is not used for vector registers
 ; RUN: llc < %s -mtriple=x86_64-pc-windows-msvc -mattr=+push2pop2 | FileCheck %s --check-prefix=CHECK
 ; RUN: llc < %s -mtriple=x86_64-pc-windows-msvc -mattr=+push2pop2 -frame-pointer=all | FileCheck %s --check-prefix=FRAME
+; RUN: llc < %s -mtriple=x86_64-pc-windows-msvc -mattr=+push2pop2 -frame-pointer=all  --enable-spill-fpbp=true | FileCheck %s --check-prefix=FRAME-SPILL
 
 define void @widget(float %arg) nounwind {
 ; CHECK-LABEL: widget:
@@ -43,17 +44,41 @@ define void @widget(float %arg) nounwind {
 ; FRAME-NEXT:    xorl %r8d, %r8d
 ; FRAME-NEXT:    callq *%rsi
 ; FRAME-NEXT:    movss %xmm6, 0
-; FRAME-NEXT:    pushq %rbp
-; FRAME-NEXT:    pushq %rax
 ; FRAME-NEXT:    #APP
 ; FRAME-NEXT:    #NO_APP
-; FRAME-NEXT:    popq %rax
-; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm6 # 16-byte Reload
 ; FRAME-NEXT:    addq $48, %rsp
 ; FRAME-NEXT:    pop2 %r15, %rsi
 ; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: widget:
+; FRAME-SPILL:       # %bb.0: # %bb
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    push2 %rsi, %r15
+; FRAME-SPILL-NEXT:    subq $48, %rsp
+; FRAME-SPILL-NEXT:    leaq {{[0-9]+}}(%rsp), %rbp
+; FRAME-SPILL-NEXT:    movaps %xmm6, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; FRAME-SPILL-NEXT:    movaps %xmm0, %xmm6
+; FRAME-SPILL-NEXT:    xorl %esi, %esi
+; FRAME-SPILL-NEXT:    xorl %ecx, %ecx
+; FRAME-SPILL-NEXT:    callq *%rsi
+; FRAME-SPILL-NEXT:    xorl %ecx, %ecx
+; FRAME-SPILL-NEXT:    xorl %edx, %edx
+; FRAME-SPILL-NEXT:    xorl %r8d, %r8d
+; FRAME-SPILL-NEXT:    callq *%rsi
+; FRAME-SPILL-NEXT:    movss %xmm6, 0
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    pushq %rax
+; FRAME-SPILL-NEXT:    #APP
+; FRAME-SPILL-NEXT:    #NO_APP
+; FRAME-SPILL-NEXT:    popq %rax
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm6 # 16-byte Reload
+; FRAME-SPILL-NEXT:    addq $48, %rsp
+; FRAME-SPILL-NEXT:    pop2 %r15, %rsi
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    retq
 bb:
   %call = tail call float null(ptr null)
   %call1 = tail call i32 null(ptr null, i32 0, i32 0)
diff --git a/llvm/test/CodeGen/X86/apx/push2-pop2.ll b/llvm/test/CodeGen/X86/apx/push2-pop2.ll
index f5be484be2b1a6..d6bb1a24aa6b7b 100644
--- a/llvm/test/CodeGen/X86/apx/push2-pop2.ll
+++ b/llvm/test/CodeGen/X86/apx/push2-pop2.ll
@@ -2,6 +2,7 @@
 ; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+push2pop2 | FileCheck %s --check-prefix=CHECK
 ; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+push2pop2,+ppx | FileCheck %s --check-prefix=PPX
 ; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+push2pop2 -frame-pointer=all | FileCheck %s --check-prefix=FRAME
+; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+push2pop2 -frame-pointer=all --enable-spill-fpbp=true | FileCheck %s --check-prefix=FRAME-SPILL
 
 define void @csr1() nounwind {
 ; CHECK-LABEL: csr1:
@@ -24,14 +25,23 @@ define void @csr1() nounwind {
 ; FRAME:       # %bb.0: # %entry
 ; FRAME-NEXT:    pushq %rbp
 ; FRAME-NEXT:    movq %rsp, %rbp
-; FRAME-NEXT:    pushq %rbp
-; FRAME-NEXT:    pushq %rax
 ; FRAME-NEXT:    #APP
 ; FRAME-NEXT:    #NO_APP
-; FRAME-NEXT:    popq %rax
-; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: csr1:
+; FRAME-SPILL:       # %bb.0: # %entry
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    movq %rsp, %rbp
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    pushq %rax
+; FRAME-SPILL-NEXT:    #APP
+; FRAME-SPILL-NEXT:    #NO_APP
+; FRAME-SPILL-NEXT:    popq %rax
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    retq
 entry:
   tail call void asm sideeffect "", "~{rbp},~{dirflag},~{fpsr},~{flags}"()
   ret void
@@ -63,15 +73,26 @@ define void @csr2() nounwind {
 ; FRAME-NEXT:    pushq %rbp
 ; FRAME-NEXT:    movq %rsp, %rbp
 ; FRAME-NEXT:    pushq %r15
-; FRAME-NEXT:    pushq %rbp
-; FRAME-NEXT:    pushq %rax
 ; FRAME-NEXT:    #APP
 ; FRAME-NEXT:    #NO_APP
-; FRAME-NEXT:    popq %rax
-; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    popq %r15
 ; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: csr2:
+; FRAME-SPILL:       # %bb.0: # %entry
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    movq %rsp, %rbp
+; FRAME-SPILL-NEXT:    pushq %r15
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    pushq %rax
+; FRAME-SPILL-NEXT:    #APP
+; FRAME-SPILL-NEXT:    #NO_APP
+; FRAME-SPILL-NEXT:    popq %rax
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    popq %r15
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    retq
 entry:
   tail call void asm sideeffect "", "~{rbp},~{r15},~{dirflag},~{fpsr},~{flags}"()
   ret void
@@ -103,15 +124,26 @@ define void @csr3() nounwind {
 ; FRAME-NEXT:    pushq %rbp
 ; FRAME-NEXT:    movq %rsp, %rbp
 ; FRAME-NEXT:    push2 %r14, %r15
-; FRAME-NEXT:    pushq %rbp
-; FRAME-NEXT:    pushq %rax
 ; FRAME-NEXT:    #APP
 ; FRAME-NEXT:    #NO_APP
-; FRAME-NEXT:    popq %rax
-; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    pop2 %r15, %r14
 ; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: csr3:
+; FRAME-SPILL:       # %bb.0: # %entry
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    movq %rsp, %rbp
+; FRAME-SPILL-NEXT:    push2 %r14, %r15
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    pushq %rax
+; FRAME-SPILL-NEXT:    #APP
+; FRAME-SPILL-NEXT:    #NO_APP
+; FRAME-SPILL-NEXT:    popq %rax
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    pop2 %r15, %r14
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    retq
 entry:
   tail call void asm sideeffect "", "~{rbp},~{r15},~{r14},~{dirflag},~{fpsr},~{flags}"()
   ret void
@@ -148,16 +180,29 @@ define void @csr4() nounwind {
 ; FRAME-NEXT:    movq %rsp, %rbp
 ; FRAME-NEXT:    push2 %r14, %r15
 ; FRAME-NEXT:    pushq %r13
-; FRAME-NEXT:    pushq %rbp
-; FRAME-NEXT:    pushq %rax
 ; FRAME-NEXT:    #APP
 ; FRAME-NEXT:    #NO_APP
-; FRAME-NEXT:    popq %rax
-; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    popq %r13
 ; FRAME-NEXT:    pop2 %r15, %r14
 ; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: csr4:
+; FRAME-SPILL:       # %bb.0: # %entry
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    movq %rsp, %rbp
+; FRAME-SPILL-NEXT:    push2 %r14, %r15
+; FRAME-SPILL-NEXT:    pushq %r13
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    pushq %rax
+; FRAME-SPILL-NEXT:    #APP
+; FRAME-SPILL-NEXT:    #NO_APP
+; FRAME-SPILL-NEXT:    popq %rax
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    popq %r13
+; FRAME-SPILL-NEXT:    pop2 %r15, %r14
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    retq
 entry:
   tail call void asm sideeffect "", "~{rbp},~{r15},~{r14},~{r13},~{dirflag},~{fpsr},~{flags}"()
   ret void
@@ -194,16 +239,29 @@ define void @csr5() nounwind {
 ; FRAME-NEXT:    movq %rsp, %rbp
 ; FRAME-NEXT:    push2 %r14, %r15
 ; FRAME-NEXT:    push2 %r12, %r13
-; FRAME-NEXT:    pushq %rbp
-; FRAME-NEXT:    pushq %rax
 ; FRAME-NEXT:    #APP
 ; FRAME-NEXT:    #NO_APP
-; FRAME-NEXT:    popq %rax
-; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    pop2 %r13, %r12
 ; FRAME-NEXT:    pop2 %r15, %r14
 ; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: csr5:
+; FRAME-SPILL:       # %bb.0: # %entry
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    movq %rsp, %rbp
+; FRAME-SPILL-NEXT:    push2 %r14, %r15
+; FRAME-SPILL-NEXT:    push2 %r12, %r13
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    pushq %rax
+; FRAME-SPILL-NEXT:    #APP
+; FRAME-SPILL-NEXT:    #NO_APP
+; FRAME-SPILL-NEXT:    popq %rax
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    pop2 %r13, %r12
+; FRAME-SPILL-NEXT:    pop2 %r15, %r14
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    retq
 entry:
   tail call void asm sideeffect "", "~{rbp},~{r15},~{r14},~{r13},~{r12},~{dirflag},~{fpsr},~{flags}"()
   ret void
@@ -245,17 +303,32 @@ define void @csr6() nounwind {
 ; FRAME-NEXT:    push2 %r14, %r15
 ; FRAME-NEXT:    push2 %r12, %r13
 ; FRAME-NEXT:    pushq %rbx
-; FRAME-NEXT:    pushq %rbp
-; FRAME-NEXT:    pushq %rax
 ; FRAME-NEXT:    #APP
 ; FRAME-NEXT:    #NO_APP
-; FRAME-NEXT:    popq %rax
-; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    popq %rbx
 ; FRAME-NEXT:    pop2 %r13, %r12
 ; FRAME-NEXT:    pop2 %r15, %r14
 ; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: csr6:
+; FRAME-SPILL:       # %bb.0: # %entry
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    movq %rsp, %rbp
+; FRAME-SPILL-NEXT:    push2 %r14, %r15
+; FRAME-SPILL-NEXT:    push2 %r12, %r13
+; FRAME-SPILL-NEXT:    pushq %rbx
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    pushq %rax
+; FRAME-SPILL-NEXT:    #APP
+; FRAME-SPILL-NEXT:    #NO_APP
+; FRAME-SPILL-NEXT:    popq %rax
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    popq %rbx
+; FRAME-SPILL-NEXT:    pop2 %r13, %r12
+; FRAME-SPILL-NEXT:    pop2 %r15, %r14
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    retq
 entry:
   tail call void asm sideeffect "", "~{rbp},~{r15},~{r14},~{r13},~{r12},~{rbx},~{dirflag},~{fpsr},~{flags}"()
   ret void
@@ -421,6 +494,60 @@ define void @lea_in_epilog(i1 %arg, ptr %arg1, ptr %arg2, i64 %arg3, i64 %arg4,
 ; FRAME-NEXT:    movq $0, (%rax)
 ; FRAME-NEXT:  .LBB6_5: # %bb14
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: lea_in_epilog:
+; FRAME-SPILL:       # %bb.0: # %bb
+; FRAME-SPILL-NEXT:    testb $1, %dil
+; FRAME-SPILL-NEXT:    je .LBB6_5
+; FRAME-SPILL-NEXT:  # %bb.1: # %bb13
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    movq %rsp, %rbp
+; FRAME-SPILL-NEXT:    push2 %r14, %r15
+; FRAME-SPILL-NEXT:    push2 %r12, %r13
+; FRAME-SPILL-NEXT:    pushq %rbx
+; FRAME-SPILL-NEXT:    subq $24, %rsp
+; FRAME-SPILL-NEXT:    movq %rsi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
+; FRAME-SPILL-NEXT:    addq 16(%rbp), %r9
+; FRAME-SPILL-NEXT:    movq 48(%rbp), %rbx
+; FRAME-SPILL-NEXT:    addq %r9, %rbx
+; FRAME-SPILL-NEXT:    movq 40(%rbp), %r12
+; FRAME-SPILL-NEXT:    addq %r9, %r12
+; FRAME-SPILL-NEXT:    movq 32(%rbp), %r15
+; FRAME-SPILL-NEXT:    addq %r9, %r15
+; FRAME-SPILL-NEXT:    xorl %r13d, %r13d
+; FRAME-SPILL-NEXT:    xorl %r14d, %r14d
+; FRAME-SPILL-NEXT:    movl %edi, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; FRAME-SPILL-NEXT:    .p2align 4
+; FRAME-SPILL-NEXT:  .LBB6_2: # %bb15
+; FRAME-SPILL-NEXT:    # =>This Inner Loop Header: Depth=1
+; FRAME-SPILL-NEXT:    movq %r9, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
+; FRAME-SPILL-NEXT:    incq %r14
+; FRAME-SPILL-NEXT:    movl $432, %edx # imm = 0x1B0
+; FRAME-SPILL-NEXT:    xorl %edi, %edi
+; FRAME-SPILL-NEXT:    movq %r12, %rsi
+; FRAME-SPILL-NEXT:    callq memcpy@PLT
+; FRAME-SPILL-NEXT:    movl {{[-0-9]+}}(%r{{[sb]}}p), %edi # 4-byte Reload
+; FRAME-SPILL-NEXT:    movq {{[-0-9]+}}(%r{{[sb]}}p), %r9 # 8-byte Reload
+; FRAME-SPILL-NEXT:    movq 16(%rbp), %rax
+; FRAME-SPILL-NEXT:    addq %rax, %rbx
+; FRAME-SPILL-NEXT:    addq %rax, %r12
+; FRAME-SPILL-NEXT:    addq %rax, %r15
+; FRAME-SPILL-NEXT:    addq %rax, %r9
+; FRAME-SPILL-NEXT:    addq $8, %r13
+; FRAME-SPILL-NEXT:    testb $1, %dil
+; FRAME-SPILL-NEXT:    je .LBB6_2
+; FRAME-SPILL-NEXT:  # %bb.3: # %bb11
+; FRAME-SPILL-NEXT:    movq {{[-0-9]+}}(%r{{[sb]}}p), %rax # 8-byte Reload
+; FRAME-SPILL-NEXT:    leaq {{[0-9]+}}(%rsp), %rsp
+; FRAME-SPILL-NEXT:    popq %rbx
+; FRAME-SPILL-NEXT:    pop2 %r13, %r12
+; FRAME-SPILL-NEXT:    pop2 %r15, %r14
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    jne .LBB6_5
+; FRAME-SPILL-NEXT:  # %bb.4: # %bb12
+; FRAME-SPILL-NEXT:    movq $0, (%rax)
+; FRAME-SPILL-NEXT:  .LBB6_5: # %bb14
+; FRAME-SPILL-NEXT:    retq
 bb:
   br i1 %arg, label %bb13, label %bb14
 
diff --git a/llvm/test/CodeGen/X86/apx/pushp-popp.ll b/llvm/test/CodeGen/X86/apx/pushp-popp.ll
index 625e70b07198e8..4097c59c56437b 100644
--- a/llvm/test/CodeGen/X86/apx/pushp-popp.ll
+++ b/llvm/test/CodeGen/X86/apx/pushp-popp.ll
@@ -1,6 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ppx | FileCheck %s --check-prefix=CHECK
 ; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ppx -frame-pointer=all | FileCheck %s --check-prefix=FRAME
+; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ppx -frame-pointer=all --enable-spill-fpbp=true | FileCheck %s --check-prefix=FRAME-SPILL
 
 define void @csr2() nounwind {
 ; CHECK-LABEL: csr2:
@@ -18,15 +19,26 @@ define void @csr2() nounwind {
 ; FRAME-NEXT:    pushp %rbp
 ; FRAME-NEXT:    movq %rsp, %rbp
 ; FRAME-NEXT:    pushp %r15
-; FRAME-NEXT:    pushp %rbp
-; FRAME-NEXT:    pushq %rax
 ; FRAME-NEXT:    #APP
 ; FRAME-NEXT:    #NO_APP
-; FRAME-NEXT:    popq %rax
-; FRAME-NEXT:    popp %rbp
 ; FRAME-NEXT:    popp %r15
 ; FRAME-NEXT:    popp %rbp
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: csr2:
+; FRAME-SPILL:       # %bb.0: # %entry
+; FRAME-SPILL-NEXT:    pushp %rbp
+; FRAME-SPILL-NEXT:    movq %rsp, %rbp
+; FRAME-SPILL-NEXT:    pushp %r15
+; FRAME-SPILL-NEXT:    pushp %rbp
+; FRAME-SPILL-NEXT:    pushq %rax
+; FRAME-SPILL-NEXT:    #APP
+; FRAME-SPILL-NEXT:    #NO_APP
+; FRAME-SPILL-NEXT:    popq %rax
+; FRAME-SPILL-NEXT:    popp %rbp
+; FRAME-SPILL-NEXT:    popp %r15
+; FRAME-SPILL-NEXT:    popp %rbp
+; FRAME-SPILL-NEXT:    retq
 entry:
   tail call void asm sideeffect "", "~{rbp},~{r15},~{dirflag},~{fpsr},~{flags}"()
   ret void
diff --git a/llvm/test/CodeGen/X86/avx512-intel-ocl.ll b/llvm/test/CodeGen/X86/avx512-intel-ocl.ll
index 6c68279b8d04ae..3873b920d4de69 100644
--- a/llvm/test/CodeGen/X86/avx512-intel-ocl.ll
+++ b/llvm/test/CodeGen/X86/avx512-intel-ocl.ll
@@ -6,7 +6,9 @@
 ; RUN: llc < %s -mtriple=x86_64-win32 -mcpu=knl | FileCheck %s -check-prefixes=WIN64,WIN64-KNL
 ; RUN: llc < %s -mtriple=x86_64-win32 -mcpu=skx | FileCheck %s -check-prefixes=WIN64,WIN64-SKX
 ; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl | FileCheck %s -check-prefixes=X64,X64-KNL
+; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl --enable-spill-fpbp=true | FileCheck %s -check-prefixes=X64-SPILL,X64-SPILL-KNL
 ; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=skx | FileCheck %s -check-prefixes=X64,X64-SKX
+; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=skx --enable-spill-fpbp=true | FileCheck %s -check-prefixes=X64-SPILL,X64-SPILL-SKX
 
 declare <16 x float> @func_float16_ptr(<16 x float>, ptr)
 declare <16 x float> @func_float16(<16 x float>, <16 x float>)
@@ -70,17 +72,35 @@ define <16 x float> @testf16_inp(<16 x float> %a, <16 x float> %b) nounwind {
 ; X64-NEXT:    subq $128, %rsp
 ; X64-NEXT:    vaddps %zmm1, %zmm0, %zmm0
 ; X64-NEXT:    movq %rsp, %rdi
-; X64-NEXT:    pushq %rbp
-; X64-NEXT:    pushq %rax
 ; X64-NEXT:    callq _func_float16_ptr
-; X64-NEXT:    addq $8, %rsp
-; X64-NEXT:    popq %rbp
 ; X64-NEXT:    vaddps (%rsp), %zmm0, %zmm0
 ; X64-NEXT:    leaq -16(%rbp), %rsp
 ; X64-NEXT:    popq %r12
 ; X64-NEXT:    popq %r13
 ; X64-NEXT:    popq %rbp
 ; X64-NEXT:    retq
+;
+; X64-SPILL-LABEL: testf16_inp:
+; X64-SPILL:       ## %bb.0:
+; X64-SPILL-NEXT:    pushq %rbp
+; X64-SPILL-NEXT:    movq %rsp, %rbp
+; X64-SPILL-NEXT:    pushq %r13
+; X64-SPILL-NEXT:    pushq %r12
+; X64-SPILL-NEXT:    andq $-64, %rsp
+; X64-SPILL-NEXT:    subq $128, %rsp
+; X64-SPILL-NEXT:    vaddps %zmm1, %zmm0, %zmm0
+; X64-SPILL-NEXT:    movq %rsp, %rdi
+; X64-SPILL-NEXT:    pushq %rbp
+; X64-SPILL-NEXT:    pushq %rax
+; X64-SPILL-NEXT:    callq _func_float16_ptr
+; X64-SPILL-NEXT:    addq $8, %rsp
+; X64-SPILL-NEXT:    popq %rbp
+; X64-SPILL-NEXT:    vaddps (%rsp), %zmm0, %zmm0
+; X64-SPILL-NEXT:    leaq -16(%rbp), %rsp
+; X64-SPILL-NEXT:    popq %r12
+; X64-SPILL-NEXT:    popq %r13
+; X64-SPILL-NEXT:    popq %rbp
+; X64-SPILL-NEXT:    retq
   %y = alloca <16 x float>, align 64
   %x = fadd <16 x float> %a, %b
   %1 = call intel_ocl_bicc <16 x float> @func_float16_ptr(<16 x float> %x, ptr %y)
@@ -154,11 +174,7 @@ define <16 x float> @testf16_regs(<16 x float> %a, <16 x float> %b) nounwind {
 ; X64-NEXT:    vmovaps %zmm1, %zmm16
 ; X64-NEXT:    vaddps %zmm1, %zmm0, %zmm0
 ; X64-NEXT:    movq %rsp, %rdi
-; X64-NEXT:    pushq %rbp
-; X64-NEXT:    pushq %rax
 ; X64-NEXT:    callq _func_float16_ptr
-; X64-NEXT:    addq $8, %rsp
-; X64-NEXT:    popq %rbp
 ; X64-NEXT:    vaddps %zmm16, %zmm0, %zmm0
 ; X64-NEXT:    vaddps (%rsp), %zmm0, %zmm0
 ; X64-NEXT:    leaq -16(%rbp), %rsp
@@ -166,6 +182,30 @@ define <16 x float> @testf16_regs(<16 x float> %a, <16 x float> %b) nounwind {
 ; X64-NEXT:    popq %r13
 ; X64-NEXT:    popq %rbp
 ; X64-NEXT:    retq
+;
+; X64-SPILL-LABEL: testf16_regs:
+; X64-SPILL:       ## %bb.0:
+; X64-SPILL-NEXT:    pushq %rbp
+; X64-SPILL-NEXT:    movq %rsp, %rbp
+; X64-SPILL-NEXT:    pushq %r13
+; X64-SPILL-NEXT:    pushq %r12
+; X64-SPILL-NEXT:    andq $-64, %rsp
+; X64-SPILL-NEXT:    subq $128, %rsp
+; X64-SPILL-NEXT:    vmovaps %zmm1, %zmm16
+; X64-SPILL-NEXT:    vaddps %zmm1, %zmm0, %zmm0
+; X64-SPILL-NEXT:    movq %rsp, %rdi
+; X64-SPILL-NEXT:    pushq %rbp
+; X64-SPILL-NEXT:    pushq %rax
+; X64-SPILL-NEXT:    callq _func_float16_ptr
+; X64-SPILL-NEXT:    addq $8, %rsp
+; X64-SPILL-NEXT:    popq %rbp
+; X64-SPILL-NEXT:    vaddps %zmm16, %zmm0, %zmm0
+; X64-SPILL-NEXT:    vaddps (%rsp), %zmm0, %zmm0
+; X64-SPILL-NEXT:    leaq -16(%rbp), %rsp
+; X64-SPILL-NEXT:    popq %r12
+; X64-SPILL-NEXT:    popq %r13
+; X64-SPILL-NEXT:    popq %rbp
+; X64-SPILL-NEXT:    retq
   %y = alloca <16 x float>, align 64
   %x = fadd <16 x float> %a, %b
   %1 = call intel_ocl_bicc <16 x float> @func_float16_ptr(<16 x float> %x, ptr %y)
@@ -348,6 +388,55 @@ define intel_ocl_bicc <16 x float> @test_prolog_epilog(<16 x float> %a, <16 x fl
 ; X64-KNL-NEXT:    popq %rsi
 ; X64-KNL-NEXT:    retq
 ;
+; X64-SPILL-KNL-LABEL: test_prolog_epilog:
+; X64-SPILL-KNL:       ## %bb.0:
+; X64-SPILL-KNL-NEXT:    pushq %rsi
+; X64-SPILL-KNL-NEXT:    subq $1072, %rsp ## imm = 0x430
+; X64-SPILL-KNL-NEXT:    kmovw %k7, {{[-0-9]+}}(%r{{[sb]}}p) ## 2-byte Spill
+; X64-SPILL-KNL-NEXT:    kmovw %k6, {{[-0-9]+}}(%r{{[sb]}}p) ## 2-byte Spill
+; X64-SPILL-KNL-NEXT:    kmovw %k5, {{[-0-9]+}}(%r{{[sb]}}p) ## 2-byte Spill
+; X64-SPILL-KNL-NEXT:    kmovw %k4, {{[-0-9]+}}(%r{{[sb]}}p) ## 2-byte Spill
+; X64-SPILL-KNL-NEXT:    vmovups %zmm31, {{[-0-9]+}}(%r{{[sb]}}p) ## 64-byte Spill
+; X64-SPILL-KNL-NEXT:    vmovups %zmm30, {{[-0-9]+}}(%r{{[sb]}}p) ## 64-byte Spill
+; X64-SPILL-KNL-NEXT:    vmovups %zmm29, {{[-0-9]+}}(%r{{[sb]}}p) ## 64-byte Spill
+; X64-SPILL-KNL-NEXT:    vmovups %zmm28, {{[-0-9]+}}(%r{{[sb]}}p) ## 64-byte Spill
+; X64-SPILL-KNL-NEXT:    vmovups %zmm27, {{[-0-9]+}}(%r{{[sb]}}p) ## 64-byte Spill
+; X64-SPILL-KNL-NEXT:    vmovups %zmm26, {{[-0-9]+}}(%r{{[sb]}}p) ## 64-byte Spill
+; X64-SPILL-KNL-NEXT:    vmovups %zmm25, {{[-0-9]+}}(%r{{[sb]}}p) ## 64-byte Spill
+; X64-SPILL-KNL-NEXT:    vmovups %zmm24, {{[-0-9]+}}(%r{{[sb]}}p) ## 64-byt...
[truncated]

mahesh-attarde · 2024-11-04T13:15:49Z

@rnk @efriedma-quic Here is consideration by author on option, Can we choose this until we know how did it impact?
#81048 (comment)
@weiguozhi

rnk · 2024-11-04T17:24:05Z

llvm/lib/CodeGen/PrologEpilogInserter.cpp

+static cl::opt<bool>
+    EnableSpillFPBP("enable-spill-fpbp",
+                    cl::desc("Spill clobbered fp register to stack."),
+                    cl::init(false), cl::Hidden);


I'm in favor of adding a flag to facilitate debugging, but until we have a clear bug report with a clear understanding of why this is causing regressions, I'm not in favor of this default (false) which will regress all of the use cases that were fixed by this change. We need a clear rationale why the old behavior was better than the new behavior in order to change behavior. Flags, though, are reasonable if the new behavior is still unstable.

yes. I have worked out ir output out of llvm-reduce. I can share snippet or ir soon.

efriedma-quic · 2024-11-04T17:46:41Z

If there's an open bug report reporting some specific code pattern breaks, we can discuss on that bug report whether it makes sense to temporarily revert the change.

It doesn't make sense to permanently disable the code. Which is essentially what you're proposing, because you haven't given enough information to fix the issue.

mahesh-attarde · 2024-11-04T19:41:17Z

If there's an open bug report reporting some specific code pattern breaks, we can discuss on that bug report whether it makes sense to temporarily revert the change.

It doesn't make sense to permanently disable the code. Which is essentially what you're proposing, because you haven't given enough information to fix the issue.

Short story on Bug, For code
$rsp = 0x7fffc27fadc0 and vmovdqa64 %zmm19, 0x90(%rsp) results in SEGFAULT(#GP). I have number of cases with same pattern. Same Example expectation with vmovdqa64 %zmm19, 0x80(%rsp)
I will provide more details soon

rnk · 2024-11-04T19:57:40Z

Stack misalignment suggests there could be a bug in how we compute stack alignment here:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86FrameLowering.cpp#L4240

If stack accesses still use RSP, that suggests that RBP (FP) is in use, but not RBX (BP). So, a possible reproducer would be a big inline asm blob that clobbers RBP and accesses a highly aligned vector alloca, like a __m512* variable.

rnk · 2024-11-04T20:07:52Z

It seems like there is a general problem with inline asm blobs accessing stack variables, see this example: https://godbolt.org/z/s16MxPME3

#include <immintrin.h>
int f() {
    __m512i aligned{};
    asm volatile ("vmovdqa64 %%zmm0, %0" : "=m"(aligned) : "m"(aligned) : "rbp");
    return aligned[0];
}
---->
f():
        push    rbp
        mov     rbp, rsp
        and     rsp, -64
        sub     rsp, 128
....
        push    rbp
        push    rax
        vmovdqa64       zmmword ptr [rsp], zmm0
        add     rsp, 8
        pop     rbp
        mov     rax, qword ptr [rsp]
        mov     rsp, rbp
        pop     rbp
        ret

[rsp] here looks like it will be misaligned (16 bytes instead of 64), and it will overwrite RBP because we haven't adjusted the memory operands to account for the disturbance to RSP.

That seems like a pretty big design flaw that could break more code than it fixes.

efriedma-quic · 2024-11-04T20:23:45Z

That seems bad enough that it might be worth temporarily disabling the code.

weiguozhi · 2024-11-04T20:57:14Z

Suppose we haven't protected rbp around the inline asm, then following code should be generated

#include <immintrin.h>
int f() {
    __m512i aligned{};
    asm volatile ("vmovdqa64 %%zmm0, %0" : "=m"(aligned) : "m"(aligned) : "rbp");
    return aligned[0];
}
---->
f():
        push    rbp
        mov     rbp, rsp
        and     rsp, -64
        sub     rsp, 128
....
        vmovdqa64       zmmword ptr [rsp], zmm0          // clobber rbp
        mov     rax, qword ptr [rsp]
        mov     rsp, rbp                                 // store a garbage value to rsp
        pop     rbp
        ret

Because rbp is clobbered by the inline asm, at the end of the function, we still can't restore rsp correctly.

rnk · 2024-11-04T21:15:08Z

I agree, it's also a major flaw that LLVM doesn't diagnose or do anything about RBP conflicts. :)

FWIW, GCC just gives up. You can see in the godbolt example they say "bp cannot be used in 'asm' here". Maybe a reasonable compromise is that we emit a backend error for inline asm that clobbers BP/FP, but spill FP/BP around calls that use it.

Because of GCC's behavior, there is probably less real world code out there clobbering RBP, but there's probably a lot of generated vector kernels in inline asm that try to rely on the compiler to align the stack and do the fiddly bits of procedure linkage.

mahesh-attarde · 2024-11-05T06:05:38Z

It seems like there is a general problem with inline asm blobs accessing stack variables, see this example: https://godbolt.org/z/s16MxPME3
#include <immintrin.h>
int f() {
    __m512i aligned{};
    asm volatile ("vmovdqa64 %%zmm0, %0" : "=m"(aligned) : "m"(aligned) : "rbp");
    return aligned[0];
}
---->
f():
        push    rbp
        mov     rbp, rsp
        and     rsp, -64
        sub     rsp, 128
....
        push    rbp
        push    rax
        vmovdqa64       zmmword ptr [rsp], zmm0
        add     rsp, 8
        pop     rbp
        mov     rax, qword ptr [rsp]
        mov     rsp, rbp
        pop     rbp
        ret
[rsp] here looks like it will be misaligned (16 bytes instead of 64), and it will overwrite RBP because we haven't adjusted the memory operands to account for the disturbance to RSP.

That seems like a pretty big design flaw that could break more code than it fixes.

@rnk @weiguozhi My use case has similar pattern. One point to note here is, in my case i don't have inline asm's either.
Since we agree upon problem, can we disable with PR or revert change now?

mahesh-attarde · 2024-11-05T06:21:36Z

I opened new issue #114941

weiguozhi · 2024-11-05T16:42:40Z

@mahesh-attarde the problem is even without the patch #81048, https://godbolt.org/z/s16MxPME3 still generates wrong code. Could you paste your code here?

@rnk compile time error sounds reasonable for these difficult cases.

mahesh-attarde · 2024-11-06T07:36:15Z

@mahesh-attarde the problem is even without the patch #81048, https://godbolt.org/z/s16MxPME3 still generates wrong code. Could you paste your code here?

@rnk compile time error sounds reasonable for these difficult cases.

Here is simplified version with godbolt llc.
https://godbolt.org/z/TPMr7jcxn

bar:                                    # @bar
        push    rbp
        mov     rbp, rsp
        and     rsp, -64
        sub     rsp, 192
        movaps  xmmword ptr [rsp + 160], xmm3   # 16-byte Spill
        movaps  xmmword ptr [rsp + 144], xmm2   # 16-byte Spill
        movaps  xmmword ptr [rsp + 128], xmm1   # 16-byte Spill
>>> Problem
        movaps  xmmword ptr [rsp + 112], xmm0   # 16-byte Spill
<<<<
        movaps  xmm0, xmmword ptr [rsp]
...

For Aligned Mov Memory operand has to be 16/32/64 byte align.

mahesh-attarde · 2024-11-06T14:36:57Z

if you are ok, can we merge this PR or revert original ?

weiguozhi · 2024-11-06T16:53:48Z

Isn't [rsp + 112] 16 bytes aligned?

Spill/Restore FP/BP under option

50144c4

llvmbot added the backend:X86 label Nov 4, 2024

mahesh-attarde changed the title ~~Spill/Restore FP/BP under option~~ [Codegen] Spill/Restore FP/BP under option Nov 4, 2024

rnk reviewed Nov 4, 2024

View reviewed changes

efriedma-quic requested review from aengelke, weiguozhi and efriedma-quic November 4, 2024 20:20

mahesh-attarde mentioned this pull request Nov 5, 2024

[X86] stack realignment is incorrect when Spill/restore BP is used #114941

Open

Merge branch 'main' into fix_rsp_crash

63139a2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Codegen] Spill/Restore FP/BP under option #114791

[Codegen] Spill/Restore FP/BP under option #114791

mahesh-attarde commented Nov 4, 2024

llvmbot commented Nov 4, 2024

mahesh-attarde commented Nov 4, 2024 •

edited

Loading

rnk Nov 4, 2024

mahesh-attarde Nov 4, 2024

efriedma-quic commented Nov 4, 2024

mahesh-attarde commented Nov 4, 2024 •

edited

Loading

rnk commented Nov 4, 2024

rnk commented Nov 4, 2024

efriedma-quic commented Nov 4, 2024

weiguozhi commented Nov 4, 2024

rnk commented Nov 4, 2024

mahesh-attarde commented Nov 5, 2024

mahesh-attarde commented Nov 5, 2024 •

edited

Loading

weiguozhi commented Nov 5, 2024

mahesh-attarde commented Nov 6, 2024

mahesh-attarde commented Nov 6, 2024

weiguozhi commented Nov 6, 2024

[Codegen] Spill/Restore FP/BP under option #114791

Are you sure you want to change the base?

[Codegen] Spill/Restore FP/BP under option #114791

Conversation

mahesh-attarde commented Nov 4, 2024

llvmbot commented Nov 4, 2024

mahesh-attarde commented Nov 4, 2024 • edited Loading

rnk Nov 4, 2024

Choose a reason for hiding this comment

mahesh-attarde Nov 4, 2024

Choose a reason for hiding this comment

efriedma-quic commented Nov 4, 2024

mahesh-attarde commented Nov 4, 2024 • edited Loading

rnk commented Nov 4, 2024

rnk commented Nov 4, 2024

efriedma-quic commented Nov 4, 2024

weiguozhi commented Nov 4, 2024

rnk commented Nov 4, 2024

mahesh-attarde commented Nov 5, 2024

mahesh-attarde commented Nov 5, 2024 • edited Loading

weiguozhi commented Nov 5, 2024

mahesh-attarde commented Nov 6, 2024

mahesh-attarde commented Nov 6, 2024

weiguozhi commented Nov 6, 2024

mahesh-attarde commented Nov 4, 2024 •

edited

Loading

mahesh-attarde commented Nov 4, 2024 •

edited

Loading

mahesh-attarde commented Nov 5, 2024 •

edited

Loading