Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Codegen] Spill/Restore FP/BP under option #114791

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mahesh-attarde
Copy link
Contributor

With Change #81048
There are more than 70+ Application Runtime fails for X86 Target.
Moving this change under option.

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 4, 2024

@llvm/pr-subscribers-backend-x86

Author: Mahesh-Attarde (mahesh-attarde)

Changes

With Change #81048
There are more than 70+ Application Runtime fails for X86 Target.
Moving this change under option.


Patch is 46.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114791.diff

13 Files Affected:

  • (modified) llvm/lib/CodeGen/PrologEpilogInserter.cpp (+7-1)
  • (modified) llvm/test/CodeGen/X86/apx/push2-pop2-vector-register.ll (+29-4)
  • (modified) llvm/test/CodeGen/X86/apx/push2-pop2.ll (+151-24)
  • (modified) llvm/test/CodeGen/X86/apx/pushp-popp.ll (+16-4)
  • (modified) llvm/test/CodeGen/X86/avx512-intel-ocl.ll (+191-8)
  • (modified) llvm/test/CodeGen/X86/clobber_base_ptr.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/clobber_frame_ptr.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/clobber_frame_ptr2.ll (+2-1)
  • (modified) llvm/test/CodeGen/X86/clobber_frame_ptr_x32.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/i386-baseptr.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/inline-asm-function-call-pic.ll (+28-2)
  • (modified) llvm/test/CodeGen/X86/x86-32-intrcc.ll (+1-3)
  • (modified) llvm/test/CodeGen/X86/x86-64-baseptr.ll (+182-12)
diff --git a/llvm/lib/CodeGen/PrologEpilogInserter.cpp b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
index ee03eaa8ae527c..da9385ce4c96d4 100644
--- a/llvm/lib/CodeGen/PrologEpilogInserter.cpp
+++ b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
@@ -77,6 +77,11 @@ using MBBVector = SmallVector<MachineBasicBlock *, 4>;
 STATISTIC(NumLeafFuncWithSpills, "Number of leaf functions with CSRs");
 STATISTIC(NumFuncSeen, "Number of functions seen in PEI");
 
+// Experimental Feature enables spilling and reload FP/BP
+static cl::opt<bool>
+    EnableSpillFPBP("enable-spill-fpbp",
+                    cl::desc("Spill clobbered fp register to stack."),
+                    cl::init(false), cl::Hidden);
 
 namespace {
 
@@ -231,7 +236,8 @@ bool PEI::runOnMachineFunction(MachineFunction &MF) {
   // Spill frame pointer and/or base pointer registers if they are clobbered.
   // It is placed before call frame instruction elimination so it will not mess
   // with stack arguments.
-  TFI->spillFPBP(MF);
+  if (EnableSpillFPBP)
+    TFI->spillFPBP(MF);
 
   // Calculate the MaxCallFrameSize value for the function's frame
   // information. Also eliminates call frame pseudo instructions.
diff --git a/llvm/test/CodeGen/X86/apx/push2-pop2-vector-register.ll b/llvm/test/CodeGen/X86/apx/push2-pop2-vector-register.ll
index f20c4c1ae27867..b4b205bd292276 100644
--- a/llvm/test/CodeGen/X86/apx/push2-pop2-vector-register.ll
+++ b/llvm/test/CodeGen/X86/apx/push2-pop2-vector-register.ll
@@ -2,6 +2,7 @@
 ; Check PUSH2/POP2 is not used for vector registers
 ; RUN: llc < %s -mtriple=x86_64-pc-windows-msvc -mattr=+push2pop2 | FileCheck %s --check-prefix=CHECK
 ; RUN: llc < %s -mtriple=x86_64-pc-windows-msvc -mattr=+push2pop2 -frame-pointer=all | FileCheck %s --check-prefix=FRAME
+; RUN: llc < %s -mtriple=x86_64-pc-windows-msvc -mattr=+push2pop2 -frame-pointer=all  --enable-spill-fpbp=true | FileCheck %s --check-prefix=FRAME-SPILL
 
 define void @widget(float %arg) nounwind {
 ; CHECK-LABEL: widget:
@@ -43,17 +44,41 @@ define void @widget(float %arg) nounwind {
 ; FRAME-NEXT:    xorl %r8d, %r8d
 ; FRAME-NEXT:    callq *%rsi
 ; FRAME-NEXT:    movss %xmm6, 0
-; FRAME-NEXT:    pushq %rbp
-; FRAME-NEXT:    pushq %rax
 ; FRAME-NEXT:    #APP
 ; FRAME-NEXT:    #NO_APP
-; FRAME-NEXT:    popq %rax
-; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm6 # 16-byte Reload
 ; FRAME-NEXT:    addq $48, %rsp
 ; FRAME-NEXT:    pop2 %r15, %rsi
 ; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: widget:
+; FRAME-SPILL:       # %bb.0: # %bb
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    push2 %rsi, %r15
+; FRAME-SPILL-NEXT:    subq $48, %rsp
+; FRAME-SPILL-NEXT:    leaq {{[0-9]+}}(%rsp), %rbp
+; FRAME-SPILL-NEXT:    movaps %xmm6, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; FRAME-SPILL-NEXT:    movaps %xmm0, %xmm6
+; FRAME-SPILL-NEXT:    xorl %esi, %esi
+; FRAME-SPILL-NEXT:    xorl %ecx, %ecx
+; FRAME-SPILL-NEXT:    callq *%rsi
+; FRAME-SPILL-NEXT:    xorl %ecx, %ecx
+; FRAME-SPILL-NEXT:    xorl %edx, %edx
+; FRAME-SPILL-NEXT:    xorl %r8d, %r8d
+; FRAME-SPILL-NEXT:    callq *%rsi
+; FRAME-SPILL-NEXT:    movss %xmm6, 0
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    pushq %rax
+; FRAME-SPILL-NEXT:    #APP
+; FRAME-SPILL-NEXT:    #NO_APP
+; FRAME-SPILL-NEXT:    popq %rax
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm6 # 16-byte Reload
+; FRAME-SPILL-NEXT:    addq $48, %rsp
+; FRAME-SPILL-NEXT:    pop2 %r15, %rsi
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    retq
 bb:
   %call = tail call float null(ptr null)
   %call1 = tail call i32 null(ptr null, i32 0, i32 0)
diff --git a/llvm/test/CodeGen/X86/apx/push2-pop2.ll b/llvm/test/CodeGen/X86/apx/push2-pop2.ll
index f5be484be2b1a6..d6bb1a24aa6b7b 100644
--- a/llvm/test/CodeGen/X86/apx/push2-pop2.ll
+++ b/llvm/test/CodeGen/X86/apx/push2-pop2.ll
@@ -2,6 +2,7 @@
 ; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+push2pop2 | FileCheck %s --check-prefix=CHECK
 ; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+push2pop2,+ppx | FileCheck %s --check-prefix=PPX
 ; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+push2pop2 -frame-pointer=all | FileCheck %s --check-prefix=FRAME
+; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+push2pop2 -frame-pointer=all --enable-spill-fpbp=true | FileCheck %s --check-prefix=FRAME-SPILL
 
 define void @csr1() nounwind {
 ; CHECK-LABEL: csr1:
@@ -24,14 +25,23 @@ define void @csr1() nounwind {
 ; FRAME:       # %bb.0: # %entry
 ; FRAME-NEXT:    pushq %rbp
 ; FRAME-NEXT:    movq %rsp, %rbp
-; FRAME-NEXT:    pushq %rbp
-; FRAME-NEXT:    pushq %rax
 ; FRAME-NEXT:    #APP
 ; FRAME-NEXT:    #NO_APP
-; FRAME-NEXT:    popq %rax
-; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: csr1:
+; FRAME-SPILL:       # %bb.0: # %entry
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    movq %rsp, %rbp
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    pushq %rax
+; FRAME-SPILL-NEXT:    #APP
+; FRAME-SPILL-NEXT:    #NO_APP
+; FRAME-SPILL-NEXT:    popq %rax
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    retq
 entry:
   tail call void asm sideeffect "", "~{rbp},~{dirflag},~{fpsr},~{flags}"()
   ret void
@@ -63,15 +73,26 @@ define void @csr2() nounwind {
 ; FRAME-NEXT:    pushq %rbp
 ; FRAME-NEXT:    movq %rsp, %rbp
 ; FRAME-NEXT:    pushq %r15
-; FRAME-NEXT:    pushq %rbp
-; FRAME-NEXT:    pushq %rax
 ; FRAME-NEXT:    #APP
 ; FRAME-NEXT:    #NO_APP
-; FRAME-NEXT:    popq %rax
-; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    popq %r15
 ; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: csr2:
+; FRAME-SPILL:       # %bb.0: # %entry
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    movq %rsp, %rbp
+; FRAME-SPILL-NEXT:    pushq %r15
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    pushq %rax
+; FRAME-SPILL-NEXT:    #APP
+; FRAME-SPILL-NEXT:    #NO_APP
+; FRAME-SPILL-NEXT:    popq %rax
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    popq %r15
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    retq
 entry:
   tail call void asm sideeffect "", "~{rbp},~{r15},~{dirflag},~{fpsr},~{flags}"()
   ret void
@@ -103,15 +124,26 @@ define void @csr3() nounwind {
 ; FRAME-NEXT:    pushq %rbp
 ; FRAME-NEXT:    movq %rsp, %rbp
 ; FRAME-NEXT:    push2 %r14, %r15
-; FRAME-NEXT:    pushq %rbp
-; FRAME-NEXT:    pushq %rax
 ; FRAME-NEXT:    #APP
 ; FRAME-NEXT:    #NO_APP
-; FRAME-NEXT:    popq %rax
-; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    pop2 %r15, %r14
 ; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: csr3:
+; FRAME-SPILL:       # %bb.0: # %entry
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    movq %rsp, %rbp
+; FRAME-SPILL-NEXT:    push2 %r14, %r15
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    pushq %rax
+; FRAME-SPILL-NEXT:    #APP
+; FRAME-SPILL-NEXT:    #NO_APP
+; FRAME-SPILL-NEXT:    popq %rax
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    pop2 %r15, %r14
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    retq
 entry:
   tail call void asm sideeffect "", "~{rbp},~{r15},~{r14},~{dirflag},~{fpsr},~{flags}"()
   ret void
@@ -148,16 +180,29 @@ define void @csr4() nounwind {
 ; FRAME-NEXT:    movq %rsp, %rbp
 ; FRAME-NEXT:    push2 %r14, %r15
 ; FRAME-NEXT:    pushq %r13
-; FRAME-NEXT:    pushq %rbp
-; FRAME-NEXT:    pushq %rax
 ; FRAME-NEXT:    #APP
 ; FRAME-NEXT:    #NO_APP
-; FRAME-NEXT:    popq %rax
-; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    popq %r13
 ; FRAME-NEXT:    pop2 %r15, %r14
 ; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: csr4:
+; FRAME-SPILL:       # %bb.0: # %entry
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    movq %rsp, %rbp
+; FRAME-SPILL-NEXT:    push2 %r14, %r15
+; FRAME-SPILL-NEXT:    pushq %r13
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    pushq %rax
+; FRAME-SPILL-NEXT:    #APP
+; FRAME-SPILL-NEXT:    #NO_APP
+; FRAME-SPILL-NEXT:    popq %rax
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    popq %r13
+; FRAME-SPILL-NEXT:    pop2 %r15, %r14
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    retq
 entry:
   tail call void asm sideeffect "", "~{rbp},~{r15},~{r14},~{r13},~{dirflag},~{fpsr},~{flags}"()
   ret void
@@ -194,16 +239,29 @@ define void @csr5() nounwind {
 ; FRAME-NEXT:    movq %rsp, %rbp
 ; FRAME-NEXT:    push2 %r14, %r15
 ; FRAME-NEXT:    push2 %r12, %r13
-; FRAME-NEXT:    pushq %rbp
-; FRAME-NEXT:    pushq %rax
 ; FRAME-NEXT:    #APP
 ; FRAME-NEXT:    #NO_APP
-; FRAME-NEXT:    popq %rax
-; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    pop2 %r13, %r12
 ; FRAME-NEXT:    pop2 %r15, %r14
 ; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: csr5:
+; FRAME-SPILL:       # %bb.0: # %entry
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    movq %rsp, %rbp
+; FRAME-SPILL-NEXT:    push2 %r14, %r15
+; FRAME-SPILL-NEXT:    push2 %r12, %r13
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    pushq %rax
+; FRAME-SPILL-NEXT:    #APP
+; FRAME-SPILL-NEXT:    #NO_APP
+; FRAME-SPILL-NEXT:    popq %rax
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    pop2 %r13, %r12
+; FRAME-SPILL-NEXT:    pop2 %r15, %r14
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    retq
 entry:
   tail call void asm sideeffect "", "~{rbp},~{r15},~{r14},~{r13},~{r12},~{dirflag},~{fpsr},~{flags}"()
   ret void
@@ -245,17 +303,32 @@ define void @csr6() nounwind {
 ; FRAME-NEXT:    push2 %r14, %r15
 ; FRAME-NEXT:    push2 %r12, %r13
 ; FRAME-NEXT:    pushq %rbx
-; FRAME-NEXT:    pushq %rbp
-; FRAME-NEXT:    pushq %rax
 ; FRAME-NEXT:    #APP
 ; FRAME-NEXT:    #NO_APP
-; FRAME-NEXT:    popq %rax
-; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    popq %rbx
 ; FRAME-NEXT:    pop2 %r13, %r12
 ; FRAME-NEXT:    pop2 %r15, %r14
 ; FRAME-NEXT:    popq %rbp
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: csr6:
+; FRAME-SPILL:       # %bb.0: # %entry
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    movq %rsp, %rbp
+; FRAME-SPILL-NEXT:    push2 %r14, %r15
+; FRAME-SPILL-NEXT:    push2 %r12, %r13
+; FRAME-SPILL-NEXT:    pushq %rbx
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    pushq %rax
+; FRAME-SPILL-NEXT:    #APP
+; FRAME-SPILL-NEXT:    #NO_APP
+; FRAME-SPILL-NEXT:    popq %rax
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    popq %rbx
+; FRAME-SPILL-NEXT:    pop2 %r13, %r12
+; FRAME-SPILL-NEXT:    pop2 %r15, %r14
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    retq
 entry:
   tail call void asm sideeffect "", "~{rbp},~{r15},~{r14},~{r13},~{r12},~{rbx},~{dirflag},~{fpsr},~{flags}"()
   ret void
@@ -421,6 +494,60 @@ define void @lea_in_epilog(i1 %arg, ptr %arg1, ptr %arg2, i64 %arg3, i64 %arg4,
 ; FRAME-NEXT:    movq $0, (%rax)
 ; FRAME-NEXT:  .LBB6_5: # %bb14
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: lea_in_epilog:
+; FRAME-SPILL:       # %bb.0: # %bb
+; FRAME-SPILL-NEXT:    testb $1, %dil
+; FRAME-SPILL-NEXT:    je .LBB6_5
+; FRAME-SPILL-NEXT:  # %bb.1: # %bb13
+; FRAME-SPILL-NEXT:    pushq %rbp
+; FRAME-SPILL-NEXT:    movq %rsp, %rbp
+; FRAME-SPILL-NEXT:    push2 %r14, %r15
+; FRAME-SPILL-NEXT:    push2 %r12, %r13
+; FRAME-SPILL-NEXT:    pushq %rbx
+; FRAME-SPILL-NEXT:    subq $24, %rsp
+; FRAME-SPILL-NEXT:    movq %rsi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
+; FRAME-SPILL-NEXT:    addq 16(%rbp), %r9
+; FRAME-SPILL-NEXT:    movq 48(%rbp), %rbx
+; FRAME-SPILL-NEXT:    addq %r9, %rbx
+; FRAME-SPILL-NEXT:    movq 40(%rbp), %r12
+; FRAME-SPILL-NEXT:    addq %r9, %r12
+; FRAME-SPILL-NEXT:    movq 32(%rbp), %r15
+; FRAME-SPILL-NEXT:    addq %r9, %r15
+; FRAME-SPILL-NEXT:    xorl %r13d, %r13d
+; FRAME-SPILL-NEXT:    xorl %r14d, %r14d
+; FRAME-SPILL-NEXT:    movl %edi, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; FRAME-SPILL-NEXT:    .p2align 4
+; FRAME-SPILL-NEXT:  .LBB6_2: # %bb15
+; FRAME-SPILL-NEXT:    # =>This Inner Loop Header: Depth=1
+; FRAME-SPILL-NEXT:    movq %r9, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
+; FRAME-SPILL-NEXT:    incq %r14
+; FRAME-SPILL-NEXT:    movl $432, %edx # imm = 0x1B0
+; FRAME-SPILL-NEXT:    xorl %edi, %edi
+; FRAME-SPILL-NEXT:    movq %r12, %rsi
+; FRAME-SPILL-NEXT:    callq memcpy@PLT
+; FRAME-SPILL-NEXT:    movl {{[-0-9]+}}(%r{{[sb]}}p), %edi # 4-byte Reload
+; FRAME-SPILL-NEXT:    movq {{[-0-9]+}}(%r{{[sb]}}p), %r9 # 8-byte Reload
+; FRAME-SPILL-NEXT:    movq 16(%rbp), %rax
+; FRAME-SPILL-NEXT:    addq %rax, %rbx
+; FRAME-SPILL-NEXT:    addq %rax, %r12
+; FRAME-SPILL-NEXT:    addq %rax, %r15
+; FRAME-SPILL-NEXT:    addq %rax, %r9
+; FRAME-SPILL-NEXT:    addq $8, %r13
+; FRAME-SPILL-NEXT:    testb $1, %dil
+; FRAME-SPILL-NEXT:    je .LBB6_2
+; FRAME-SPILL-NEXT:  # %bb.3: # %bb11
+; FRAME-SPILL-NEXT:    movq {{[-0-9]+}}(%r{{[sb]}}p), %rax # 8-byte Reload
+; FRAME-SPILL-NEXT:    leaq {{[0-9]+}}(%rsp), %rsp
+; FRAME-SPILL-NEXT:    popq %rbx
+; FRAME-SPILL-NEXT:    pop2 %r13, %r12
+; FRAME-SPILL-NEXT:    pop2 %r15, %r14
+; FRAME-SPILL-NEXT:    popq %rbp
+; FRAME-SPILL-NEXT:    jne .LBB6_5
+; FRAME-SPILL-NEXT:  # %bb.4: # %bb12
+; FRAME-SPILL-NEXT:    movq $0, (%rax)
+; FRAME-SPILL-NEXT:  .LBB6_5: # %bb14
+; FRAME-SPILL-NEXT:    retq
 bb:
   br i1 %arg, label %bb13, label %bb14
 
diff --git a/llvm/test/CodeGen/X86/apx/pushp-popp.ll b/llvm/test/CodeGen/X86/apx/pushp-popp.ll
index 625e70b07198e8..4097c59c56437b 100644
--- a/llvm/test/CodeGen/X86/apx/pushp-popp.ll
+++ b/llvm/test/CodeGen/X86/apx/pushp-popp.ll
@@ -1,6 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ppx | FileCheck %s --check-prefix=CHECK
 ; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ppx -frame-pointer=all | FileCheck %s --check-prefix=FRAME
+; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ppx -frame-pointer=all --enable-spill-fpbp=true | FileCheck %s --check-prefix=FRAME-SPILL
 
 define void @csr2() nounwind {
 ; CHECK-LABEL: csr2:
@@ -18,15 +19,26 @@ define void @csr2() nounwind {
 ; FRAME-NEXT:    pushp %rbp
 ; FRAME-NEXT:    movq %rsp, %rbp
 ; FRAME-NEXT:    pushp %r15
-; FRAME-NEXT:    pushp %rbp
-; FRAME-NEXT:    pushq %rax
 ; FRAME-NEXT:    #APP
 ; FRAME-NEXT:    #NO_APP
-; FRAME-NEXT:    popq %rax
-; FRAME-NEXT:    popp %rbp
 ; FRAME-NEXT:    popp %r15
 ; FRAME-NEXT:    popp %rbp
 ; FRAME-NEXT:    retq
+;
+; FRAME-SPILL-LABEL: csr2:
+; FRAME-SPILL:       # %bb.0: # %entry
+; FRAME-SPILL-NEXT:    pushp %rbp
+; FRAME-SPILL-NEXT:    movq %rsp, %rbp
+; FRAME-SPILL-NEXT:    pushp %r15
+; FRAME-SPILL-NEXT:    pushp %rbp
+; FRAME-SPILL-NEXT:    pushq %rax
+; FRAME-SPILL-NEXT:    #APP
+; FRAME-SPILL-NEXT:    #NO_APP
+; FRAME-SPILL-NEXT:    popq %rax
+; FRAME-SPILL-NEXT:    popp %rbp
+; FRAME-SPILL-NEXT:    popp %r15
+; FRAME-SPILL-NEXT:    popp %rbp
+; FRAME-SPILL-NEXT:    retq
 entry:
   tail call void asm sideeffect "", "~{rbp},~{r15},~{dirflag},~{fpsr},~{flags}"()
   ret void
diff --git a/llvm/test/CodeGen/X86/avx512-intel-ocl.ll b/llvm/test/CodeGen/X86/avx512-intel-ocl.ll
index 6c68279b8d04ae..3873b920d4de69 100644
--- a/llvm/test/CodeGen/X86/avx512-intel-ocl.ll
+++ b/llvm/test/CodeGen/X86/avx512-intel-ocl.ll
@@ -6,7 +6,9 @@
 ; RUN: llc < %s -mtriple=x86_64-win32 -mcpu=knl | FileCheck %s -check-prefixes=WIN64,WIN64-KNL
 ; RUN: llc < %s -mtriple=x86_64-win32 -mcpu=skx | FileCheck %s -check-prefixes=WIN64,WIN64-SKX
 ; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl | FileCheck %s -check-prefixes=X64,X64-KNL
+; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl --enable-spill-fpbp=true | FileCheck %s -check-prefixes=X64-SPILL,X64-SPILL-KNL
 ; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=skx | FileCheck %s -check-prefixes=X64,X64-SKX
+; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=skx --enable-spill-fpbp=true | FileCheck %s -check-prefixes=X64-SPILL,X64-SPILL-SKX
 
 declare <16 x float> @func_float16_ptr(<16 x float>, ptr)
 declare <16 x float> @func_float16(<16 x float>, <16 x float>)
@@ -70,17 +72,35 @@ define <16 x float> @testf16_inp(<16 x float> %a, <16 x float> %b) nounwind {
 ; X64-NEXT:    subq $128, %rsp
 ; X64-NEXT:    vaddps %zmm1, %zmm0, %zmm0
 ; X64-NEXT:    movq %rsp, %rdi
-; X64-NEXT:    pushq %rbp
-; X64-NEXT:    pushq %rax
 ; X64-NEXT:    callq _func_float16_ptr
-; X64-NEXT:    addq $8, %rsp
-; X64-NEXT:    popq %rbp
 ; X64-NEXT:    vaddps (%rsp), %zmm0, %zmm0
 ; X64-NEXT:    leaq -16(%rbp), %rsp
 ; X64-NEXT:    popq %r12
 ; X64-NEXT:    popq %r13
 ; X64-NEXT:    popq %rbp
 ; X64-NEXT:    retq
+;
+; X64-SPILL-LABEL: testf16_inp:
+; X64-SPILL:       ## %bb.0:
+; X64-SPILL-NEXT:    pushq %rbp
+; X64-SPILL-NEXT:    movq %rsp, %rbp
+; X64-SPILL-NEXT:    pushq %r13
+; X64-SPILL-NEXT:    pushq %r12
+; X64-SPILL-NEXT:    andq $-64, %rsp
+; X64-SPILL-NEXT:    subq $128, %rsp
+; X64-SPILL-NEXT:    vaddps %zmm1, %zmm0, %zmm0
+; X64-SPILL-NEXT:    movq %rsp, %rdi
+; X64-SPILL-NEXT:    pushq %rbp
+; X64-SPILL-NEXT:    pushq %rax
+; X64-SPILL-NEXT:    callq _func_float16_ptr
+; X64-SPILL-NEXT:    addq $8, %rsp
+; X64-SPILL-NEXT:    popq %rbp
+; X64-SPILL-NEXT:    vaddps (%rsp), %zmm0, %zmm0
+; X64-SPILL-NEXT:    leaq -16(%rbp), %rsp
+; X64-SPILL-NEXT:    popq %r12
+; X64-SPILL-NEXT:    popq %r13
+; X64-SPILL-NEXT:    popq %rbp
+; X64-SPILL-NEXT:    retq
   %y = alloca <16 x float>, align 64
   %x = fadd <16 x float> %a, %b
   %1 = call intel_ocl_bicc <16 x float> @func_float16_ptr(<16 x float> %x, ptr %y)
@@ -154,11 +174,7 @@ define <16 x float> @testf16_regs(<16 x float> %a, <16 x float> %b) nounwind {
 ; X64-NEXT:    vmovaps %zmm1, %zmm16
 ; X64-NEXT:    vaddps %zmm1, %zmm0, %zmm0
 ; X64-NEXT:    movq %rsp, %rdi
-; X64-NEXT:    pushq %rbp
-; X64-NEXT:    pushq %rax
 ; X64-NEXT:    callq _func_float16_ptr
-; X64-NEXT:    addq $8, %rsp
-; X64-NEXT:    popq %rbp
 ; X64-NEXT:    vaddps %zmm16, %zmm0, %zmm0
 ; X64-NEXT:    vaddps (%rsp), %zmm0, %zmm0
 ; X64-NEXT:    leaq -16(%rbp), %rsp
@@ -166,6 +182,30 @@ define <16 x float> @testf16_regs(<16 x float> %a, <16 x float> %b) nounwind {
 ; X64-NEXT:    popq %r13
 ; X64-NEXT:    popq %rbp
 ; X64-NEXT:    retq
+;
+; X64-SPILL-LABEL: testf16_regs:
+; X64-SPILL:       ## %bb.0:
+; X64-SPILL-NEXT:    pushq %rbp
+; X64-SPILL-NEXT:    movq %rsp, %rbp
+; X64-SPILL-NEXT:    pushq %r13
+; X64-SPILL-NEXT:    pushq %r12
+; X64-SPILL-NEXT:    andq $-64, %rsp
+; X64-SPILL-NEXT:    subq $128, %rsp
+; X64-SPILL-NEXT:    vmovaps %zmm1, %zmm16
+; X64-SPILL-NEXT:    vaddps %zmm1, %zmm0, %zmm0
+; X64-SPILL-NEXT:    movq %rsp, %rdi
+; X64-SPILL-NEXT:    pushq %rbp
+; X64-SPILL-NEXT:    pushq %rax
+; X64-SPILL-NEXT:    callq _func_float16_ptr
+; X64-SPILL-NEXT:    addq $8, %rsp
+; X64-SPILL-NEXT:    popq %rbp
+; X64-SPILL-NEXT:    vaddps %zmm16, %zmm0, %zmm0
+; X64-SPILL-NEXT:    vaddps (%rsp), %zmm0, %zmm0
+; X64-SPILL-NEXT:    leaq -16(%rbp), %rsp
+; X64-SPILL-NEXT:    popq %r12
+; X64-SPILL-NEXT:    popq %r13
+; X64-SPILL-NEXT:    popq %rbp
+; X64-SPILL-NEXT:    retq
   %y = alloca <16 x float>, align 64
   %x = fadd <16 x float> %a, %b
   %1 = call intel_ocl_bicc <16 x float> @func_float16_ptr(<16 x float> %x, ptr %y)
@@ -348,6 +388,55 @@ define intel_ocl_bicc <16 x float> @test_prolog_epilog(<16 x float> %a, <16 x fl
 ; X64-KNL-NEXT:    popq %rsi
 ; X64-KNL-NEXT:    retq
 ;
+; X64-SPILL-KNL-LABEL: test_prolog_epilog:
+; X64-SPILL-KNL:       ## %bb.0:
+; X64-SPILL-KNL-NEXT:    pushq %rsi
+; X64-SPILL-KNL-NEXT:    subq $1072, %rsp ## imm = 0x430
+; X64-SPILL-KNL-NEXT:    kmovw %k7, {{[-0-9]+}}(%r{{[sb]}}p) ## 2-byte Spill
+; X64-SPILL-KNL-NEXT:    kmovw %k6, {{[-0-9]+}}(%r{{[sb]}}p) ## 2-byte Spill
+; X64-SPILL-KNL-NEXT:    kmovw %k5, {{[-0-9]+}}(%r{{[sb]}}p) ## 2-byte Spill
+; X64-SPILL-KNL-NEXT:    kmovw %k4, {{[-0-9]+}}(%r{{[sb]}}p) ## 2-byte Spill
+; X64-SPILL-KNL-NEXT:    vmovups %zmm31, {{[-0-9]+}}(%r{{[sb]}}p) ## 64-byte Spill
+; X64-SPILL-KNL-NEXT:    vmovups %zmm30, {{[-0-9]+}}(%r{{[sb]}}p) ## 64-byte Spill
+; X64-SPILL-KNL-NEXT:    vmovups %zmm29, {{[-0-9]+}}(%r{{[sb]}}p) ## 64-byte Spill
+; X64-SPILL-KNL-NEXT:    vmovups %zmm28, {{[-0-9]+}}(%r{{[sb]}}p) ## 64-byte Spill
+; X64-SPILL-KNL-NEXT:    vmovups %zmm27, {{[-0-9]+}}(%r{{[sb]}}p) ## 64-byte Spill
+; X64-SPILL-KNL-NEXT:    vmovups %zmm26, {{[-0-9]+}}(%r{{[sb]}}p) ## 64-byte Spill
+; X64-SPILL-KNL-NEXT:    vmovups %zmm25, {{[-0-9]+}}(%r{{[sb]}}p) ## 64-byte Spill
+; X64-SPILL-KNL-NEXT:    vmovups %zmm24, {{[-0-9]+}}(%r{{[sb]}}p) ## 64-byt...
[truncated]

@mahesh-attarde
Copy link
Contributor Author

mahesh-attarde commented Nov 4, 2024

@rnk @efriedma-quic Here is consideration by author on option, Can we choose this until we know how did it impact?
#81048 (comment)
@weiguozhi

@mahesh-attarde mahesh-attarde changed the title Spill/Restore FP/BP under option [Codegen] Spill/Restore FP/BP under option Nov 4, 2024
static cl::opt<bool>
EnableSpillFPBP("enable-spill-fpbp",
cl::desc("Spill clobbered fp register to stack."),
cl::init(false), cl::Hidden);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of adding a flag to facilitate debugging, but until we have a clear bug report with a clear understanding of why this is causing regressions, I'm not in favor of this default (false) which will regress all of the use cases that were fixed by this change. We need a clear rationale why the old behavior was better than the new behavior in order to change behavior. Flags, though, are reasonable if the new behavior is still unstable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. I have worked out ir output out of llvm-reduce. I can share snippet or ir soon.

@efriedma-quic
Copy link
Collaborator

If there's an open bug report reporting some specific code pattern breaks, we can discuss on that bug report whether it makes sense to temporarily revert the change.

It doesn't make sense to permanently disable the code. Which is essentially what you're proposing, because you haven't given enough information to fix the issue.

@mahesh-attarde
Copy link
Contributor Author

mahesh-attarde commented Nov 4, 2024

If there's an open bug report reporting some specific code pattern breaks, we can discuss on that bug report whether it makes sense to temporarily revert the change.

It doesn't make sense to permanently disable the code. Which is essentially what you're proposing, because you haven't given enough information to fix the issue.

Short story on Bug, For code
$rsp = 0x7fffc27fadc0 and vmovdqa64 %zmm19, 0x90(%rsp) results in SEGFAULT(#GP). I have number of cases with same pattern. Same Example expectation with vmovdqa64 %zmm19, 0x80(%rsp)
I will provide more details soon

@rnk
Copy link
Collaborator

rnk commented Nov 4, 2024

Stack misalignment suggests there could be a bug in how we compute stack alignment here:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86FrameLowering.cpp#L4240

If stack accesses still use RSP, that suggests that RBP (FP) is in use, but not RBX (BP). So, a possible reproducer would be a big inline asm blob that clobbers RBP and accesses a highly aligned vector alloca, like a __m512* variable.

@rnk
Copy link
Collaborator

rnk commented Nov 4, 2024

It seems like there is a general problem with inline asm blobs accessing stack variables, see this example: https://godbolt.org/z/s16MxPME3

#include <immintrin.h>
int f() {
    __m512i aligned{};
    asm volatile ("vmovdqa64 %%zmm0, %0" : "=m"(aligned) : "m"(aligned) : "rbp");
    return aligned[0];
}
---->
f():
        push    rbp
        mov     rbp, rsp
        and     rsp, -64
        sub     rsp, 128
....
        push    rbp
        push    rax
        vmovdqa64       zmmword ptr [rsp], zmm0
        add     rsp, 8
        pop     rbp
        mov     rax, qword ptr [rsp]
        mov     rsp, rbp
        pop     rbp
        ret

[rsp] here looks like it will be misaligned (16 bytes instead of 64), and it will overwrite RBP because we haven't adjusted the memory operands to account for the disturbance to RSP.

That seems like a pretty big design flaw that could break more code than it fixes.

@efriedma-quic
Copy link
Collaborator

That seems bad enough that it might be worth temporarily disabling the code.

@weiguozhi
Copy link
Contributor

Suppose we haven't protected rbp around the inline asm, then following code should be generated

#include <immintrin.h>
int f() {
    __m512i aligned{};
    asm volatile ("vmovdqa64 %%zmm0, %0" : "=m"(aligned) : "m"(aligned) : "rbp");
    return aligned[0];
}
---->
f():
        push    rbp
        mov     rbp, rsp
        and     rsp, -64
        sub     rsp, 128
....
        vmovdqa64       zmmword ptr [rsp], zmm0          // clobber rbp
        mov     rax, qword ptr [rsp]
        mov     rsp, rbp                                 // store a garbage value to rsp
        pop     rbp
        ret

Because rbp is clobbered by the inline asm, at the end of the function, we still can't restore rsp correctly.

@rnk
Copy link
Collaborator

rnk commented Nov 4, 2024

I agree, it's also a major flaw that LLVM doesn't diagnose or do anything about RBP conflicts. :)

FWIW, GCC just gives up. You can see in the godbolt example they say "bp cannot be used in 'asm' here". Maybe a reasonable compromise is that we emit a backend error for inline asm that clobbers BP/FP, but spill FP/BP around calls that use it.

Because of GCC's behavior, there is probably less real world code out there clobbering RBP, but there's probably a lot of generated vector kernels in inline asm that try to rely on the compiler to align the stack and do the fiddly bits of procedure linkage.

@mahesh-attarde
Copy link
Contributor Author

It seems like there is a general problem with inline asm blobs accessing stack variables, see this example: https://godbolt.org/z/s16MxPME3

#include <immintrin.h>
int f() {
    __m512i aligned{};
    asm volatile ("vmovdqa64 %%zmm0, %0" : "=m"(aligned) : "m"(aligned) : "rbp");
    return aligned[0];
}
---->
f():
        push    rbp
        mov     rbp, rsp
        and     rsp, -64
        sub     rsp, 128
....
        push    rbp
        push    rax
        vmovdqa64       zmmword ptr [rsp], zmm0
        add     rsp, 8
        pop     rbp
        mov     rax, qword ptr [rsp]
        mov     rsp, rbp
        pop     rbp
        ret

[rsp] here looks like it will be misaligned (16 bytes instead of 64), and it will overwrite RBP because we haven't adjusted the memory operands to account for the disturbance to RSP.

That seems like a pretty big design flaw that could break more code than it fixes.

@rnk @weiguozhi My use case has similar pattern. One point to note here is, in my case i don't have inline asm's either.
Since we agree upon problem, can we disable with PR or revert change now?

@mahesh-attarde
Copy link
Contributor Author

mahesh-attarde commented Nov 5, 2024

I opened new issue #114941

@weiguozhi
Copy link
Contributor

@mahesh-attarde the problem is even without the patch #81048, https://godbolt.org/z/s16MxPME3 still generates wrong code. Could you paste your code here?

@rnk compile time error sounds reasonable for these difficult cases.

@mahesh-attarde
Copy link
Contributor Author

@mahesh-attarde the problem is even without the patch #81048, https://godbolt.org/z/s16MxPME3 still generates wrong code. Could you paste your code here?

@rnk compile time error sounds reasonable for these difficult cases.

Here is simplified version with godbolt llc.
https://godbolt.org/z/TPMr7jcxn

bar:                                    # @bar
        push    rbp
        mov     rbp, rsp
        and     rsp, -64
        sub     rsp, 192
        movaps  xmmword ptr [rsp + 160], xmm3   # 16-byte Spill
        movaps  xmmword ptr [rsp + 144], xmm2   # 16-byte Spill
        movaps  xmmword ptr [rsp + 128], xmm1   # 16-byte Spill
>>> Problem
        movaps  xmmword ptr [rsp + 112], xmm0   # 16-byte Spill
<<<<
        movaps  xmm0, xmmword ptr [rsp]
...

For Aligned Mov Memory operand has to be 16/32/64 byte align.

@mahesh-attarde
Copy link
Contributor Author

if you are ok, can we merge this PR or revert original ?

@weiguozhi
Copy link
Contributor

Isn't [rsp + 112] 16 bytes aligned?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants