Large-scale PowerPC recompiler rework #641

Open
wants to merge 64 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
64 commits
Select commit Hold shift + click to select a range
4c16397
Latte: Fix race condition on close during game boot
Exzap Nov 4, 2022
f523b21
PPCRec: Use vector for segment list + deduplicate RA file
Exzap Nov 4, 2022
0265108
PPCRec: Use vector for instruction list
Exzap Nov 4, 2022
b1b46f3
PPCRec: Move Segment and Instruction struct into separate files
Exzap Nov 4, 2022
5b2bc7e
PPCRec: Rename IML structs for better clarity
Exzap Nov 5, 2022
625874a
PPCRec: Move debug printing + smaller clean up
Exzap Nov 5, 2022
101a2ef
PPCRec: Move analyzer file + move some funcs to IMLInstruction
Exzap Nov 5, 2022
e53c6ad
PPCRec: Move IML optimizer file
Exzap Nov 5, 2022
d1fe1a9
PPCRec: Move IML register allocator
Exzap Nov 6, 2022
27f70d5
PPCRec: Emit x86 movd for non-AVX + more restructuring
Exzap Nov 7, 2022
db60ea6
PPCRec: Move X64 files into subdirectory and rename
Exzap Nov 7, 2022
ce8dc55
PPCRec: Reworked IML builder to work with basic-blocks
Exzap Dec 12, 2022
a5f6faa
PPCRec: Fix merge conflicts
Exzap Dec 12, 2022
8d972d2
PPCRec: Unify BCCTR and BCLR code
Exzap Dec 12, 2022
874e376
PPCRec: Fix single segment loop not being detected
Exzap Dec 12, 2022
93f5615
PPCRec: Remove now unused PPC_ENTER and jumpMarkAddress
Exzap Dec 12, 2022
9dc8207
PPCRec: Clean up unused flags
Exzap Dec 12, 2022
d308252
PPCRec: Make LSWI/STWSI more generic + GPR temporaries storage
Exzap Dec 13, 2022
832b761
PPCRec: Make register pool for RA configurable
Exzap Dec 13, 2022
53139cd
PPCRec: Rename register constants to avoid name collision
Exzap Dec 14, 2022
ac22a38
PPCRec: New x86-64 code emitter
Exzap Dec 17, 2022
91f9727
PPCRec: New compare and cond jump instrs, update RA
Exzap Dec 17, 2022
2535cf4
PPCRec: Streamline instructions + unify code for CR updates
Exzap Dec 18, 2022
8df0281
PPCRec: Further unify CR code
Exzap Dec 19, 2022
37256ac
PPCRec: Rework carry bit and generalize carry IML instructions
Exzap Dec 27, 2022
ff09940
PPCRec: Avoid complex optimizations in backend
Exzap Dec 28, 2022
c4b9fff
PPCRec: Rework CR bit handling
Exzap Jan 2, 2023
a1c8f6f
PPCRec: Refactoring and clean up
Exzap Jan 3, 2023
b4f2e02
PPCRec: Refactor load/store instructions
Exzap Jan 3, 2023
3ba9460
PPCRec: Use IMLReg in more places, unify and simplify var names
Exzap Jan 5, 2023
e86fa57
PPCRec: Simplify PPC and IML logic instructions
Exzap Jan 5, 2023
b367689
PPCRec: Unify code + misc RA preparation
Exzap Jan 30, 2023
0577eff
PPCRec: Use IMLReg type in FPR RA
Exzap Jan 30, 2023
59bd84b
PPCRec: Use agnostic breakpoints
Exzap Jan 30, 2023
154aef0
PPCRec: Fix capitalization in include
Exzap Jan 30, 2023
df74b99
PPCRec: Initial support for typed registers
Exzap Feb 2, 2023
7c76738
PPCRec: Partial support for typed registers in RA
Exzap Feb 4, 2023
b1c6646
PPCRec: Further work on support for typed registers in RA
Exzap Feb 5, 2023
b4f2f91
PPCRec: FPRs now use the shared register allocator
Exzap Feb 6, 2023
e5717fb
PPCRec: Implement MFCR and MTCRF
Exzap Mar 13, 2023
b685a08
Fix compile errors due to rebase
Exzap Dec 13, 2023
cc730b4
PPCRec: Dead code elimination + reintroduce pre-rework optimizations
Exzap Jan 13, 2024
450c0a5
PPCRec: Simplify RA code and clean it up a bit
Exzap Sep 1, 2024
dcbaa5a
PPCRec: Add RA support for instructions with register constraints
Exzap Oct 17, 2024
a0ad48c
PPCRec: Some fixes
Exzap Oct 19, 2024
8614150
PPCRec: Support for arbitrary function calls in the IR
Exzap Oct 19, 2024
97ef952
PPCRec: Added dump option for recompiled functions + more fixes
Exzap Oct 19, 2024
aa904b6
PPCRec: Clean up code and optimize
Exzap Oct 19, 2024
002a03d
PPCRec: Implement MCRF, rework DCBZ
Exzap Oct 20, 2024
608757d
PPCRec: Fixes and optimizations + rework FRES/FRSQRTE
Exzap Oct 23, 2024
f1fa494
Add natvis file for boost::container::small_vector
Exzap Oct 23, 2024
e34a273
PPCRec: Optimize register allocation
Exzap Oct 23, 2024
5949e62
PPCRec: Reintroduce optimization for BDNZ loops
Exzap Oct 25, 2024
70c99fd
PPCRec: Use 32bit mov for 32bit operations
Exzap Oct 25, 2024
96d7c75
PPCRec: Update spill cost calculation
Exzap Oct 25, 2024
636b63f
PPCRec: Refactor read/write access tracking for liveness ranges
Exzap Oct 26, 2024
126a682
PPCRec: Clean up some outdated code
Exzap Oct 26, 2024
f309d5d
PPCRec: Code cleanup
Exzap Oct 27, 2024
099d1d4
PPCRec: Rework RLWIMI
Exzap Oct 28, 2024
e332726
PPCRec: Optimizations
Exzap Oct 28, 2024
a05b655
PPCRec: Handle edge case for x86 shift instructions
Exzap Oct 29, 2024
83569ae
PPCRec: Avoid relying on undefined behavior in std::copy_backwards
Exzap Oct 30, 2024
8219a5f
PPCRec: Fix stack pointer alignment for calls
Exzap Oct 30, 2024
9187044
PPCRec: Use named register constants instead of hardcoding regs
Exzap Oct 30, 2024
boost.natvis (new file, 26 changes: 26 additions, 0 deletions)
@@ -0,0 +1,26 @@
<?xml version='1.0' encoding='utf-8'?>
<AutoVisualizer xmlns="http://schemas.microsoft.com/vstudio/debugger/natvis/2010">

<Type Name="boost::container::small_vector&lt;*&gt;">
<Expand>
<Item Name="[size]">m_holder.m_size</Item>
<ArrayItems>
<Size>m_holder.m_size</Size>
<ValuePointer>m_holder.m_start</ValuePointer>
</ArrayItems>
</Expand>
</Type>

<Type Name="boost::container::static_vector&lt;*&gt;">
<DisplayString>{{ size={m_holder.m_size} }}</DisplayString>
<Expand>
<Item Name="[size]" ExcludeView="simple">m_holder.m_size</Item>
<Item Name="[capacity]" ExcludeView="simple">static_capacity</Item>
<ArrayItems>
<Size>m_holder.m_size</Size>
<ValuePointer>($T1*)m_holder.storage.data</ValuePointer>
</ArrayItems>
</Expand>
</Type>

</AutoVisualizer>
src/Cafe/CMakeLists.txt (35 changes: 21 additions, 14 deletions)
@@ -67,24 +67,31 @@ add_library(CemuCafe
HW/Espresso/Recompiler/PPCFunctionBoundaryTracker.h
HW/Espresso/Recompiler/PPCRecompiler.cpp
HW/Espresso/Recompiler/PPCRecompiler.h
-HW/Espresso/Recompiler/PPCRecompilerImlAnalyzer.cpp
+HW/Espresso/Recompiler/IML/IML.h
+HW/Espresso/Recompiler/IML/IMLSegment.cpp
+HW/Espresso/Recompiler/IML/IMLSegment.h
+HW/Espresso/Recompiler/IML/IMLInstruction.cpp
+HW/Espresso/Recompiler/IML/IMLInstruction.h
+HW/Espresso/Recompiler/IML/IMLDebug.cpp
+HW/Espresso/Recompiler/IML/IMLAnalyzer.cpp
+HW/Espresso/Recompiler/IML/IMLOptimizer.cpp
+HW/Espresso/Recompiler/IML/IMLRegisterAllocator.cpp
+HW/Espresso/Recompiler/IML/IMLRegisterAllocator.h
+HW/Espresso/Recompiler/IML/IMLRegisterAllocatorRanges.cpp
+HW/Espresso/Recompiler/IML/IMLRegisterAllocatorRanges.h
HW/Espresso/Recompiler/PPCRecompilerImlGen.cpp
HW/Espresso/Recompiler/PPCRecompilerImlGenFPU.cpp
HW/Espresso/Recompiler/PPCRecompilerIml.h
-HW/Espresso/Recompiler/PPCRecompilerImlOptimizer.cpp
-HW/Espresso/Recompiler/PPCRecompilerImlRanges.cpp
-HW/Espresso/Recompiler/PPCRecompilerImlRanges.h
-HW/Espresso/Recompiler/PPCRecompilerImlRegisterAllocator2.cpp
-HW/Espresso/Recompiler/PPCRecompilerImlRegisterAllocator.cpp
HW/Espresso/Recompiler/PPCRecompilerIntermediate.cpp
-HW/Espresso/Recompiler/PPCRecompilerX64AVX.cpp
-HW/Espresso/Recompiler/PPCRecompilerX64BMI.cpp
-HW/Espresso/Recompiler/PPCRecompilerX64.cpp
-HW/Espresso/Recompiler/PPCRecompilerX64FPU.cpp
-HW/Espresso/Recompiler/PPCRecompilerX64Gen.cpp
-HW/Espresso/Recompiler/PPCRecompilerX64GenFPU.cpp
-HW/Espresso/Recompiler/PPCRecompilerX64.h
-HW/Espresso/Recompiler/x64Emit.hpp
+HW/Espresso/Recompiler/BackendX64/BackendX64AVX.cpp
+HW/Espresso/Recompiler/BackendX64/BackendX64BMI.cpp
+HW/Espresso/Recompiler/BackendX64/BackendX64.cpp
+HW/Espresso/Recompiler/BackendX64/BackendX64FPU.cpp
+HW/Espresso/Recompiler/BackendX64/BackendX64Gen.cpp
+HW/Espresso/Recompiler/BackendX64/BackendX64GenFPU.cpp
+HW/Espresso/Recompiler/BackendX64/BackendX64.h
+HW/Espresso/Recompiler/BackendX64/X64Emit.hpp
+HW/Espresso/Recompiler/BackendX64/x86Emitter.h
HW/Latte/Common/RegisterSerializer.cpp
HW/Latte/Common/RegisterSerializer.h
HW/Latte/Common/ShaderSerializer.cpp
src/Cafe/HW/Espresso/EspressoISA.h (20 changes: 11 additions, 9 deletions)
@@ -91,13 +91,15 @@ namespace Espresso
BCCTR = 528
};

-enum class OPCODE_31
+enum class Opcode31
{
-
+TW = 4,
+MFTB = 371,
};

inline PrimaryOpcode GetPrimaryOpcode(uint32 opcode) { return (PrimaryOpcode)(opcode >> 26); };
inline Opcode19 GetGroup19Opcode(uint32 opcode) { return (Opcode19)((opcode >> 1) & 0x3FF); };
+inline Opcode31 GetGroup31Opcode(uint32 opcode) { return (Opcode31)((opcode >> 1) & 0x3FF); };

struct BOField
{
@@ -132,6 +134,12 @@ namespace Espresso
uint8 bo;
};

+// returns true if LK bit is set, only valid for branch instructions
+inline bool DecodeLK(uint32 opcode)
+{
+return (opcode & 1) != 0;
+}
+
inline void _decodeForm_I(uint32 opcode, uint32& LI, bool& AA, bool& LK)
{
LI = opcode & 0x3fffffc;
@@ -183,13 +191,7 @@ namespace Espresso
_decodeForm_D_branch(opcode, BD, BO, BI, AA, LK);
}

-inline void decodeOp_BCLR(uint32 opcode, BOField& BO, uint32& BI, bool& LK)
-{
-// form XL (with BD field expected to be zero)
-_decodeForm_XL(opcode, BO, BI, LK);
-}
-
-inline void decodeOp_BCCTR(uint32 opcode, BOField& BO, uint32& BI, bool& LK)
+inline void decodeOp_BCSPR(uint32 opcode, BOField& BO, uint32& BI, bool& LK) // BCLR and BCSPR
{
// form XL (with BD field expected to be zero)
_decodeForm_XL(opcode, BO, BI, LK);
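For reviewers less familiar with the Espresso encoding, here is a minimal sketch of the field extraction that `GetPrimaryOpcode`, `GetGroup31Opcode` and `DecodeLK` perform. The function names are hypothetical and use plain `uint32_t` instead of the `Espresso` enums. The extended-opcode values 16 (`bclr`) and 528 (`bcctr`) come from the PowerPC ISA.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical re-implementations of the helpers in EspressoISA.h.
inline uint32_t PrimaryOpcode(uint32_t opcode) { return opcode >> 26; }           // top 6 bits
inline uint32_t ExtendedOpcode(uint32_t opcode) { return (opcode >> 1) & 0x3FF; } // 10-bit XO field
inline bool LinkBit(uint32_t opcode) { return (opcode & 1) != 0; }                // LK, lowest bit

// Hypothetical helper: bclr and bcctr are both form XL under primary opcode 19
// and differ only in XO (which SPR supplies the target), which is why a
// single decoder such as decodeOp_BCSPR can serve both.
inline bool IsBranchToSpr(uint32_t opcode)
{
    if (PrimaryOpcode(opcode) != 19)
        return false;
    uint32_t xo = ExtendedOpcode(opcode);
    return xo == 16 || xo == 528; // BCLR = 16, BCCTR = 528
}
```

For example, `bctrl` (0x4E800421) yields an extended opcode of 528 with the link bit set, and `mftb r3` (0x7C6C42E6) decodes to primary opcode 31 with extended opcode 371, matching the new `Opcode31::MFTB`.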
src/Cafe/HW/Espresso/Interpreter/PPCInterpreterALU.hpp (20 changes: 8 additions, 12 deletions)
@@ -3,12 +3,12 @@ static void PPCInterpreter_setXerOV(PPCInterpreter_t* hCPU, bool hasOverflow)
{
if (hasOverflow)
{
-hCPU->spr.XER |= XER_SO;
-hCPU->spr.XER |= XER_OV;
+hCPU->xer_so = 1;
+hCPU->xer_ov = 1;
}
else
{
-hCPU->spr.XER &= ~XER_OV;
+hCPU->xer_ov = 0;
}
}

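With SO and OV now stored as separate bytes, the sticky behaviour of SO is explicit. An overflow sets both flags, while a non-overflowing OE=1 instruction clears only OV. Here is a minimal sketch of that logic, using a hypothetical `XerFlags` struct in place of `PPCInterpreter_t`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the three XER flags the PR keeps in PPCInterpreter_t.
struct XerFlags
{
    uint8_t so = 0; // summary overflow: sticky, cleared only by an explicit XER write
    uint8_t ov = 0; // overflow of the most recent OE=1 instruction
    uint8_t ca = 0; // carry
};

// Mirrors the updated PPCInterpreter_setXerOV.
inline void SetXerOV(XerFlags& xer, bool hasOverflow)
{
    if (hasOverflow)
    {
        xer.so = 1;
        xer.ov = 1;
    }
    else
    {
        xer.ov = 0; // SO is intentionally left untouched
    }
}
```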
@@ -246,7 +246,7 @@ static void PPCInterpreter_SUBFCO(PPCInterpreter_t* hCPU, uint32 opcode)
uint32 a = hCPU->gpr[rA];
uint32 b = hCPU->gpr[rB];
hCPU->gpr[rD] = ~a + b + 1;
-// update xer
+// update carry
if (ppc_carry_3(~a, b, 1))
hCPU->xer_ca = 1;
else
@@ -848,8 +848,7 @@ static void PPCInterpreter_CMP(PPCInterpreter_t* hCPU, uint32 opcode)
hCPU->cr[cr * 4 + CR_BIT_GT] = 1;
else
hCPU->cr[cr * 4 + CR_BIT_EQ] = 1;
-if ((hCPU->spr.XER & XER_SO) != 0)
-hCPU->cr[cr * 4 + CR_BIT_SO] = 1;
+hCPU->cr[cr * 4 + CR_BIT_SO] = hCPU->xer_so;
PPCInterpreter_nextInstruction(hCPU);
}

@@ -871,8 +870,7 @@ static void PPCInterpreter_CMPL(PPCInterpreter_t* hCPU, uint32 opcode)
hCPU->cr[cr * 4 + CR_BIT_GT] = 1;
else
hCPU->cr[cr * 4 + CR_BIT_EQ] = 1;
-if ((hCPU->spr.XER & XER_SO) != 0)
-hCPU->cr[cr * 4 + CR_BIT_SO] = 1;
+hCPU->cr[cr * 4 + CR_BIT_SO] = hCPU->xer_so;
PPCInterpreter_nextInstruction(hCPU);
}

@@ -895,8 +893,7 @@ static void PPCInterpreter_CMPI(PPCInterpreter_t* hCPU, uint32 opcode)
hCPU->cr[cr * 4 + CR_BIT_GT] = 1;
else
hCPU->cr[cr * 4 + CR_BIT_EQ] = 1;
-if (hCPU->spr.XER & XER_SO)
-hCPU->cr[cr * 4 + CR_BIT_SO] = 1;
+hCPU->cr[cr * 4 + CR_BIT_SO] = hCPU->xer_so;
PPCInterpreter_nextInstruction(hCPU);
}

@@ -919,8 +916,7 @@ static void PPCInterpreter_CMPLI(PPCInterpreter_t* hCPU, uint32 opcode)
hCPU->cr[cr * 4 + CR_BIT_GT] = 1;
else
hCPU->cr[cr * 4 + CR_BIT_EQ] = 1;
-if (hCPU->spr.XER & XER_SO)
-hCPU->cr[cr * 4 + CR_BIT_SO] = 1;
+hCPU->cr[cr * 4 + CR_BIT_SO] = hCPU->xer_so;
PPCInterpreter_nextInstruction(hCPU);
}
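All four compare handlers now copy `xer_so` directly into the SO bit of the target CR field instead of testing `spr.XER`. The following sketch shows the full field update in the byte-per-bit `cr[32]` layout. It assumes a hypothetical `CompareToCRField` helper; the interpreter itself keeps a separate handler per opcode.

```cpp
#include <cassert>
#include <cstdint>

enum { CR_LT = 0, CR_GT = 1, CR_EQ = 2, CR_SO = 3 }; // bit order within one 4-bit CR field

// Hypothetical helper covering cmp/cmpi (isSigned = true) and cmpl/cmpli (isSigned = false).
// cr holds one byte per CR bit, so field crf starts at index crf * 4.
inline void CompareToCRField(uint8_t cr[32], int crf, uint32_t a, uint32_t b, bool isSigned, uint8_t xerSO)
{
    uint8_t* f = cr + crf * 4;
    f[CR_LT] = f[CR_GT] = f[CR_EQ] = 0;
    bool lt = isSigned ? (int32_t)a < (int32_t)b : a < b;
    bool gt = isSigned ? (int32_t)a > (int32_t)b : a > b;
    if (lt)
        f[CR_LT] = 1;
    else if (gt)
        f[CR_GT] = 1;
    else
        f[CR_EQ] = 1;
    f[CR_SO] = xerSO; // SO mirrors the stored xer_so value (0 or 1)
}
```

The same bit pattern compares differently depending on signedness. For example, 0xFFFFFFFF sets LT against 1 in a signed compare but GT in an unsigned one.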

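The SUBFCO hunk only renames a comment, but the carry rule behind it is worth spelling out. `subfc` computes `~a + b + 1` (that is, `b - a`), and CA is the carry out of the most significant bit of that 32-bit sum. This carry is set exactly when `b >= a` as unsigned values. The sketch below widens to 64 bits as an illustrative choice; the body of `ppc_carry_3` is not part of this diff, and `SubfcWithCarry` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Returns rD = ~a + b + 1 and writes the carry-out that ppc_carry_3(~a, b, 1) decides.
inline uint32_t SubfcWithCarry(uint32_t a, uint32_t b, uint8_t& carry)
{
    uint64_t sum = (uint64_t)(uint32_t)~a + (uint64_t)b + 1;
    carry = (uint8_t)(sum >> 32); // 1 iff the 32-bit addition overflowed
    return (uint32_t)sum;
}
```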
src/Cafe/HW/Espresso/Interpreter/PPCInterpreterFPU.cpp (4 changes: 2 additions, 2 deletions)
@@ -32,7 +32,7 @@ espresso_frsqrte_entry_t frsqrteLookupTable[32] =
{0x20c1000, 0x35e},{0x1f12000, 0x332},{0x1d79000, 0x30a},{0x1bf4000, 0x2e6},
};

-double frsqrte_espresso(double input)
+ATTR_MS_ABI double frsqrte_espresso(double input)
{
unsigned long long x = *(unsigned long long*)&input;

@@ -111,7 +111,7 @@ espresso_fres_entry_t fresLookupTable[32] =
{0x88400, 0x11a}, {0x65000, 0x11a}, {0x41c00, 0x108}, {0x20c00, 0x106}
};

-double fres_espresso(double input)
+ATTR_MS_ABI double fres_espresso(double input)
{
// based on testing we know that fres uses only the first 15 bits of the mantissa
// seee eeee eeee mmmm mmmm mmmm mmmx xxxx .... (s = sign, e = exponent, m = mantissa, x = not used)
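`ATTR_MS_ABI` is defined outside this diff. Judging by its name, it presumably pins `fres_espresso` and `frsqrte_espresso` to the Microsoft x64 calling convention, so that generated x64 code can call them the same way on every host OS. That is an inference, not something this diff confirms. The sketch below uses a stand-in macro, `ATTR_MS_ABI_SKETCH`, built on the GCC/Clang `ms_abi` function attribute. It applies the macro to a hypothetical helper that extracts the 15 mantissa bits which, according to the `fres` comment, are the only ones the hardware uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed stand-in; the real ATTR_MS_ABI definition is not part of this diff.
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__) && !defined(_WIN32)
#define ATTR_MS_ABI_SKETCH __attribute__((ms_abi))
#else
#define ATTR_MS_ABI_SKETCH // no attribute: Windows x64 already uses the Microsoft ABI
#endif

// Returns the 15 most significant mantissa bits of a double
// (layout: 1 sign bit, 11 exponent bits, 52 mantissa bits).
ATTR_MS_ABI_SKETCH uint32_t FresMantissaBits(double input)
{
    uint64_t bits;
    std::memcpy(&bits, &input, sizeof(bits)); // well-defined alternative to the pointer cast
    return (uint32_t)((bits >> 37) & 0x7FFF);  // 52 - 15 = 37
}
```

Callers compiled for the System V ABI can call such a function normally; the compiler emits the Microsoft-convention call sequence itself.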
src/Cafe/HW/Espresso/Interpreter/PPCInterpreterInternal.h (11 changes: 6 additions, 5 deletions)
@@ -50,9 +50,9 @@
#define CR_BIT_EQ 2
#define CR_BIT_SO 3

-#define XER_SO (1<<31) // summary overflow bit
-#define XER_OV (1<<30) // overflow bit
#define XER_BIT_CA (29) // carry bit index. To accelerate frequent access, this bit is stored as a separate uint8
+#define XER_BIT_SO (31) // summary overflow, counterpart to CR SO
+#define XER_BIT_OV (30)

// FPSCR
#define FPSCR_VXSNAN (1<<24)
@@ -118,7 +118,8 @@

static inline void ppc_update_cr0(PPCInterpreter_t* hCPU, uint32 r)
{
-hCPU->cr[CR_BIT_SO] = (hCPU->spr.XER&XER_SO) ? 1 : 0;
+cemu_assert_debug(hCPU->xer_so <= 1);
+hCPU->cr[CR_BIT_SO] = hCPU->xer_so;
hCPU->cr[CR_BIT_LT] = ((r != 0) ? 1 : 0) & ((r & 0x80000000) ? 1 : 0);
hCPU->cr[CR_BIT_EQ] = (r == 0);
hCPU->cr[CR_BIT_GT] = hCPU->cr[CR_BIT_EQ] ^ hCPU->cr[CR_BIT_LT] ^ 1; // this works because EQ and LT can never be set at the same time. So the only case where GT becomes 1 is when LT=0 and EQ=0
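The GT computation in `ppc_update_cr0` relies on LT and EQ being mutually exclusive. The same arithmetic in isolation, with a hypothetical `Cr0` struct, makes it easy to check that exactly one of LT, GT and EQ ends up set for any result:

```cpp
#include <cassert>
#include <cstdint>

struct Cr0 { uint8_t lt, gt, eq, so; }; // hypothetical, one byte per bit like cr[]

// Same computation as ppc_update_cr0.
inline Cr0 UpdateCr0(uint32_t r, uint8_t xerSO)
{
    Cr0 c{};
    c.so = xerSO;
    c.lt = ((r != 0) ? 1 : 0) & ((r & 0x80000000) ? 1 : 0); // negative as a signed 32-bit value
    c.eq = (r == 0);
    c.gt = c.eq ^ c.lt ^ 1; // 1 only when both LT and EQ are 0
    return c;
}
```

The `(r != 0)` factor in the LT term is redundant, because a value with bit 31 set is never zero.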
@@ -190,8 +191,8 @@ inline double roundTo25BitAccuracy(double d)
return *(double*)&v;
}

-double fres_espresso(double input);
-double frsqrte_espresso(double input);
+ATTR_MS_ABI double fres_espresso(double input);
+ATTR_MS_ABI double frsqrte_espresso(double input);

void fcmpu_espresso(PPCInterpreter_t* hCPU, int crfD, double a, double b);

src/Cafe/HW/Espresso/Interpreter/PPCInterpreterLoadStore.hpp (3 changes: 2 additions, 1 deletion)
@@ -85,7 +85,8 @@ static void PPCInterpreter_STWCX(PPCInterpreter_t* hCPU, uint32 Opcode)
ppc_setCRBit(hCPU, CR_BIT_GT, 0);
ppc_setCRBit(hCPU, CR_BIT_EQ, 1);
}
-ppc_setCRBit(hCPU, CR_BIT_SO, (hCPU->spr.XER&XER_SO) != 0 ? 1 : 0);
+cemu_assert_debug(hCPU->xer_so <= 1);
+ppc_setCRBit(hCPU, CR_BIT_SO, hCPU->xer_so);
// remove reservation
hCPU->reservedMemAddr = 0;
hCPU->reservedMemValue = 0;
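Only the tail of `PPCInterpreter_STWCX` is visible here. For context, the sketch below models the reservation it clears. It assumes, as the `reservedMemAddr`/`reservedMemValue` fields suggest, that a reservation is validated by comparing the reserved value with an atomic compare-exchange. The `Reservation`, `Lwarx` and `Stwcx` names are hypothetical, and this is not the interpreter's actual code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical reservation state, mirroring reservedMemAddr / reservedMemValue.
struct Reservation
{
    uint32_t addr = 0; // 0 = no reservation, as after "reservedMemAddr = 0"
    uint32_t value = 0;
};

// lwarx: load the word and remember both the address and the value seen.
inline uint32_t Lwarx(Reservation& res, std::atomic<uint32_t>& mem, uint32_t ea)
{
    res.addr = ea;
    res.value = mem.load();
    return res.value;
}

// stwcx.: store only if the reservation matches and memory still holds the
// reserved value. Returns the new CR0 EQ bit (1 = store performed).
inline uint8_t Stwcx(Reservation& res, std::atomic<uint32_t>& mem, uint32_t ea, uint32_t newValue)
{
    bool ok = false;
    if (res.addr != 0 && res.addr == ea)
    {
        uint32_t expected = res.value;
        ok = mem.compare_exchange_strong(expected, newValue);
    }
    res.addr = 0; // the reservation is consumed whether or not the store happened
    res.value = 0;
    return ok ? 1 : 0;
}
```

A value comparison like this cannot detect an A-B-A sequence, where another writer changes the word and then restores it. This is a known limitation of this kind of model.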
src/Cafe/HW/Espresso/Interpreter/PPCInterpreterMain.cpp (16 changes: 12 additions, 4 deletions)
@@ -63,16 +63,24 @@ void PPCInterpreter_setDEC(PPCInterpreter_t* hCPU, uint32 newValue)
uint32 PPCInterpreter_getXER(PPCInterpreter_t* hCPU)
{
uint32 xerValue = hCPU->spr.XER;
-xerValue &= ~(1<<XER_BIT_CA);
-if( hCPU->xer_ca )
-xerValue |= (1<<XER_BIT_CA);
+xerValue &= ~(1 << XER_BIT_CA);
+xerValue &= ~(1 << XER_BIT_SO);
+xerValue &= ~(1 << XER_BIT_OV);
+if (hCPU->xer_ca)
+xerValue |= (1 << XER_BIT_CA);
+if (hCPU->xer_so)
+xerValue |= (1 << XER_BIT_SO);
+if (hCPU->xer_ov)
+xerValue |= (1 << XER_BIT_OV);
return xerValue;
}

void PPCInterpreter_setXER(PPCInterpreter_t* hCPU, uint32 v)
{
hCPU->spr.XER = v;
-hCPU->xer_ca = (v>>XER_BIT_CA)&1;
+hCPU->xer_ca = (v >> XER_BIT_CA) & 1;
+hCPU->xer_so = (v >> XER_BIT_SO) & 1;
+hCPU->xer_ov = (v >> XER_BIT_OV) & 1;
}

uint32 PPCInterpreter_getCoreIndex(PPCInterpreter_t* hCPU)
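The updated getter/setter pair keeps CA, SO and OV in separate bytes and merges them into the architectural XER word only on access. Here is a self-contained sketch of that round trip, with a hypothetical `XerState` struct standing in for `PPCInterpreter_t`:

```cpp
#include <cassert>
#include <cstdint>

constexpr int XER_BIT_SO = 31, XER_BIT_OV = 30, XER_BIT_CA = 29; // same indices as the new defines

struct XerState
{
    uint32_t raw = 0; // remaining XER bits, e.g. the string byte count in the low bits
    uint8_t ca = 0, so = 0, ov = 0;
};

// Same idea as PPCInterpreter_getXER: clear the three flag positions in the
// stored word, then re-insert them from the separate byte fields.
inline uint32_t GetXer(const XerState& s)
{
    uint32_t v = s.raw & ~((1u << XER_BIT_CA) | (1u << XER_BIT_SO) | (1u << XER_BIT_OV));
    v |= (uint32_t)s.ca << XER_BIT_CA;
    v |= (uint32_t)s.so << XER_BIT_SO;
    v |= (uint32_t)s.ov << XER_BIT_OV;
    return v;
}

// Same idea as PPCInterpreter_setXER: keep the raw word and split out the flags.
inline void SetXer(XerState& s, uint32_t v)
{
    s.raw = v;
    s.ca = (v >> XER_BIT_CA) & 1;
    s.so = (v >> XER_BIT_SO) & 1;
    s.ov = (v >> XER_BIT_OV) & 1;
}
```

Using `1u` keeps the shifts unsigned, which sidesteps signed-shift corner cases when the target is bit 31.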
src/Cafe/HW/Espresso/Interpreter/PPCInterpreterOPC.cpp (1 change: 0 additions, 1 deletion)
@@ -5,7 +5,6 @@
#include "Cafe/OS/libs/coreinit/coreinit_CodeGen.h"

#include "../Recompiler/PPCRecompiler.h"
-#include "../Recompiler/PPCRecompilerX64.h"

#include <float.h>
#include "Cafe/HW/Latte/Core/LatteBufferCache.h"
src/Cafe/HW/Espresso/PPCState.h (5 changes: 4 additions, 1 deletion)
@@ -49,6 +49,8 @@ struct PPCInterpreter_t
uint32 fpscr;
uint8 cr[32]; // 0 -> bit not set, 1 -> bit set (upper 7 bits of each byte must always be zero) (cr0 starts at index 0, cr1 at index 4 ..)
uint8 xer_ca; // carry from xer
+uint8 xer_so;
+uint8 xer_ov;
uint8 LSQE;
uint8 PSE;
// thread remaining cycles
@@ -67,7 +69,8 @@ struct PPCInterpreter_t
uint32 reservedMemValue;
// temporary storage for recompiler
FPR_t temporaryFPR[8];
-uint32 temporaryGPR[4];
+uint32 temporaryGPR[4]; // deprecated, refactor backend dependency on this away
+uint32 temporaryGPR_reg[4];
// values below this are not used by Cafe OS usermode
struct
{
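The `cr[32]` comment in `PPCInterpreter_t` fixes the layout: one byte per CR bit, with cr0 at index 0 and cr1 at index 4. This is the layout that the newly implemented MFCR and MTCRF have to translate to and from the 32-bit CR word. Below is a sketch of both conversions; `PackCR` and `UnpackCR` are hypothetical names, not the recompiler's code.

```cpp
#include <cassert>
#include <cstdint>

// mfcr: CR bit i (IBM numbering, bit 0 = most significant) becomes bit (31 - i) of the word.
inline uint32_t PackCR(const uint8_t cr[32])
{
    uint32_t v = 0;
    for (int i = 0; i < 32; i++)
        v |= (uint32_t)(cr[i] & 1) << (31 - i);
    return v;
}

// mtcrf: only the 4-bit fields selected by the 8-bit FXM mask are written.
// Mask bit 0x80 selects cr0, 0x01 selects cr7.
inline void UnpackCR(uint8_t cr[32], uint32_t value, uint8_t fxm)
{
    for (int field = 0; field < 8; field++)
    {
        if (!(fxm & (0x80 >> field)))
            continue;
        for (int b = 0; b < 4; b++)
        {
            int i = field * 4 + b;
            cr[i] = (value >> (31 - i)) & 1;
        }
    }
}
```

For example, setting only cr0.LT (`cr[0]`) and cr1.EQ (`cr[6]`) packs to 0x82000000.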