-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Predicate cache alternative implementation #4292
Conversation
7c80598
to
c9155d8
Compare
319f7a7
to
c5ad6e1
Compare
c5ad6e1
to
45bb216
Compare
@@ -1631,9 +1631,13 @@ DEF_OP(StoreMemPredicate) { | |||
DEF_OP(LoadMemPredicate) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to introduce a new op here, or at least rename/fixup the description as appropriate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right ... done. To be honest, I am thinking that the work done to enable RA of predicate registers can be reverted. I see no reason to keep code in FEX that doesn't have any users and this is this was the only user of that code.
@Sonicadvance1 what do you think about reverting the predicate register RA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, sounds reasonable to me.
|
||
if (FillP2) { | ||
ptrue(ARMEmitter::SubRegSize::i16Bit, PRED_X87_SVEOPT, ARMEmitter::PredicatePattern::SVE_VL5); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just move this into the above if and drop the FillP2 check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SetPreds and FillP2 are not really equivalent. We only need to fill P2 if we really need the optimization enabled. This is not true in non-SVE cabable systems or if we don't have X87 ldst instructions in the block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned this prior but perhaps you missed, the current behaviour is incorrect for some fills in the dispatcher as they will be emitted without it. Setting it in the above block (which is guarded by an sve check alone) is reasonable enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly that suggest I move it into the SetPredRegs
block? I have done that. I haven't dropped the FillP2 conditional. That is necessary to only generate this when strictly necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for the conditional right, the issue is that some helper routines are emitted as part of the dispatcher such as LUDiv. These won't be emitted with FillP2 set to true but they may still be called from a block that has it set as such and clobber the registers. For simplicities sake I think filling it unconditionally is the best solution to this, spills are rare in general outside of x87 code (which will likely have FillP2 as true anyway) so I don't see always setting it having a measurable performance impact.
Also occurred to me that some control flow patterns with multiblock could break this, so that's another reason for just always filling:
loop:
cpuid
dec r0
jz r0 end
x87 write
jmp loop
end: x87 write
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
argh - this is so annoying. 😔 Anyway, sorry if I am testing your patience. I think there's really no way around always filling the predicate unless we don't do the optimization. I will update the patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now updated the patch to drop FillP2
and always fill.
89330d8
to
4242a4e
Compare
const auto MemReg = GetReg(Op->Addr.ID()); | ||
|
||
LOGMAN_THROW_A_FMT(HostSupportsSVE128 || HostSupportsSVE256, "LoadMemPredicate needs SVE support"); | ||
LOGMAN_THROW_A_FMT(HostSupportsSVE128 || HostSupportsSVE256, "LoadMemX87SVEOptPredicate needs SVE support"); | ||
ptrue(ARMEmitter::SubRegSize::i16Bit, Predicate, ARMEmitter::PredicatePattern::SVE_VL5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to remove this otherwise the patch will do nothing :) (which is fine as it'll always be set in FillSRA)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sigh of course.
Whenever the control float leaves the block, it might clobber the predicate register so we reset the cache whenever that happens. Fixes FEX-Emu#4264
4242a4e
to
ddd241f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me, even if this PR doesn't remove the Predicate register RA.
Yeah, I will revert that separately. Lets not complicate this one further pls. :) |
Alternative implementation to #4274
Fixes #4264