Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Predicate cache alternative implementation #4292

Merged
merged 3 commits into from
Jan 29, 2025

Conversation

pmatos
Copy link
Collaborator

@pmatos pmatos commented Jan 22, 2025

Alternative implementation to #4274

Fixes #4264

@pmatos pmatos force-pushed the EnsurePredCacheReset2 branch 3 times, most recently from 7c80598 to c9155d8 Compare January 22, 2025 09:28
@pmatos pmatos force-pushed the EnsurePredCacheReset2 branch 4 times, most recently from 319f7a7 to c5ad6e1 Compare January 23, 2025 15:43
@pmatos pmatos marked this pull request as ready for review January 23, 2025 15:43
@pmatos pmatos force-pushed the EnsurePredCacheReset2 branch from c5ad6e1 to 45bb216 Compare January 23, 2025 17:31
@@ -1631,9 +1631,13 @@ DEF_OP(StoreMemPredicate) {
DEF_OP(LoadMemPredicate) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to introduce a new op here, or at least rename/fixup the description as appropriate

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right ... done. To be honest, I am thinking that the work done to enable RA of predicate registers can be reverted. I see no reason to keep code in FEX that doesn't have any users and this is this was the only user of that code.

@Sonicadvance1 what do you think about reverting the predicate register RA?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, sounds reasonable to me.

FEXCore/Source/Interface/Core/ArchHelpers/Arm64Emitter.cpp Outdated Show resolved Hide resolved

if (FillP2) {
ptrue(ARMEmitter::SubRegSize::i16Bit, PRED_X87_SVEOPT, ARMEmitter::PredicatePattern::SVE_VL5);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just move this into the above if and drop the FillP2 check

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetPreds and FillP2 are not really equivalent. We only need to fill P2 if we really need the optimization enabled. This is not true in non-SVE cabable systems or if we don't have X87 ldst instructions in the block.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned this prior but perhaps you missed, the current behaviour is incorrect for some fills in the dispatcher as they will be emitted without it. Setting it in the above block (which is guarded by an sve check alone) is reasonable enough

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that suggest I move it into the SetPredRegs block? I have done that. I haven't dropped the FillP2 conditional. That is necessary to only generate this when strictly necessary.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for the conditional right, the issue is that some helper routines are emitted as part of the dispatcher such as LUDiv. These won't be emitted with FillP2 set to true but they may still be called from a block that has it set as such and clobber the registers. For simplicities sake I think filling it unconditionally is the best solution to this, spills are rare in general outside of x87 code (which will likely have FillP2 as true anyway) so I don't see always setting it having a measurable performance impact.

Also occurred to me that some control flow patterns with multiblock could break this, so that's another reason for just always filling:
loop:
cpuid
dec r0
jz r0 end
x87 write
jmp loop
end: x87 write

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argh - this is so annoying. 😔 Anyway, sorry if I am testing your patience. I think there's really no way around always filling the predicate unless we don't do the optimization. I will update the patch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now updated the patch to drop FillP2 and always fill.

@pmatos pmatos force-pushed the EnsurePredCacheReset2 branch 5 times, most recently from 89330d8 to 4242a4e Compare January 27, 2025 17:05
const auto MemReg = GetReg(Op->Addr.ID());

LOGMAN_THROW_A_FMT(HostSupportsSVE128 || HostSupportsSVE256, "LoadMemPredicate needs SVE support");
LOGMAN_THROW_A_FMT(HostSupportsSVE128 || HostSupportsSVE256, "LoadMemX87SVEOptPredicate needs SVE support");
ptrue(ARMEmitter::SubRegSize::i16Bit, Predicate, ARMEmitter::PredicatePattern::SVE_VL5);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to remove this otherwise the patch will do nothing :) (which is fine as it'll always be set in FillSRA)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sigh of course.

Whenever the control float leaves the block, it might clobber the
predicate register so we reset the cache whenever that happens.

Fixes FEX-Emu#4264
@pmatos pmatos force-pushed the EnsurePredCacheReset2 branch from 4242a4e to ddd241f Compare January 27, 2025 19:12
Copy link
Member

@Sonicadvance1 Sonicadvance1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me, even if this PR doesn't remove the Predicate register RA.

@pmatos
Copy link
Collaborator Author

pmatos commented Jan 28, 2025

Looks fine to me, even if this PR doesn't remove the Predicate register RA.

Yeah, I will revert that separately. Lets not complicate this one further pls. :)

@Sonicadvance1 Sonicadvance1 merged commit b148cc6 into FEX-Emu:main Jan 29, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cached predicate register being clobbered by softfp lib call
3 participants