tlb crash when running 64-bit code #177

Open
clbr opened this issue Dec 21, 2020 · 5 comments

Comments

@clbr
Contributor

clbr commented Dec 21, 2020

When the app is 64-bit, the TLB code is buggy. It returns an index of -1, which then causes a crash a few lines later in vr4300_dc_stage:

page_mask = vr4300->cp0.page_mask[index];
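To illustrate the failure mode, here is a minimal sketch of a guarded read. The helper name `tlb_read_page_mask` and the constant `TLB_ENTRIES` are hypothetical and not taken from cen64; the size of 32 is an assumption about the array being indexed.

```c
#include <stdint.h>

/* Assumption: the page_mask array has 32 entries, one per TLB entry. */
#define TLB_ENTRIES 32

/*
 * Hypothetical helper, not cen64 code: reads page_mask[index] only when
 * index is in range. Returns 1 on success and 0 when the lookup produced
 * an invalid index such as -1.
 */
static int tlb_read_page_mask(const uint32_t *page_mask, int index,
                              uint32_t *out) {
  if (index < 0 || index >= TLB_ENTRIES)
    return 0;

  *out = page_mask[index];
  return 1;
}
```

A guard like this would only turn the crash into a detectable miss; it does not address why the lookup returned -1.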
@clbr
Contributor Author

clbr commented Dec 21, 2020

The one_hot_idx value was 18504; 18504 mod 256 is 72, which the table maps to -1. The base was i = 0.
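The arithmetic can be checked directly: 18504 is 0x4848, so the low byte is 0x48 (72), which has two bits set (bits 3 and 6) and is therefore not one-hot. The helpers below are illustrative only and are not cen64 functions.

```c
#include <stdint.h>

/* Illustrative helper: returns the low byte of x (x mod 256). */
static uint32_t low_byte(uint32_t x) {
  return x & 0xFF;
}

/* Illustrative helper: counts set bits by clearing the lowest one each pass. */
static int count_set_bits(uint32_t x) {
  int n = 0;

  while (x) {
    x &= x - 1; /* clears the lowest set bit */
    n++;
  }

  return n;
}
```

With these, `low_byte(18504)` is 72 and `count_set_bits(72)` is 2, so a table that only accepts one-hot bytes has no valid answer for this input.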

@clbr
Contributor Author

clbr commented Dec 21, 2020

It appears that in this case there were multiple matching TLB entries. While this is probably undefined behavior, it should not crash cen64.
Either the table needs to be amended so that it returns the first match instead of -1, or there needs to be a "find first set bit" operation that masks out the other set bits before the table is accessed.
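The second option could look roughly like the sketch below, which isolates the lowest set bit before the lookup. `one_hot_byte_to_index` is a hypothetical stand-in for the real table, and treating the lowest set bit as the "first" match is an assumption.

```c
#include <stdint.h>

/* Keeps only the lowest set bit of x; returns 0 when x is 0. */
static uint32_t isolate_lowest_set_bit(uint32_t x) {
  return x & (~x + 1u);
}

/*
 * Hypothetical stand-in for the lookup table: maps a one-hot byte to its
 * bit position and anything else to -1.
 */
static int one_hot_byte_to_index(uint8_t b) {
  switch (b) {
    case 0x01: return 0;
    case 0x02: return 1;
    case 0x04: return 2;
    case 0x08: return 3;
    case 0x10: return 4;
    case 0x20: return 5;
    case 0x40: return 6;
    case 0x80: return 7;
    default:   return -1;
  }
}
```

For the byte 72 from this report, the raw lookup yields -1, while looking up `isolate_lowest_set_bit(72)` yields 3.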

@clbr
Contributor Author

clbr commented Dec 21, 2020

I don't see why it's a table at all. __builtin_ffs maps to a single instruction on x86, and many other arches.
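A minimal sketch of the table-free variant, assuming GCC or Clang: `__builtin_ffs` returns one plus the position of the least significant set bit, or 0 when no bit is set. The wrapper name `first_match_index` is hypothetical, and picking the lowest set bit as the winner among multiple matches is an assumption.

```c
/*
 * Hypothetical wrapper, not cen64 code: returns the position of the lowest
 * set bit in match_bits, or -1 when no bit is set.
 */
static int first_match_index(unsigned int match_bits) {
  /* __builtin_ffs yields 0 for a zero input, so this maps "no match" to -1. */
  return __builtin_ffs((int)match_bits) - 1;
}
```

For the value in this report, `first_match_index(18504)` is 3, and an input of 0 still yields -1, so callers would need to keep handling the no-match case.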

@tj90241
Collaborator

tj90241 commented Dec 21, 2020

__builtin_ffs = bsf, etc. on x86, and that can have a host of issues: decomposing into several tens of uops, creating false output dependencies, only running on a certain execution port, etc. It may be faster than a memory access, or it may not. Last I measured, the backend of the host's pipeline is the bottleneck on most high-perf uarchs, and I would still presume that a dependency on a memory load is better than bsf.

CLZ is also less flexible in the case of possibly multiple set bits and when the 'undefined' case needs to be handled differently (TBD).

This is definitely a whoopsie, though.

@clbr
Contributor Author

clbr commented Dec 22, 2020

Checking Agner's tables, apparently bsf sucks on Intel. On AMD it is faster than an L2 hit.
