tlb crash when running 64-bit code #177
Comments
The one_hot_idx value was 18504, which mod 256 is 72, and the table maps 72 to -1. The base was i = 0.
It appears that in this case there were multiple matching tlb entries. While that is probably undefined behavior, it should not crash cen64.
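To make the arithmetic above concrete, here is a minimal sketch in C. The helper names (`is_one_hot`, `count_matches`, `one_hot_byte_to_index`) are hypothetical and are not taken from the cen64 source; the switch only stands in for the kind of byte-indexed lookup table described in this thread. It shows that 18504 (0x4848) has four bits set, so its low byte 72 (0x48) is not one-hot and a one-hot decode yields -1.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero when exactly one bit of mask is set. */
static int is_one_hot(uint32_t mask) {
  return mask != 0 && (mask & (mask - 1)) == 0;
}

/* Hypothetical helper: number of set bits, i.e. number of matching entries. */
static int count_matches(uint32_t mask) {
  int n = 0;

  while (mask) {
    mask &= mask - 1; /* clear the lowest set bit */
    n++;
  }

  return n;
}

/* Hypothetical stand-in for a byte-indexed one-hot decode table:
 * a byte with exactly one bit set maps to that bit's position,
 * anything else (zero or several bits set) maps to -1. */
static int one_hot_byte_to_index(uint8_t byte) {
  switch (byte) {
    case 0x01: return 0;
    case 0x02: return 1;
    case 0x04: return 2;
    case 0x08: return 3;
    case 0x10: return 4;
    case 0x20: return 5;
    case 0x40: return 6;
    case 0x80: return 7;
    default:   return -1;
  }
}
```

With these helpers, `one_hot_byte_to_index(18504 & 0xFF)` returns -1 because 0x48 has bits 3 and 6 set. A caller that uses the result as an array index therefore needs to check for -1 first.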
I don't see why it's a table at all. __builtin_ffs maps to a single instruction on x86, and on many other arches.
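As a rough sketch of that suggestion, assuming GCC or Clang (where `__builtin_ffs` returns one plus the index of the least significant set bit, or 0 when no bit is set), the match mask could be decoded as below. The function name `tlb_index_from_mask` is hypothetical, and resolving multiple matches to the lowest-numbered entry is an assumption made for illustration, not a statement about what the hardware or cen64 does.

```c
#include <stdint.h>

/* Hypothetical decode of a TLB match mask using the GCC/Clang builtin.
 * Returns -1 when no entry matched. When several entries matched, this
 * sketch picks the lowest-numbered one (an arbitrary policy choice). */
static int tlb_index_from_mask(uint32_t mask) {
  /* __builtin_ffs(0) == 0, so an empty mask yields -1. */
  return __builtin_ffs((int)mask) - 1;
}
```

For the value from this report, `tlb_index_from_mask(18504)` gives 3 rather than -1. The no-match case still returns -1, so the caller has to handle that before indexing.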
__builtin_ffs = bsf, etc. on x86, and that can have a host of issues: decomposing into several tens of uops, creating false output dependencies, only running on a certain execution port, etc. It may be faster than a memory access, or it may not. Last I measured, the backend of the host's pipeline is the bottleneck on most high-perf uarchs, and I would still presume that a dependency on a memory load is better than bsf. CLZ is also less flexible when multiple bits may be set and when the 'undefined' case needs to be handled differently (TBD). This is definitely a whoopsie, though.
Checking Agner's tables, apparently bsf sucks on Intel. On AMD it is faster than an L2 hit.
When the app is 64-bit, the tlb code is buggy. It returns an index of -1, which then causes a crash a few lines later in vr4300_dc_stage.