-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Constant limit issues with r500 hardware #140
Comments
Inlining the relative addressing offset works fine. For example converting :
into
This will work OK and I think it would be a good start. TGSI supports this syntax, so this would preferably happen already at the nine level. But if not I can add some heuristic to the driver. So I think this we should start with, than we should take a look what fixup immediates we really need for R500 and after that I can take care of anything that remains at the driver level. |
Hm, but actually the integer addressing offsets are already properly inlined as they should be (just one random shader from Anno 1404)
|
So the immediates are mostly what the shader declares, except the -256, 1.5 and 2.5 that we use for the A0 fixups I assume. |
BTW I have mostly no clue about D3D, how is is is supposed to work? Are the immediates supposed to be the last constants, i.e., if the shader defines immediates c221 and c222, can we assume that nothing after c222 will be accessed by the relative addressing? |
Yes exactly for -256, 1.5 and 2.5. We use them to handle out of bounds read of a0 (we have bound checks). But I would guess that r500 would natively already handle them the d3d9 way, thus we could disable that code for it. The shader has 256 constants, and the value of those are defined via API (you set them manually, you don't bind a buffer). So basically when we see the use of a0, we upload all 256 constants (we cannot guess where a0 will stop reading), and c221 and c222 will overwrite the constants set by the user there. |
Regarding the a0 bound check, the hardware docs say: Regarding the constants, it seems that indeed when relative addressing is present, the only way how we could solve it is to not tell the driver that something is an immediate and just upload it with the others constant. One possible optimization I see would be to just upload 0 and 1 as an immediates. For example in the shader above: |
Yes that's exactly what d3d9 expects. As for the immediates, why not. First step is not even test for 0 or 1. |
Alright, here is a branch with the use of constants instead of immediates when relative addressing is used: |
I'm not sure if its working, I still see the constants from the shaders as an immediates. Here are dumped shaders ( . |
the 'use constants instead of immediates' is only activated when relative addressing is used. |
It looks like it doesn't happen also for relative adressing, for example from the previously attached log:
|
The message "Recompiling shader for constant compact" seems to indicate relative addressing was not detected. Could you investigate this ? Find which use of ureg_src_indirect is related to that ADDR usage, and should set indirect_const_access to TRUE ? |
I think that DBG Recompiling shader for constant compact is misleading because looking at the code is is actually in both in the |
When setting |
Just one example
|
You are right about the mistake. https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/sincos---ps |
Regarding the point:
How would be something like Furthermore this would theoretically allow easily to introduce at some point in the future a corresponding |
@ondracka What's your view with the patch proposal ? |
I would keep the status quo for now, it looks like there are more issues at the r300 driver side (and not just related to the constants limit). Your patches are definitely step in the right direction and something we want in the long run, but the driver is not ready, so IMO it doesn't make sense to push it now. I really appreciate the help, but unfortunately now I have to fix the driver to be able to benefit from it. Also BTW what do you use for debugging D3D apps under wine/nine? I wasn't able to find anything convenient so far. When it comes to normal debbugers, winedbg is a huge PITA, I'm for example not able to set a breakpoint in a mesa code... (using gdb as a proxy sometimes works, but also crashes a lot). So at some point I gave up and now I'm back at the good old put printfs everywhere approach, that isn't especially effective though... for OpenGL I'l quite fond of apitrace, but the windows apitrace under wine keeps crashing for me and when it works it has much less features compared to the OpenGL one. |
I would like to revisit this, now that https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19277 is in and https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19618 and https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18933 are ready to merge, we should be in much better shape on the r300 driver side. However, it looks like the https://gitlab.freedesktop.org/axeldavy/mesa/-/commits/r500_constant_number branch doesn't really help. The shaders look OK, there are no miscompilations however the rendering looks mostly the same as without the branch. Can I check somewhere in the nine logs what constants are actually being uploaded? |
This is to continue discussion about how to make nine and r300 work with D3D games (copying also the relevant discussion from the thread #133).
axeldavy :
I've some code that should fix r500's support of 256 constants (Last 3 commits of https://github.com/iXit/Mesa-3D/commits/master) in case you want to test.
ondracka:
I did a quick testing and now we fail to compile pretty much any vertex shader using relative addressing. The issue with r500 hardware is that not only constants but also immediates (I'm using the TGSI nomenclature here, as I don't know much about DX) must fit into the 256 constants limit. And what sucks the most is that for vertex shaders we don't have any inlining options (we can inline 1, 0, -1 using the constant swizzles and thats it). To be honest I have no clue how this is supposed to work (how this works on Windows). I'd be happy to implement/enhance stuff on the driver side, but I really have no idea. As I see it, there are two options:
crazy idea could be to potentially construct some of the constants (at least the easy ones like 2, 0.5, etc..) in the shaders, but that would have obvious code size and speed disadvantages.
BTW regarding the NPOT support discussed here, r500 emulates ALL of NPOT support in shaders (we don't have even partial support, we support only rectangular textures), so if there is something not working, open a bug at fdo and I'll take a look if I can enhance the shader emulation.
axeldavy:
@ondracka Is there an option to inline the offset for relative addressing ?
The difference indeed with this series is that when relative addressing is used, all 256 constants will be declared as used, thus leaving no space for the immediates.
Something that nine does is that it reads the immediate if it has them, and write the immediate value into the constant buffer for it to work with relative addressing. We could instead always read the constant buffer, not the immediate, for r500. The question is whether that will be enough for the few immediate that the d3d asm -> TGSI translation generates.
axeldavy:
I think if you can work with constructing the few immediates that are generated by the asm conversion (-1, 0, 1, 0.5, 2., If I checked correctly), and the integer offset of relative addressing, then everything could fit.
EDIT: Actually this is more complicated than that. Many of these constants are not in paths that can trigger when we use relative addressing. However there are a few immediates that are used to enforce d3d9 behaviour for a few things that I bet r500 doesn't need. For example the a0 clamping eats a few immediates, as well as the pointsize clamping. We'll need to continue this discussion elsewhere.
The text was updated successfully, but these errors were encountered: