Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debug printf reports "Mixed stage shader module not supported" #4350

Closed
natevm opened this issue Jul 25, 2022 · 23 comments · Fixed by #7062
Closed

Debug printf reports "Mixed stage shader module not supported" #4350

natevm opened this issue Jul 25, 2022 · 23 comments · Fixed by #7062
Labels
RT Ray Tracing ShaderVal Shader Validation (SPIR-V related)

Comments

@natevm
Copy link

natevm commented Jul 25, 2022

Describe the Issue

I am trying to compile a collection of ray tracing entry points all from one .hlsl file into one .spv file, which I currently can do through the DXC compiler. I would also like to use "printf" for debugging. If I could do this, I could very closely mimic our current CUDA/OptiX workflow, but with Vulkan, and in a much more device-agnostic setup.

Unfortunately, when I try to create a shader module from this .spv while also using the GPU debug printf extension, I get the following error from vkCreateShaderModule: "Mixed stage shader module not supported"

At the moment, I'm therefore forced to either tell users to split every entry point into a separate .hlsl file---adding a tremendous amount of book keeping---or alternatively I must tell my users that we cannot use printf in any shaders due to the above error message.

Valid Usage ID
ERROR: [-1804403757][UNASSIGNED-Debug-Printf] : Validation Error: [ UNASSIGNED-Debug-Printf ] Object 0: handle = 0x2be7f7e7690, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x9472fbd3 | Error during shader instrumentation: line 0: Mixed stage shader module not supported

Environment:

  • OS: Windows 11
  • GPU: RTX 8000 / AMD Radeon RX 6750 XT
  • SDK or header version if building from repo: 1.3.216.0
  • Options enabled (synchronization, best practices, etc.):

Additional context

Here's a little shader that I'm trying to get working:
deviceCode.hlsl.txt

and here's the SPIR-V generated by dxc:
spirv.txt

I'm using the following DXC options:
-spirv
-fspv-target-env=vulkan1.1spirv1.4
-T lib_6_3
-fspv-extension=SPV_KHR_ray_tracing
-fspv-extension=SPV_KHR_ray_query
-fspv-extension=SPV_KHR_non_semantic_info

@greg-lunarg
Copy link
Contributor

I am fairly certain this restriction can be lifted fairly easily. I think we only put this restriction in because we weren't certain this capability (multiple stages in a single module) would be used and so didn't want to spend the effort making sure it worked.

I will take a look and report back.

@ncesario-lunarg
Copy link
Contributor

Aside from debug printf, please keep in mind that there are currently known holes in general shader validation when it comes to multiple entry points, particularly with respect to push constants and transform feedback. See #2450 for some more details.

@ncesario-lunarg ncesario-lunarg added the ShaderVal Shader Validation (SPIR-V related) label Jul 25, 2022
@natevm
Copy link
Author

natevm commented Jul 25, 2022

That's good to hear, and thanks for the heads up about push constants and transform feedback.

For what it's worth, it seems I can use a regex to remove all lines starting with "OpEntryPoint" that aren't used by the current shader module, and then use the assembler in SPIRV tools to get a program that works---or at least, the validation layers don't report the above error anymore.

It would be nicer to not have to modify my assembly on the fly and re-assemble per-shader module though. :)

@greg-lunarg
Copy link
Contributor

OK. After reviewing the code, I now remember the reason for this restriction.

As I said earlier, multi-stage modules can be supported, but they will require additional handling/logic.

The issue is that, for the general instrumentation support which printf is built on, there is stage-specific info that is written out.

So if there are any functions which are shared between call trees of two entry points for different kinds of shaders, those functions need to be cloned, one copy for each different stage it is involved in.

Of course, if there is no shared functions between entry points of different stages in a module, the restriction does not need to be enforced for that module. We would just need to add a little logic to verify no functions are shared between entry points with different stage types. We would also need to add a little additional logic which can handle a different stage type for each entry point.

This would be a good next step expansion of this capability as inlining is a common offline optimization for SPIR-V which happens to eliminate any possibility of shared functions.

Would your use case be able to take advantage of this additional checking?

If not, the next step would be to do the cloning. It would be fairly mechanical.

Another approach is to observe that the printf mechanism should not really require stage-specific data to be written, and so plumb in logic into the general instrumentation mechanism to inhibit this stage-specific data to be written, if it isn't done already. This approach would bear a little more study if it is being considered.

@natevm
Copy link
Author

natevm commented Jul 25, 2022

One of the motivating reasons we have for including multiple entry points within one program is that entry points can share common functions. For example, we might use the same quad software intersection test in a ray tracing intersection program as we do in a ray generation program for handling analytical light sources, and we'll share more general shading code is across closest hit programs for different geometry types. This keeps the code "DRY", and helps eliminate bugs that can arise from not updating all instances of the same function across otherwise separated programs.

If inlining functions solves this issue, that would be fine with me. In OptiX, we're used to having to inline functions for other reasons, so that would be a familiar requirement.

I personally lean towards your final observation long term---I'm not entirely sure why printf requires stage-specific data to be written out, though I might be missing some subtleties here with SPV.

@ncesario-lunarg
Copy link
Contributor

If avoiding code duplication is the main goal, could using #include directives (I think DXC supports this?) be a short term solution (e.g., putting your intersection and generation programs in separate files, but including a "common.hlsl" file)?

@natevm
Copy link
Author

natevm commented Jul 26, 2022

A common HLSL file would help yes, but code de-duplication isn’t the main goal.

The real goal is to replicate our current OptiX workflow as much as possible, where nvcc compiles all ray tracing shader entry points into one single PTX output, and printf’s can be used throughout the device code. I’m close to having this work by substituting nvcc for dxc and using the debug printf extension. This issue on multiple entry points disabling printf is the main hang up to achieving this goal.

@natevm
Copy link
Author

natevm commented Jul 26, 2022

Quick update, unfortunately it seems my workaround of removing unnecessary OpEntryPoint instructions doesn’t work. My modified SPIRV passes validation with SPIRV Tools, but then attempting to create the shader module from the modified IR causes an access violation somewhere in the driver.

So I’m a bit back to square one… If we can get this issue fixed that would be really helpful. At the moment though I’m forced to settle for no debug printf, unless I can figure out how to successfully modify the SPIRV to remove entry points on the fly. 😞

Edit: actually I might have figured out how to patch the SPIRV again such that it's accepted by Vulkan without crashing... it seems like in addition to the "OpEntryPoint" declaration lines, I also need to remove the entrypoint functions themselves, as well as the "OpName"s that seem to declare the functions.

Edit 2: After regexing out the other entry points when building my individual ray tracing shader modules, I can confirm that the printf's work now. So the workaround here does work. So, until this is fixed, just for the sake of visibility, as a workaround you can :

  1. use dxc.exe with -T lib_6_3 to compile multiple hlsl entry points into one common .spv file.
  2. for each shader module to be made, use SPIR-V tools' spvBinaryToText to get the human readable spirv. Copy this for editing...
  3. use something like std::regex re("( *)(OpEntryPoint )(.*? )([%][A-Za-z]*)( \"[A-Za-z]*\" )(.*)"); to pick up all entry points, and identify the entry points that do not match VkPipelineShaderStageCreateInfo shaderStage.pName for the current shader stage being made.
  4. Remove the OpEntryPoint declaration line for all unneeded entry points for the given shader stage
  5. use something like std::regex OpNameRE("( *)(OpName )(" + otherEntryPoint + ")( \"[A-Za-z]*\")"); to identify and remove the OpName declaration line for each to-be-removed entry point function
  6. use something like std::regex OpEntryPointRE("( *)(" + otherEntryPoint + " =[^]*?OpFunctionEnd)"); to identify and remove the unneeded entry point definition
  7. finally, use SPIR-V tools' spvTextToBinary to go back to binary SPIRV that can be accepted by Vulkan.

That seems to work for me. Ideally I wouldn't have to do that, and I worry about robustness, but it does work.

@greg-lunarg
Copy link
Contributor

I am glad you figured out a workaround. But there appears to be a need for the printf capability to be enhanced in terms of shaders that are supported.

From your reply, it appears that we should at least check for and handle the case where there is no shared functions between entry points. This condition can be accomplished with spirv-opt inlining.

The other option is to create a stage-agnostic mode of instrumentation support and utilize that with printf.

@natevm
Copy link
Author

natevm commented Jul 28, 2022

Agreed. I would gladly test this change too. spirv-opt inlining is completely fine with me. It seems like that wouldn't be too much work, though I'm unfamiliar with what's happening under the hood.

Long term, I suspect the stage-agnostic approach will scale better.

@greg-lunarg
Copy link
Contributor

Looking more at the stage-agnostic solution, I believe this is not a good solution as I believe there is a good reason to keep the stage specificity.

Since we possibly/likely are generating multiple printf statements, the printf instrumentation adds information to each message stating the stage and the instance of the stage that generated that message.

This was initially something that was done as part of support for GPU-assisted validation. Since it seemed useful for printf as well, we kept it.

Ripping this information out of debug printf would possibly cause compatibility issues, so I no longer support the stage-agnostic solution.

Rather, I recommend the solution of checking for and cloning shared functions if necessary.

@natevm
Copy link
Author

natevm commented Aug 1, 2022

That partially makes sense, but from the outside I'm still a bit confused... Just to clarify, why is it an issue for printf that multiple entry points share common functions? I get why printf might want to know which entry point called that printf. What I don't understand is why shared functions prevent this metadata from being attached to the printf.

Perhaps you could just enable the printf's metadata to understand that some functions are going to be shared between entry points?

@greg-lunarg
Copy link
Contributor

Debug printf is built on instrumentation code that is stage-specific.

We could possibly re-design the instrumentation to not be stage-specific, but it would add to size, speed and complexity for GPU-assisted validation and debug printf for the majority of codes out there. And it would be costly and disruptive.

At the time the instrumentation was designed, and even today, stage-specificity is not a problem for the vast majority of codes since glslang is not generating multi-stage modules.

In my estimation, the overall least cost, most maintainable, and best performing solution for the community is the one I recommend above.

@natevm
Copy link
Author

natevm commented Aug 1, 2022

Hm, ok, just to clarify,
If I have two stages and a common function between them:

void commonFunction()
{
  printf("Hello!");
}

[shader("raygeneration")]
void simpleRayGen() {
  printf("Hello from the raygen program!\n");
  commonFunction();
}

struct Payload {int tmp;};
[shader("miss")]
void simpleMissProg(inout Payload MyPayload) {
  printf("Hello from the miss program!\n");
  commonFunction();
}

That common function needs to be inlined so that the printf has context as to which shader stage is calling it?

I am ok with the solution you proposed. Better that then what I'm forced to do right now, where I have to modify my SPIR-V on the fly to get the Khronos validation layers to work.

I'm mostly thinking about scalability in the future, eg, in a world where there are hundreds to thousands of compute entry points like we see in CUDA...

@greg-lunarg
Copy link
Contributor

Yes, commonFunction() would need to be inlined. This can be easily (?) done with spirv-opt --eliminate-dead-code-aggressive. The ? suggests it is possible that this inlining might not work due to some restriction, but fairly unlikely as it has been pretty well tested and generally works.

We would still need to add logic to the debugPrintf pass to allow for multi-stage when no common function exists, and check that that condition is true.

@greg-lunarg
Copy link
Contributor

This can be easily (?) done with spirv-opt --eliminate-dead-code-aggressive.

Sorry. Must have had a case of the Monday brain cramps. Meant to say:

This can be easily (?) done with spirv-opt --inline-entry-points-exhaustive ...

@natevm
Copy link
Author

natevm commented Aug 2, 2022

Sounds like a plan! :)

Could this inlining be done with the dxc compiler? At the moment, I'm using dxc to compile multiple entry points into one .spv file, and then I use SPIR-V tools to tweak the final spv. Ideally I could drop SPIR-V tools from my repo and just use the output from dxc directly. Wondering if it's possible somehow to use "--inline-entry-points-exhaustive" with dxc then.

Edit: Sorry, still learning some of these tools. Appears that spirv-opt is a binary that I'd run after using the dxc compiler to take hlsl -> spv. So then that would mean dxc wouldn't need to change, just need to add in spirv-opt into my build tools.

I can create a couple easy test cases of when exhaustive inlining has been properly done vs not, and then use that to test out the modified logic to the debugPrintf pass.

Update: reviewing -O in the spirv-optimizer, it appears that inline-entry-points-exhaustive is automatically included. I suspect that the majority of dxc hlsl->spv users use -O by default, so that gives more credit to this plan being relatively smooth to implement.

@natevm
Copy link
Author

natevm commented Aug 3, 2022

Ok, so given this code:

void commonFunction() {
  printf("Common function!\n");
}

[shader("raygeneration")]
void RayGen1() {
  commonFunction();
}

[shader("raygeneration")]
void RayGen2() {
  commonFunction();
}

Compiling with DXC and disabling optimization yields:

; SPIR-V
; Version: 1.4
; Generator: Google spiregg; 0
; Bound: 20
; Schema: 0
               OpCapability RayTracingKHR
               OpExtension "SPV_KHR_non_semantic_info"
               OpExtension "SPV_KHR_ray_tracing"
          %1 = OpExtInstImport "NonSemantic.DebugPrintf"
               OpMemoryModel Logical GLSL450
               OpEntryPoint RayGenerationNV %RayGen1 "RayGen1"
               OpEntryPoint RayGenerationNV %RayGen2 "RayGen2"
          %4 = OpString "Common function!
"
               OpSource HLSL 630
               OpName %RayGen1 "RayGen1"
               OpName %src_RayGen1 "src.RayGen1"
               OpName %bb_entry "bb.entry"
               OpName %RayGen2 "RayGen2"
               OpName %src_RayGen2 "src.RayGen2"
               OpName %bb_entry_0 "bb.entry"
               OpName %commonFunction "commonFunction"
               OpName %bb_entry_1 "bb.entry"
       %void = OpTypeVoid
          %6 = OpTypeFunction %void
    %RayGen1 = OpFunction %void None %6
          %7 = OpLabel
          %8 = OpFunctionCall %void %src_RayGen1
               OpReturn
               OpFunctionEnd
%src_RayGen1 = OpFunction %void None %6
   %bb_entry = OpLabel
         %11 = OpFunctionCall %void %commonFunction
               OpReturn
               OpFunctionEnd
    %RayGen2 = OpFunction %void None %6
         %13 = OpLabel
         %14 = OpFunctionCall %void %src_RayGen2
               OpReturn
               OpFunctionEnd
%src_RayGen2 = OpFunction %void None %6
 %bb_entry_0 = OpLabel
         %17 = OpFunctionCall %void %commonFunction
               OpReturn
               OpFunctionEnd
%commonFunction = OpFunction %void None %6
 %bb_entry_1 = OpLabel
         %19 = OpExtInst %void %1 1 %4
               OpReturn
               OpFunctionEnd

That %commonFunction is an issue for printf. If I allow dxc to use -O (which includes inline-entry-points-exhaustive) then I get this:

; SPIR-V
; Version: 1.4
; Generator: Google spiregg; 0
; Bound: 28
; Schema: 0
               OpCapability RayTracingKHR
               OpExtension "SPV_KHR_non_semantic_info"
               OpExtension "SPV_KHR_ray_tracing"
          %1 = OpExtInstImport "NonSemantic.DebugPrintf"
               OpMemoryModel Logical GLSL450
               OpEntryPoint RayGenerationNV %RayGen1 "RayGen1"
               OpEntryPoint RayGenerationNV %RayGen2 "RayGen2"
          %4 = OpString "Common function!
"
               OpSource HLSL 630
               OpName %RayGen1 "RayGen1"
               OpName %RayGen2 "RayGen2"
       %void = OpTypeVoid
          %6 = OpTypeFunction %void
    %RayGen1 = OpFunction %void None %6
          %7 = OpLabel
         %23 = OpExtInst %void %1 1 %4
               OpReturn
               OpFunctionEnd
    %RayGen2 = OpFunction %void None %6
         %13 = OpLabel
         %27 = OpExtInst %void %1 1 %4
               OpReturn
               OpFunctionEnd

Here's the assembled spv for both:

This one's the one with common functions inlined
deviceCode_as.spv.txt

And here's one without.
deviceCode_unoptimized_as.spv.txt

So, the next step would be to write that logic to reject the unoptimized one and allow for the optimized one.

@csyonghe
Copy link

According to the extension spec, there are no clauses to rule out the use of printf from a SPIRV module that contains a mix of different entry-point types. I understand there might be some optimizations that allows things to be faster when there is not such use, but there are widespread undesired impact for low-level tools like the validation layer to make any further assumptions on users code beyond the spec. In this case, it is more appropriate to keep the current implementation to be a "fast" path but allow falling back to something more general and lenient when the assumption does not hold.

@spencer-lunarg
Copy link
Contributor

The "stage-specific" info Greg mentioned is the LaunchIdKHR built-in and then which stage.

A simple way out of this might be exploiting the fact if all the Entry Points in a multi-entrypoint are Ray Tracing stage, they all use LaunchIdKHR

One possible mechanism for solution could be to has the "which stage" value be a spec constant, and then at pipeline creation time, track it was a multi-entrypoint VkShaderModule and set the spec constant to match the stage in VkPipelineShaderStageCreateInfo::stage. Also to make it simpler, we could always just set a known ConstantID value regardless as it perfectly legal to set a VkSpecializationMapEntry that doesn't appear in the SPIR-V itself

According to the extension spec,

Note this issue expands not only to Debug Printf, but to all GPU-AV logic, and since multi-entrypoint is more common with Ray Tracing stages, there will need to be some form a solution here

@natevm
Copy link
Author

natevm commented Nov 30, 2023

The printf layer should not make any assumptions about what entry point stages exist in a program. If users need additional information about what entry point is calling a printf for GPU-AV, this should be an explicit extension of some sort.

I agree that knowing what entry point is calling a printf could be useful for some specific GPU-AV problems, but it’s a heavy handed constraint that deviates from spec and is preventing printf from working for a much larger set of users who do not need this information. Users could also just as easily print out what entry point is calling the function as part of the printf message itself.

As the spec stands today, it is perfectly legal to use printf anywhere in a SPV program containing a mix of compute shaders, a vertex and pixel shaders, raygen, closest hit and miss shaders, and any other type. (Unless I’m mistaken, in which case please point me to where this is stated…)

Compiler developers at NVIDIA are depending on this being true as we push forward with mixed entry point codes with arbitrary stage types. Generating SPV should never be required to adhere to more constraints than what’s stated in the spec.

@jeremyg-lunarg
Copy link
Contributor

The printf layer should not make any assumptions about what entry point stages exist in a program. If users need additional information about what entry point is calling a printf for GPU-AV, this should be an explicit extension of some sort.

I agree that knowing what entry point is calling a printf could be useful for some specific GPU-AV problems, but it’s a heavy handed constraint that deviates from spec and is preventing printf from working for a much larger set of users who do not need this information. Users could also just as easily print out what entry point is calling the function as part of the printf message itself.

Up until a couple months ago, the debug printf and GPU-AV instrumentation shared the same code for logging output data. GPU-AV does need stage information and since they shared the same code, debug printf got it 'for free' even though it wasn't needed. This commit broke that coupling.

I've put up a SPIRV-Tools PR that removes the stage info from the debug printf instrumentation. I'm working on the corresponding VVL change to go with it.

We'll still need to solve the multi-stage, multi-entrypoint shader problem for GPU-AV separately.

@natevm
Copy link
Author

natevm commented Dec 1, 2023

Awesome. Sounds like things are moving in the right direction. Thank you for your help.

jeremyg-lunarg added a commit to jeremyg-lunarg/Vulkan-ValidationLayers that referenced this issue Dec 1, 2023
Remove stage specific debug info that is only needed by GPU-AV.
This allows debug printfs to be used in multi stage shader modules.

Requires KhronosGroup/SPIRV-Tools#5495

Fixes KhronosGroup#4350
jeremyg-lunarg added a commit to jeremyg-lunarg/Vulkan-ValidationLayers that referenced this issue Dec 4, 2023
Remove stage specific debug info that is only needed by GPU-AV.
This allows debug printfs to be used in multi stage shader modules.

Requires KhronosGroup/SPIRV-Tools#5495

Fixes KhronosGroup#4350
jeremyg-lunarg added a commit to jeremyg-lunarg/Vulkan-ValidationLayers that referenced this issue Dec 4, 2023
Remove stage specific debug info that is only needed by GPU-AV.
This allows debug printfs to be used in multi stage shader modules.

Requires KhronosGroup/SPIRV-Tools#5495

Fixes KhronosGroup#4350
jeremyg-lunarg added a commit to jeremyg-lunarg/Vulkan-ValidationLayers that referenced this issue Dec 4, 2023
Remove stage specific debug info that is only needed by GPU-AV.
This allows debug printfs to be used in multi stage shader modules.

Requires KhronosGroup/SPIRV-Tools#5495

Fixes KhronosGroup#4350

Also includes the VUL linking change from KhronosGroup#7050
jeremyg-lunarg added a commit that referenced this issue Dec 5, 2023
Remove stage specific debug info that is only needed by GPU-AV.
This allows debug printfs to be used in multi stage shader modules.

Requires KhronosGroup/SPIRV-Tools#5495

Fixes #4350

Also includes the VUL linking change from #7050
jeremyg-lunarg added a commit to jeremyg-lunarg/Vulkan-ValidationLayers that referenced this issue Dec 18, 2023
Fix some formatting and add an explanation of the changes made for
Issue KhronosGroup#4350, just in case someone runs into them.
spencer-lunarg pushed a commit that referenced this issue Dec 19, 2023
Fix some formatting and add an explanation of the changes made for
Issue #4350, just in case someone runs into them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RT Ray Tracing ShaderVal Shader Validation (SPIR-V related)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants