You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I can see the appeal of such a backend, but at the same time, there are use-cases where users may really want a CPU runtime for their shaders and don't care that much about performance.
For instance, in Vello, we end up maintaining CPU pseudo-shaders in parallel of our actual WGSL shaders, mostly for testing and as a fallback. Personally, I'd like to push the fallback case even further so we can run Vello on machines without GPUs; in those cases, being able to run anything at all is a win, even with degraded performance. If we could achieve that and get rid of our duplicate CPU shaders, that would be a massive win for us.
Have you considered making a best-effort CPU runtime? One where annotated rust functions are simply lowered to regular rust functions, and you leave auto-vectorization to the rustc backend? How much effort do you think it would take to implement that runtime?
The text was updated successfully, but these errors were encountered:
Making a low-effort CPU runtime would probably be as hard as making a proper CPU runtime. To speed things up, we might generalize our CUDA compiler to a C++ compiler and compile it using gcc or llvm. The compiler wouldn't be embedded, but it would be faster to develop.
In your README, you mention wanting to build a JIT cranelift backend for the CPU.
I can see the appeal of such a backend, but at the same time, there are use-cases where users may really want a CPU runtime for their shaders and don't care that much about performance.
For instance, in Vello, we end up maintaining CPU pseudo-shaders in parallel of our actual WGSL shaders, mostly for testing and as a fallback. Personally, I'd like to push the fallback case even further so we can run Vello on machines without GPUs; in those cases, being able to run anything at all is a win, even with degraded performance. If we could achieve that and get rid of our duplicate CPU shaders, that would be a massive win for us.
Have you considered making a best-effort CPU runtime? One where annotated rust functions are simply lowered to regular rust functions, and you leave auto-vectorization to the rustc backend? How much effort do you think it would take to implement that runtime?
The text was updated successfully, but these errors were encountered: