
Extend getproperty and setproperty! with swizzling for vectors #1221

Closed
wants to merge 4 commits


@serenity4

serenity4 commented Nov 16, 2023

Closes #1159, and also adds setproperty! methods matching their getproperty counterparts.

This improves the use of small static vectors in the context of computer graphics, where these swizzles are very common and super cheap on the GPU (which has dedicated hardware for that). Defining a swizzle method separately from getproperty will allow GPU compilers such as SPIRV.jl to overlay that method with a specialized one that maps to a GPU intrinsic, e.g. OpVectorShuffle in SPIR-V.
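A minimal sketch of the split described above, assuming a hypothetical `Vec` type that wraps an `NTuple` (this is not the PR's code or StaticArrays' API): `getproperty` only translates the property name into component indices and delegates to `swizzle`, so `swizzle` is the single method a GPU compiler would need to overlay.

```julia
# Hypothetical stand-in for SVector; not StaticArrays' actual type or API.
struct Vec{N,T}
    data::NTuple{N,T}
end

# Swizzling lives in its own function, so a GPU compiler could overlay this one
# method with an intrinsic (e.g. OpVectorShuffle) without touching getproperty.
swizzle(v::Vec, inds::Int...) = Vec(map(i -> getfield(v, :data)[i], inds))

# Map a spatial name such as :zyx to component indices (x, y, z, w => 1, 2, 3, 4).
function spatial_indices(name::Symbol)
    inds = map(c -> findfirst(==(c), "xyzw"), collect(String(name)))
    any(isnothing, inds) && throw(ArgumentError("not a swizzle: .$name"))
    return Tuple(inds)
end

function Base.getproperty(v::Vec, name::Symbol)
    # Keep the real field reachable.
    name === :data && return getfield(v, :data)
    inds = spatial_indices(name)
    # One component yields a scalar; several components yield a new Vec.
    return length(inds) == 1 ? getfield(v, :data)[only(inds)] : swizzle(v, inds...)
end

v = Vec((10, 20, 30, 40))
v.zyx.data  # (30, 20, 10)
```

The same mechanism also reorders components, e.g. `v.wxyz`, which is the kind of re-arrangement a GPU can perform almost for free.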

Beyond enabling better performance, swizzles make code easier to write for graphics programmers, since many graphics algorithms are written with swizzles, bizarre as that may seem. See this Reddit thread for a more in-depth explanation of why swizzling is used at all.

The color accessors (.r, .rgb, .rgba, etc.) are also useful where vectors represent colors, which is also very common in computer graphics: vectors are used for pretty much everything, given how first-class they are in GPU shaders. Mixing spatial and color accessors doesn't really make sense, so names like .xgb are disallowed. I would argue that these color accessors are a nice feature to have, although if there is strong resistance on that front they are less important than being able to swizzle at all (i.e. using only spatial accessors).
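A sketch of how a property name could be mapped to component indices while rejecting mixed accessors such as .xgb. The accessor sets and the function name `swizzle_indices` are assumptions for illustration, not the code in this PR.

```julia
# Hypothetical accessor sets, indexed 1 to 4 (x/r, y/g, z/b, w/a).
const ACCESSOR_SETS = ("xyzw", "rgba")

# Return the component indices for a name such as :zyx or :rgb.
# All letters must come from the same set, so mixed names like :xgb are rejected.
function swizzle_indices(name::Symbol)
    chars = collect(String(name))
    for set in ACCESSOR_SETS
        inds = map(c -> findfirst(==(c), set), chars)
        !isempty(inds) && all(!isnothing, inds) && return Tuple(inds)
    end
    throw(ArgumentError("invalid or mixed swizzle: .$name"))
end

swizzle_indices(:rgb)   # (1, 2, 3)
swizzle_indices(:wzyx)  # (4, 3, 2, 1)
```

Checking one set at a time means a name is accepted only if every letter belongs to the same set, which is exactly the "no mixing" rule.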

Any comments welcome!

@serenity4
Author

serenity4 commented Nov 16, 2023

Note that performance should remain unaffected as long as the property name passed to getproperty/setproperty! is constant-folded. Compare an example where it is folded with one where it is not:

julia> v4 = @SVector [1, 2, 3, 4]
4-element SVector{4, Int64} with indices SOneTo(4):
 1
 2
 3
 4

julia> @btime (v -> v.zx .+ v.zz .+ v.xx .+ v.y .+ v.yz .+ v.xx)($v4)
  1.711 ns (0 allocations: 0 bytes)
2-element SVector{2, Int64} with indices SOneTo(2):
 12
 11

julia> @btime getproperty($v4, x) setup = x = :wwzy
  89.658 ns (1 allocation: 48 bytes)
4-element SVector{4, Int64} with indices SOneTo(4):
 4
 4
 3
 2

The only thing I'm not sure about is whether to keep the @inline annotation on these getproperty/setproperty! methods, since without a constant property name they expand to a lot of code. If everything is folded correctly, the result should be small enough for the compiler to inline on its own; if there is no constant folding, it is better not to inline. I'd therefore be inclined to remove the @inline annotation.

@serenity4
Author

Any feedback on this, in favor or against this feature and/or regarding implementation details?

@hyrodium
Copy link
Collaborator

these swizzles are very common and super cheap on the GPU (which has dedicated hardware for that).

I understand swizzling is common in some areas, but someone may want to avoid swizzling because they prefer the [w,x,y,z] ordering to [x,y,z,w]. See JuliaGeometry/Rotations.jl#210 (comment) for example.

I'm not sure how big the impact of this change is, and I would like to wait for some other reviews.

@hyrodium
Copy link
Collaborator

I would argue that these color accessors are a nice feature to have, although if there is strong resistance on that front they are less important than being able to swizzle at all (i.e. using only spatial accessors).

Are there any situations in which you really need StaticVector to represent colors? Isn't ColorTypes.jl enough?

@serenity4
Author

I understand swizzling is common in some areas, but someone may want to avoid swizzling because they prefer the [w,x,y,z] ordering to [x,y,z,w]. See JuliaGeometry/Rotations.jl#210 (comment) for example.

Actually, the issue you linked seems to be a case that may have benefited from swizzling. The order [x, y, z, w] for vectors is the most common, in reference to homogeneous vectors used in many fields (computer graphics, but also robotics, computer vision, and pretty much anything that deals with geometry). If one abuses 4-component vectors to represent quaternions and desires to enforce a specific component order (e.g. putting the real part w before the pure part [x, y, z]), then that would be done as simply as q = vec.wxyz instead of q = @SVector [vec[4], vec[1], vec[2], vec[3]]. That also means that there is less incentive to release a breaking version to change a canonical ordering if the order may be easily changed by the user, and even less so if such a re-arrangement is virtually free on specific hardware.

Are there any situations in which you really need StaticVector to represent colors? Isn't ColorTypes.jl enough?

TL;DR: even if I could use another type to represent colors, it would harm performance not to use the efficient built-in vector type to do so. So what we'd end up with is reusing swizzling in the spatial domain like color_without_alpha = vec.xyz instead of color_without_alpha = vec.rgb because the alternative is not worth it.

When writing shaders, one is expected to interact with most data through built-in types, notably vectors, rather than user-defined structs. The reason is that built-in vectors are fast and have dedicated hardware operations. In an effort to write GPU shaders in Julia, I replicated such built-in vector types here: https://github.com/serenity4/SPIRV.jl/blob/44de86dde135b1f172bbdcf9741af7e6b0b225a0/src/frontend/types/vector.jl along with operations found in most GLSL/HLSL shaders, including swizzling. The Vec defined there is essentially a clone of MVector. (Why I didn't replicate SVector instead is a big rabbit hole; in short, the shader IR, SPIR-V in this case, does not model mutability in its type system.)

That works well, but it does not compose with the Julia ecosystem. Say I want to use a geometry processing library such as Meshes.jl to compute a line-line intersection on the GPU. Meshes uses StaticArrays under the hood, so shader code providing Vec inputs won't work unless the built-in vectors are converted to static arrays. The issue is that by converting from a built-in GPU type to an SArray (a struct holding a tuple of components), we lose all the benefits of built-in hardware support.

As StaticArrays are used in so many libraries nowadays, composing with code that uses them would greatly improve code reuse for numerical computations to be performed within GPU shaders.

Therefore, rather than introducing a new Vec type for built-in GPU vectors, I'd like to use StaticArrays types as the data structure that represents them. That means supporting hardware operations on static arrays, ideally in a way that matches what most of the graphics programming community would expect, to avoid confusion and to ease translation between existing shader code and equivalent Julia code. Swizzling is one of these operations, and ideally its syntax would allow color component extraction so that it feels familiar to graphics programmers.

I'm not sure how big the impact of this change is, and I would like to wait for some other reviews.

I'd also wait for confirmation that this won't have an impact on the ecosystem, given how many dependents there are.

We can reduce the likelihood of performance impacts by disallowing repeated components in swizzling, such as .xxyz. When enabling repeated components, we get a pretty massive code size of 36884 lines:

julia> using StaticArrays

julia> v4 = @MVector [10.0, 20.0, 30.0, 40.0];

julia> @code_typed getproperty(v4, :xyz)
[truncated]

15651 ─ %36877 = $(Expr(:gc_preserve_begin, Core.Argument(2)))
│       %36878 = $(Expr(:foreigncall, :(:jl_value_ptr), Ptr{Nothing}, svec(Any), 0, :(:ccall), Core.Argument(2)))::Ptr{Nothing}
│       %36879 = Base.bitcast(Ptr{Int64}, %36878)::Ptr{Int64}
│       %36880 = Base.pointerref(%36879, 1, 1)::Int64
│                $(Expr(:gc_preserve_end, :(%36877)))
└──────          goto #15652
15652 ─          goto #15653
15653 ─          return %36880
) => Union{Int64, NTuple{4, Int64}, MArray{S, Int64, 1} where S<:Tuple}

whereas without repeated components this drops down to 5059 lines:

julia> @code_typed getproperty(v4, :xyz)
[truncated]

2183 ─ %5052 = $(Expr(:gc_preserve_begin, Core.Argument(2)))
│      %5053 = $(Expr(:foreigncall, :(:jl_value_ptr), Ptr{Nothing}, svec(Any), 0, :(:ccall), Core.Argument(2)))::Ptr{Nothing}
│      %5054 = Base.bitcast(Ptr{Int64}, %5053)::Ptr{Int64}
│      %5055 = Base.pointerref(%5054, 1, 1)::Int64
│              $(Expr(:gc_preserve_end, :(%5052)))
└─────         goto #2184
2184 ─         goto #2185
2185 ─         return %5055
) => Union{Int64, NTuple{4, Int64}, MArray{S, Int64, 1} where S<:Tuple}

but that's still a lot and we're very far from the smaller change of #980. If it may cause any issues in existing code, I'm ready to give up on this feature and try to find a compromise for my use case. However, if there is no reason to expect a noticeable impact due to the ability to const-propagate and optimize away all of the branches, then that would be great to have. For example, direct field access is optimized away like so:

julia> @code_typed debuginfo=:source (x -> x.x)(v4)
CodeInfo(
    @ REPL[16]:1 within `#21`
   ┌ @ /home/serenity4/.julia/dev/StaticArrays/src/MVector.jl:73 within `getproperty`
   │┌ @ /home/serenity4/.julia/dev/StaticArrays/src/MVector.jl:28 within `swizzle`
   ││┌ @ /home/serenity4/.julia/dev/StaticArrays/src/MArray.jl:21 within `getindex`
1 ─│││ %1  = $(Expr(:boundscheck, true))::Bool
└──│││       goto #6 if not %1
   │││┌ @ abstractarray.jl:697 within `checkbounds`
2 ─││││ %3  = Core.tuple(1)::Tuple{Int64}
│  ││││ @ abstractarray.jl:699 within `checkbounds` @ abstractarray.jl:684
│  ││││┌ @ abstractarray.jl:758 within `checkindex`
│  │││││┌ @ int.jl:514 within `<=`
│  ││││││ %4  = Base.sle_int(1, 1)::Bool
│  ││││││ %5  = Base.sle_int(1, 4)::Bool
│  │││││└
│  │││││┌ @ bool.jl:38 within `&`
│  ││││││ %6  = Base.and_int(%4, %5)::Bool
│  ││││└└
│  ││││ @ abstractarray.jl:699 within `checkbounds`
└──││││       goto #4 if not %6
   ││││ @ abstractarray.jl:700 within `checkbounds`
3 ─││││       goto #5
   ││││ @ abstractarray.jl:699 within `checkbounds`
4 ─││││       invoke Base.throw_boundserror(x::MVector{4, Int64}, %3::Tuple{Int64})::Union{}
└──││││       unreachable
5 ─││││       nothing::Nothing
   │││└
   │││ @ /home/serenity4/.julia/dev/StaticArrays/src/MArray.jl:25 within `getindex`
6 ┄│││ %12 = $(Expr(:gc_preserve_begin, Core.Argument(2)))
│  │││┌ @ pointer.jl:270 within `pointer_from_objref`
│  ││││ %13 = $(Expr(:foreigncall, :(:jl_value_ptr), Ptr{Nothing}, svec(Any), 0, :(:ccall), Core.Argument(2)))::Ptr{Nothing}
│  │││└
│  │││┌ @ essentials.jl:555 within `unsafe_convert`
│  ││││┌ @ pointer.jl:30 within `convert`
│  │││││ %14 = Base.bitcast(Ptr{Int64}, %13)::Ptr{Int64}
│  │││└└
│  │││┌ @ pointer.jl:119 within `unsafe_load`
│  ││││ %15 = Base.pointerref(%14, 1, 1)::Int64
│  │││└
│  │││       $(Expr(:gc_preserve_end, :(%12)))
└──│││       goto #7
   ││└
7 ─││       goto #8
   │└
8 ─│       goto #9
9 ─        return %15
) => Int64

which, although it seems fairly large, is actually the code for a single getindex access, and was confirmed to be fast with benchmarks.

What I would therefore like to know for certain is whether this may be expected in general or if the large code size would, in certain circumstances, prevent such optimizations from being applied (such circumstances being for example that a parent function being compiled also has a large code size in a way that affects inlining decisions).
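For intuition on why forbidding repeated components shrinks the generated code so much (36884 vs 5059 lines above), a back-of-the-envelope sketch that is not from the PR: it counts how many names of length 1 to 4 exist over one four-letter accessor set, with and without repeats. This counts names only; it is not a model of the actual code size, which also depends on the color set and on how the branches are lowered.

```julia
using Base.Iterators: product

# Count swizzle names of length 1 to n over an n-letter accessor set such as "xyzw",
# with or without repeated components (e.g. .xxyz).
function count_patterns(n::Int; repeats::Bool)
    total = 0
    for len in 1:n
        combos = product(ntuple(_ -> 1:n, len)...)
        total += repeats ? length(combos) : count(allunique, combos)
    end
    return total
end

# Rejecting a repeated name is a cheap check on its index tuple.
has_repeats(inds::Tuple) = !allunique(inds)

count_patterns(4; repeats = true)   # 340
count_patterns(4; repeats = false)  # 64
```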

@serenity4
Author

(tagging @c42f as the author of #980 in case she has an opinion on this)

@c42f
Member

c42f commented Nov 25, 2023

Hey 👋 This is cool and I did think about supporting swizzling a long time ago (the ancient history of this library was in fact influenced by computer graphics short-vector libraries!).

My main reservation is that swizzling is very specific to computer graphics; perhaps too much so to be built into a general-purpose library like StaticArrays. This applies especially to the color accessors, which were deliberately left out of #980.

However, implementing this as a @swizzle macro could be a nice alternative to adding so much "builtin" machinery via get/set property methods.

The usage might look like this:

u = SVector(1,2,3)
t = SVector(1,0,1,0)

@swizzle begin
    v = u.zy + u.xy
    e = t.arb
end

The @swizzle macro would look through the code for any Expr(:., name, QuoteNode(fieldname)) where fieldname matches any of the possible supported swizzling patterns. Then it would lower the code above to something like

u = SVector(1,2,3)
t = SVector(1,0,1,0)

begin
    v = swizzle(u, 3, 2) + swizzle(u, 1, 2)
    e = swizzle(t, 4, 1, 3)
end

This largely sidesteps the problem of a "standard name to index mapping" which @hyrodium mentions, because the ordering is opt-in via the use of @swizzle. (And if someone else were to really want a different ordering... they can implement @my_swizzle for themselves. That is, swizzling would be lexically scoped rather than a property of the type.)
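A minimal, self-contained sketch of the lowering described above, under stated assumptions: it works on plain tuples instead of SVector so it runs without StaticArrays, the accessor sets are "xyzw" and "rgba", and all names (`swizzle_pattern`, `lower_swizzles`, the tuple `swizzle` method) are illustrative rather than the API of any package. It walks the expression recursively and only rewrites reads; assignments such as `v.xy = w` are not handled.

```julia
# Hypothetical accessor sets.
const SWIZZLE_SETS = ("xyzw", "rgba")

# Index tuple for a swizzle-like name such as :zy or :arb, or `nothing` otherwise.
# Mixing sets (e.g. :xgb) is not treated as a swizzle.
function swizzle_pattern(name::Symbol)
    s = collect(String(name))
    isempty(s) && return nothing
    for set in SWIZZLE_SETS
        inds = map(c -> findfirst(==(c), set), s)
        all(!isnothing, inds) && return Tuple(inds)
    end
    return nothing
end

# Rewrite `a.zy` into `swizzle(a, 3, 2)`; every other expression is rebuilt unchanged.
lower_swizzles(ex) = ex
function lower_swizzles(ex::Expr)
    if Meta.isexpr(ex, :., 2) && ex.args[2] isa QuoteNode && ex.args[2].value isa Symbol
        inds = swizzle_pattern(ex.args[2].value)
        inds === nothing || return :(swizzle($(lower_swizzles(ex.args[1])), $(inds...)))
    end
    return Expr(ex.head, map(lower_swizzles, ex.args)...)
end

# `esc` makes `swizzle` resolve in the caller's scope, so the mapping is lexical.
macro swizzle(ex)
    return esc(lower_swizzles(ex))
end

# Tuple method so the sketch runs without StaticArrays.
swizzle(v::Tuple, inds::Int...) = map(i -> v[i], inds)

u = (1, 2, 3)
t = (1, 0, 1, 0)
@swizzle begin
    v = u.zy .+ u.xy   # swizzle(u, 3, 2) .+ swizzle(u, 1, 2)
    e = t.arb          # swizzle(t, 4, 1, 3)
end
```

Because the rewrite is purely syntactic, any field whose name happens to match a pattern (e.g. a real field `.r`) is also rewritten inside the macro; that ambiguity is the price of lexical scoping.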

@c42f
Member

c42f commented Nov 25, 2023

Another interesting property of @swizzle lowering syntactic patterns to swizzle(): you can use @swizzle with any type which defines an implementation of swizzle() regardless of whether it's a StaticArray or even AbstractArray.

With that in mind it could even be in a separate library. But I've got some sympathy for having it here :)
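As c42f notes, code lowered to `swizzle(x, inds...)` only needs a `swizzle` method for the type of `x`. A minimal sketch with a hypothetical `Pixel` type that is neither a StaticArray nor an AbstractArray; the call is written in the lowered form directly, and all names are illustrative.

```julia
# Hypothetical color type; not a StaticArray or AbstractArray.
struct Pixel
    r::Float32
    g::Float32
    b::Float32
    a::Float32
end

# Generic tuple method (illustrative).
swizzle(v::Tuple, inds::Int...) = map(i -> v[i], inds)

# Opting in only requires a `swizzle` method; a lowered `p.bgr` would call this.
swizzle(p::Pixel, inds::Int...) = swizzle((p.r, p.g, p.b, p.a), inds...)

p = Pixel(0.1f0, 0.2f0, 0.3f0, 1.0f0)
bgr = swizzle(p, 3, 2, 1)
```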

@serenity4
Author

That's a fantastic idea, thank you @c42f! I wish I had thought of it before making this PR :)

I think such functionality would fit better in an external package, so I just made https://github.com/serenity4/Swizzles.jl. I plan to have it registered soon when documentation is complete. Feel free to post any comments here or in an issue on that repo!

serenity4 closed this Nov 25, 2023
@serenity4
Author

serenity4 commented Nov 25, 2023

What is implemented there differs slightly from your proposal above: @swizzle only applies to a single expression, not recursively. Otherwise, supporting @swizzle v.xyz = value.an_actual_field without surprising behavior would be nontrivial, as the macro would have to detect that .an_actual_field is not a swizzle, which gets harder still if something like .xyz is an actual field of an object.

EDIT: This is no longer the case as of serenity4/Swizzles.jl@1324794 (see commit message for more details).

@c42f
Member

c42f commented Nov 28, 2023

@serenity4 What a lovely little package! Very nice ❤️

Successfully merging this pull request may close these issues.

Extend getproperty overloads for vectors
3 participants