Explore early finalization #2443

Open
maleadt opened this issue Jul 12, 2024 · 6 comments
Labels
enhancement New feature or request performance How fast can we go?

Comments

@maleadt
Member

maleadt commented Jul 12, 2024

Julia supports early finalization insertion (JuliaLang/julia#45272); however, it does not trigger here because CuArray's finalizers taint the TLS effect. Keno suggested simply untainting that effect using @assume_effects, which we should explore.

If that doesn't work, or in addition to it, @aviatesk suggested integrating early finalization insertion with escape analysis, which may make this optimization even more potent.

Let's start with the untainting and come up with a couple of MWEs to look into.
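
A minimal sketch of what such an MWE could look like, assuming a stand-in `Buffer` type rather than CUDA.jl's actual `CuArray`. The names `Buffer`, `release!`, `FREED`, and `make_and_drop` are hypothetical, and the `:nothrow` and `:notaskstate` annotations are promises to the compiler that are only assumed to be sound for this toy finalizer. Whether the optimizer then actually inserts the finalizer early depends on the Julia version and on what gets inlined.

```julia
# Hypothetical stand-in for a GPU array; NOT CUDA.jl's CuArray.
mutable struct Buffer
    ptr::Ptr{Cvoid}
end

# Counts how often the finalizer body ran, so the effect is observable.
const FREED = Ref(0)

# ASSUMPTION: it is sound to promise that this function never throws and
# never touches task-local state. If that promise is wrong, behavior is undefined.
Base.@assume_effects :nothrow :notaskstate function release!(b::Buffer)
    Libc.free(b.ptr)   # freeing C_NULL is a no-op, so a repeated call is harmless
    b.ptr = C_NULL
    FREED[] += 1
    return nothing
end

# The buffer never escapes this function, which makes it a candidate
# for running the finalizer right here instead of waiting for the GC.
function make_and_drop(nbytes::Integer)
    b = Buffer(Libc.malloc(nbytes))
    finalizer(release!, b)
    return nothing
end
```

Calling `make_and_drop(16)` in a loop and watching `FREED[]` without triggering `GC.gc()` would be one way to tell whether the finalizer ran early.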

cc @jpsamaroo

@maleadt maleadt added enhancement New feature or request performance How fast can we go? labels Jul 12, 2024
@maleadt maleadt self-assigned this Jul 12, 2024
@maleadt
Member Author

maleadt commented Jul 13, 2024

@vchuravy noted that this may also extend the lifetime of objects up until the inserted finalizer, whereas the GC could have collected them already if there were no outstanding references. Similar issues: JuliaLang/julia#51818, JuliaLang/julia#52533 (#2197). I'm not sure this is a blocker, but it's something to keep in mind, e.g., to make sure the finalizer is inserted as aggressively as possible:

a = CuArray(...)

if likely()
    a = nothing
    # finalizer should be inserted here
    while something_long()
        # ...
    end
else
    use(a)
end

# finalizer should not be inserted here,
# or `a` would be kept alive across the while loop

The above may not match how the finalizer insertion pass currently works; I haven't properly read into it yet.
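
To make the intended placement concrete, here is a hand-written sketch in which an explicit `finalize` call stands in for the compiler-inserted finalizer. `Resource`, `use`, and `branchy` are hypothetical names, and the `freed` flag only exists so the effect can be observed; this is an illustration of the desired outcome, not how the pass is implemented.

```julia
# Hypothetical resource whose finalizer just records that it ran.
mutable struct Resource
    freed::Bool
    function Resource()
        r = new(false)
        finalizer(x -> (x.freed = true; nothing), r)
        return r
    end
end

use(r::Resource) = r.freed

function branchy(likely::Bool)
    a = Resource()
    if likely
        # Manual equivalent of "finalizer should be inserted here":
        # the resource is released before the long-running loop starts.
        finalize(a)
        freed_before_loop = a.freed
        for _ in 1:3
            # ... long-running work that no longer needs `a`
        end
        return freed_before_loop
    else
        # `a` is still in use on this path, so it must not be finalized yet.
        return use(a)
    end
end
```

Here `branchy(true)` reports that the resource was already released before the loop, while `branchy(false)` still sees a live resource.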

@aviatesk

I believe the biggest current blocker is that the finalizer inlining pass currently assumes that all operations on the target object are inlined. In other words, even simple code like the following cannot currently perform finalizer inlining:

@noinline function use(a)
   ... # uses a, but doesn't escape it to anywhere
end

let 
    Base.Experimental.@force_compile
    a = CuArray(...)
    use(a)
end

Here, using EA to prove that use(a) does not escape a, and enabling finalizer inlining on that basis, would be the first step.
Specifically, could you come up with concrete target code like the simple case above? I would like to use it to test what EA can do and to start optimizing for the simplest cases. Aggressive finalizer inlining in cases involving branches is also important, but let's start with the simple cases first.
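
One rough way to check a candidate target is to look at the optimized IR and see whether a finalizer registration is still present. The sketch below assumes a hypothetical `Handle` type and a hypothetical helper `mentions_finalizer`; the string search over the printed IR is a crude heuristic and not an official API for this purpose.

```julia
# Hypothetical target type; stands in for CuArray.
mutable struct Handle
    ptr::Ptr{Cvoid}
end

# Not inlined on purpose, and does not escape `h`.
@noinline use(h::Handle) = h.ptr == C_NULL

function target()
    h = Handle(Libc.malloc(8))
    finalizer(x -> Libc.free(x.ptr), h)
    return use(h)
end

# Hypothetical helper: returns true if the optimized IR of `f` still
# mentions a finalizer call, i.e. the finalizer was registered with the
# runtime rather than inlined at the end of the object's lifetime.
function mentions_finalizer(f, argtypes=Tuple{})
    ci, _ = only(Base.code_typed(f, argtypes))
    return occursin("finalizer", sprint(show, ci))
end
```

`mentions_finalizer(target)` can then be compared across Julia versions or compiler branches; what it returns depends on the compiler in use.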

@vchuravy
Member

A while back I came up with the following example of a case where the GC currently fails us.
I didn't write it with early finalization in mind; it was more of a test bed for automatic reference counting,
but the idea holds.

For me early finalization is too fragile and too dependent on inlining.

mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end

import Base: Libc

mutable struct HeapTracker
    const lock::Base.Threads.SpinLock
    const dict::Dict{Ptr{Cvoid}, Int}
    @atomic size::Int
    HeapTracker() = new(Base.Threads.SpinLock(), Dict{Ptr{Cvoid}, Int}(), 0)
end

Base.lock(t::HeapTracker) = lock(t.lock)
Base.unlock(t::HeapTracker) = unlock(t.lock)
const TRACKER = HeapTracker()

function tracked_malloc(size)
    local ptr
    @lock TRACKER begin
        @atomic TRACKER.size += size
        ptr = Libc.malloc(size)
        TRACKER.dict[ptr] = size
    end
    ptr
end

function tracked_free(ptr::Ptr)
    ptr = Base.unsafe_convert(Ptr{Cvoid}, ptr)
    @lock TRACKER begin
        if !haskey(TRACKER.dict, ptr)
            error("Double free")
        end
        size = pop!(TRACKER.dict, ptr)
        @atomic TRACKER.size -= size
        Libc.free(ptr)
    end
end

function stats()
    # `size` is declared @atomic, so it must also be read atomically
    @info "Foreign heap size (bytes)" heap=(@atomic TRACKER.size)
end

function foreign_alloc(::Type{T}, length) where T
    ptr = tracked_malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    finalizer(obj->tracked_free(obj.ptr), obj)
end

function main(N, iterations)
    for _ in 1:iterations
        workspace = foreign_alloc(Float64, N)
        GC.@preserve workspace begin
            ptr = workspace.ptr
            # ... use ptr
        end
        stats()
    end
end

@aviatesk

That is too complex, so I would prefer a simpler target if possible. CUDA might also be complex once lowered though.

@vchuravy
Member

This is pretty much the simplest thing. The only complexity here is the allocation tracking, which is there so that you can immediately tell whether you are successful. You can remove the tracker:

mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end

import Base: Libc

# unlikely to be inlined
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    finalizer(obj->Libc.free(obj.ptr), obj)
    obj
end

function main(N, iterations)
    for _ in 1:iterations
        workspace = foreign_alloc(Float64, N)
        GC.@preserve workspace begin
            ptr = workspace.ptr
            # ... use ptr
        end
    end
end

@aviatesk

aviatesk commented Oct 1, 2024

Started some work at JuliaLang/julia#55954.
