Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The "quick fix with caveats" example is not correct #136

Closed
KristofferC opened this issue Feb 20, 2025 · 13 comments · Fixed by #137
Closed

The "quick fix with caveats" example is not correct #136

KristofferC opened this issue Feb 20, 2025 · 13 comments · Fixed by #137

Comments

@KristofferC
Copy link

KristofferC commented Feb 20, 2025

The example in https://juliafolds2.github.io/OhMyThreads.jl/stable/literate/tls/tls/#The-quick-fix-(with-caveats) assumes that threadid() will be in the range 1:nthreads() which is not true in the presence of e.g. interactive threads. Running the example on master where there is an interactive thread by default gives:

  nested task error: BoundsError: attempt to access 1-element Vector{Matrix{Float64}} at index [2]
    Stacktrace:
      [1] throw_boundserror(A::Vector{Matrix{Float64}}, I::Tuple{Int64})
        @ Base ./essentials.jl:15
      [2] getindex
        @ ./essentials.jl:917 [inlined]
      [3] (::var"#matmulsums_perthread_static##2#matmulsums_perthread_static##3"{})(A::Matrix{…}, B::Matrix{…})
        @ Main ./REPL[3]:5
      [4] #Generator##0
        @ ./generator.jl:37 [inlined]

Some further examples:

julia> Threads.nthreads()
1

julia> Threads.maxthreadid()
2

julia> Threads.@threads for i = 1:10
               @show Threads.threadid()    
           end
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2

This example could be changed to use maxthreadid which might waste some space or perhaps be removed.

@MasonProtter
Copy link
Member

MasonProtter commented Feb 20, 2025

Thanks. Looks like the 'quick fix' is no longer so quick. @carstenbauer what do you think, should we just remove this example now that there will be an interactive thread by default?

We never really wanted to encourage people to do this sort of thing in the first place anyways.

@MasonProtter
Copy link
Member

MasonProtter commented Feb 20, 2025

Note that we do discuss this issues with threadids being bad two paragraphs above the section linked:

Unfortunately, this approach is generally wrong. The first issue is that threadid() doesn't necessarily start at 1 (and thus might return a value > nthreads()), in which case Cs[threadid()] would be an out-of-bounds access attempt. This might be surprising but is a simple consequence of the ordering of different kinds of Julia threads: If Julia is started with a non-zero number of interactive threads, e.g. --threads 5,2, the interactive threads come first (look at Threads.threadpool.(1:Threads.maxthreadid())).

and then in the linked section we say

However, this approach doesn't solve the offset issue and,

@KristofferC
Copy link
Author

That's true, I missed the "However, this approach doesn't solve the offset issue" part. I guess the question is if there is any value at all in describing it as a "fix" then.

@MasonProtter
Copy link
Member

MasonProtter commented Feb 20, 2025

Yeah. I think with the changes coming in v1.12, the value of that section is greatly reduced and we should probably just remove it.

@carstenbauer
Copy link
Member

Yeah, I think we should remove this bit. Perhaps we should also emphasize the potential of an out of bounds error even more since that's now much more likely.

@KristofferC
Copy link
Author

KristofferC commented Feb 21, 2025

The reason I was looking at this is because PkgEval shows a large number of packages using the naive and wrong approach and now error (with the default interactive thread). To give some examples:

I intend to open issues on these repos with the following text (happy with some help in formulating this):


Likely erroneous use of Threads.nthreads and Threads.threadid

Hello,

From the following PkgEval log: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/5da257d_vs_d63aded/$PACKAGE.log it seems likely that this package is using Threads.nthreads and Threads.threadid incorrectly as described in https://juliafolds2.github.io/OhMyThreads.jl/stable/literate/tls/tls/#The-naive-(and-incorrect)-approach.

Shortly, in Julia 1.12, julia runs with an interactive thread by default which means that the number of worker threads nthreads() defaults to 1 but the thread id that things will run on in a threaded context will typically be 2 (the interactive thread having the id 1):

julia> Threads.nthreads()
1

julia> Threads.@threads for i = 1:3
           @show Threads.threadid()    
       end
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2

Trying to index a vector of length Threads.nthreads() with Threads.threadid() will thus error.

Some remedies to this are:


@MasonProtter
Copy link
Member

MasonProtter commented Feb 21, 2025

Yeah that looks good, except I'd maybe emphasize more that maxthreadid is the least safe option. Do you think we should change the "Quick Fix" section to use maxthreadid() and then point users to that in your post?

I'm a little unsure about doing that because maxthreadid is not public, and I have no idea what other whacky incompatible stuff might change in the future with threadids. On the other hand, it might be good to still at least spell it out in the docs so people have something they can migrate to that does the same thing as before.

But I still don't like it because they're very likely to not realize that the code only produces correct results if they have a static schedule, and then accidentally break it in the future when they refactor.

So the alternative would be to just not describe this (anti-)pattern in the docs here. Any thoughts?

@KristofferC
Copy link
Author

KristofferC commented Feb 21, 2025

I'm a little unsure about doing that because maxthreadid is not public, and I have no idea what other whacky incompatible stuff might change in the future with threadids.

JuliaLang/julia#57490 though (it is in the docs for example).

I agree that it is probably best to emphasize the caveats with using it.

@MasonProtter
Copy link
Member

Ah, I see. Okay, then we'll put it back in the docs but emphasize that it is the least desirable option.

@MasonProtter
Copy link
Member

In your text you plan to post to repos that were found via PkgEval, it might be good to at also mention the problems from task migration, and say that they have to use @threads :static otherwise the code is still broken.

@MasonProtter
Copy link
Member

MasonProtter commented Feb 21, 2025

minimal demo:

$ julia -t8 --startup=no -q
julia> Threads.@threads for i  1:1000
           # Nested use of `@threads` is not required to encounter this bug
           # but it makes it easier to replicate since it over-subscribes
           # the threads, causing more task-migration
           Threads.@threads for i  1:1000
               tid = Threads.threadid()
               yield()
               
               if tid !== Threads.threadid()
                   error("you fucked up!")
               end
           end
       end
ERROR: TaskFailedException

    nested task error: TaskFailedException
    
        nested task error: you fucked up!
        Stacktrace:
         [1] error(s::String)
           @ Base ./error.jl:35
         [2] macro expansion
           @ ./REPL[12]:10 [inlined]
         [3] (::var"#508#threadsfor_fun#101"{var"#508#threadsfor_fun#98#102"{UnitRange{Int64}}})(tid::Int64; onethread::Bool)
           @ Main ./threadingconstructs.jl:253
         [4] #508#threadsfor_fun
           @ ./threadingconstructs.jl:220 [inlined]
         [5] (::Base.Threads.var"#1#2"{var"#508#threadsfor_fun#101"{var"#508#threadsfor_fun#98#102"{UnitRange{Int64}}}, Int64})()
           @ Base.Threads ./threadingconstructs.jl:154
    
    ...and 7 more exceptions.
    
    Stacktrace:
     [1] threading_run(fun::var"#508#threadsfor_fun#101"{var"#508#threadsfor_fun#98#102"{UnitRange{Int64}}}, static::Bool)
       @ Base.Threads ./threadingconstructs.jl:173
     [2] macro expansion
       @ ./threadingconstructs.jl:190 [inlined]
     [3] macro expansion
       @ ./REPL[12]:5 [inlined]
     [4] (::var"#494#threadsfor_fun#99"{var"#494#threadsfor_fun#97#100"{UnitRange{Int64}}})(tid::Int64; onethread::Bool)
       @ Main ./threadingconstructs.jl:253
     [5] #494#threadsfor_fun
       @ ./threadingconstructs.jl:220 [inlined]
     [6] (::Base.Threads.var"#1#2"{var"#494#threadsfor_fun#99"{var"#494#threadsfor_fun#97#100"{UnitRange{Int64}}}, Int64})()
       @ Base.Threads ./threadingconstructs.jl:154

...and 7 more exceptions.

Stacktrace:
 [1] threading_run(fun::var"#494#threadsfor_fun#99"{var"#494#threadsfor_fun#97#100"{UnitRange{Int64}}}, static::Bool)
   @ Base.Threads ./threadingconstructs.jl:173
 [2] macro expansion
   @ ./threadingconstructs.jl:190 [inlined]
 [3] top-level scope
   @ ./REPL[12]:1

@KristofferC
Copy link
Author

How about adding something like

Note that even though this code only started directly erroring in 1.12 the code was likely already incorrect on earlier Julia versions unless it made sure that the spawned tasks did not migrate between threads (which can happen at any yield point).

to it.

@MasonProtter
Copy link
Member

yeah sounds good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants