Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Time Zone Cache thread-safe with lazily-allocated thread-local caches #344

Merged
merged 15 commits into from
Aug 19, 2021

Conversation

NHDaly
Copy link
Contributor

@NHDaly NHDaly commented Jun 1, 2021

Make Time Zone Cache thread-safe with lazily-allocated thread-local caches.
Each OS-thread has its own thread-local Time Zone Cache, which makes it thread-safe to construct TimeZones simultaneously from multiple threads.

Fixes #342.


I coded this through the GitHub UI during a meeting... so I haven't tested it locally. Sorry about that, but I just wanted to get this started. :)

NHDaly added 2 commits June 1, 2021 11:46
…aches

Each OS-thread has its own thread-local Time Zone Cache, which makes it thread-safe to construct TimeZones simultaneously from multiple threads.
Copy link
Contributor Author

@NHDaly NHDaly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay @omus i think this is actually ready to review!

I've gotten the tests to pass locally. :) Thanks in advance!

src/types/timezone.jl Outdated Show resolved Hide resolved
@NHDaly NHDaly marked this pull request as ready for review June 2, 2021 16:38
Copy link
Member

@omus omus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to make a test to ensure we don't break thread-safety again

src/types/timezone.jl Outdated Show resolved Hide resolved
src/types/timezone.jl Outdated Show resolved Hide resolved
@NHDaly
Copy link
Contributor Author

NHDaly commented Jun 3, 2021

It would be great to make a test to ensure we don't break thread-safety again

Agreed, but i couldn't figure out how to write a test like that. It seems to be a pretty nondeterministic failure, and I didn't manage to get it to reproduce in a small unit test. Open to suggestions from others!!
@comnik: Are you able to suggest anything?

@NHDaly
Copy link
Contributor Author

NHDaly commented Jun 3, 2021

Okay, I've added a small test that fails about 75% of the time for me (when run on master; obviously never fails on this PR), and takes around 10seconds to run. Hopefully that's an acceptable tradeoff! You might have some ideas for how to increase the likelihood of failure since you know the code better, @omus

@omus
Copy link
Member

omus commented Jun 3, 2021

Thanks! I'll try to do some performance testing against #345 tonight

src/types/timezone.jl Outdated Show resolved Hide resolved
src/types/timezone.jl Outdated Show resolved Hide resolved
Copy link

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks Nathan! :)

@omus, did your testing bear out happily? :)

@omus
Copy link
Member

omus commented Jun 9, 2021

@omus, did your testing bear out happily? :)

Slowly working on them. I have found a readers-writer lock is working pretty well without adding the extra storage requirements introduced here

@omus
Copy link
Member

omus commented Jun 10, 2021

I'm good to go ahead with this implementation. Performance comparison can be found here. I had other implementations but they were also slower so I left them out. Just need some refactoring here and we can proceed.

test/thread-safety.jl Show resolved Hide resolved
src/types/timezone.jl Outdated Show resolved Hide resolved
src/tzdata/compile.jl Outdated Show resolved Hide resolved
test/types/timezone.jl Outdated Show resolved Hide resolved
@NHDaly
Copy link
Contributor Author

NHDaly commented Aug 19, 2021

Okay! :) Thanks @omus - please take another look. I've resolved the compile() cache clearing, and it was actually pretty straightforward 👍 Thanks!

Copy link
Member

@omus omus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes. Mainly waiting on CI results

src/build.jl Outdated Show resolved Hide resolved
src/types/timezone.jl Outdated Show resolved Hide resolved
NHDaly and others added 2 commits August 19, 2021 11:17
Co-authored-by: Curtis Vogt <[email protected]>
test/runtests.jl Outdated Show resolved Hide resolved
@omus
Copy link
Member

omus commented Aug 19, 2021

I think we'll need to do some special casing for Julia 1.0 support. Update: we just need to not use julia -t on Julia 1.0

@codecov-commenter
Copy link

codecov-commenter commented Aug 19, 2021

Codecov Report

Merging #344 (bcef249) into master (3e78a3e) will increase coverage by 0.05%.
The diff coverage is 94.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #344      +/-   ##
==========================================
+ Coverage   93.85%   93.90%   +0.05%     
==========================================
  Files          32       32              
  Lines        1529     1543      +14     
==========================================
+ Hits         1435     1449      +14     
  Misses         94       94              
Impacted Files Coverage Δ
src/types/timezone.jl 91.66% <92.85%> (+3.66%) ⬆️
src/TimeZones.jl 100.00% <100.00%> (ø)
src/build.jl 100.00% <100.00%> (ø)
src/tzdata/compile.jl 95.53% <100.00%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e78a3e...bcef249. Read the comment docs.

Copy link
Member

@omus omus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will merge in a couple of hours and tag a release

NHDaly added a commit to NHDaly/TimeZones.jl that referenced this pull request Aug 19, 2021
This commit improves the thread-local caching scheme introduced in
JuliaTime#344, by sharing TimeZones
across _all_ thread-local caches (so there's only ever one TimeZone
object instance created per process), and reduces redundant work caused
by multiple concurrent Tasks starting to construct the same TimeZone at
the same time.

Before this commit, multiple Tasks that try to construct the same
TimeZone on different threads would have constructed the same object
multiple times. Whereas now, they would share it, via the "Future".

------------------------------------------------------------

It's difficult to show this effect, but one way to show it is by logging
when a TimeZone is constructed, via this diff:
```julia
diff --git a/src/types/timezone.jl b/src/types/timezone.jl
index 25d36c3..1cea69e 100644
--- a/src/types/timezone.jl
+++ b/src/types/timezone.jl
@@ -68,6 +68,7 @@ function TimeZone(str::AbstractString, mask::Class=Class(:DEFAULT))
     # Note: If the class `mask` does not match the time zone we'll still load the
     # information into the cache to ensure the result is consistent.
     tz, class = get!(_tz_cache(), str) do
+        Core.println("CONSTRUCTING $str")
         tz_path = joinpath(TZData.COMPILED_DIR, split(str, "/")...)

         if isfile(tz_path)
```

Before this commit, you can see that Every thread constructs the object
twice - once before the clear and once after (total of 8 times):
```julia
julia> Threads.nthreads()
4

julia> TimeZones.TZData.compile();

julia> foo() = begin
           @sync for i in 1:20
               if (i == 10)
                   @info "---------clear-------"
                   TimeZones.TZData.compile()
               end
               Threads.@Spawn begin
                   TimeZone("US/Eastern", TimeZones.Class(:LEGACY))
               end
           end
           @info "done"
       end
foo (generic function with 1 method)

julia> @time foo()
[ Info: ---------clear-------
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
[ Info: done
  0.391298 seconds (1.51 M allocations: 64.544 MiB, 2.46% gc time, 0.00% compilation time)
```

After this commit, it's constructed only twice - once before the clear
and once after:
```julia
julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: ---------clear-------
[ Info: done
  0.414059 seconds (1.46 M allocations: 61.972 MiB, 4.55% gc time)
```

------------------------------------------------------------------

Finally, the other problem this avoids is if we ever accidentally
introduce a Task yield inside the constructor, which could happen if we
used `@info` instead of `Core.println()`, then without this PR, the old
code could potentially do _redundant work_ to construct the TimeZone
multiple times - even on the same thread - since each Task's constructor
would see that there's no TZ in the cache, start the work, and then
yield to the next Task, which would do the same, until finally they all
report their work into the cache, overwriting each other.

This is what happens if we use `@info` in the above diff, instead:

Before this commit - the results are nondeterministic, but in this run,
you can see it redundantly constructed the value all 20 times!:
```julia
julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.494492 seconds (1.55 M allocations: 66.754 MiB, 16.67% gc time)
```

After this commit, just the two we expect. 😊 :
```julia
julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.422677 seconds (1.47 M allocations: 62.228 MiB, 4.66% gc time)
```
NHDaly added a commit to NHDaly/TimeZones.jl that referenced this pull request Aug 19, 2021
This commit improves the thread-local caching scheme introduced in
JuliaTime#344, by sharing TimeZones
across _all_ thread-local caches (so there's only ever one TimeZone
object instance created per process), and reduces redundant work caused
by multiple concurrent Tasks starting to construct the same TimeZone at
the same time.

Before this commit, multiple Tasks that try to construct the same
TimeZone on different threads would have constructed the same object
multiple times. Whereas now, they would share it, via the "Future".

------------------------------------------------------------

It's difficult to show this effect, but one way to show it is by logging
when a TimeZone is constructed, via this diff:
```julia
diff --git a/src/types/timezone.jl b/src/types/timezone.jl
index 25d36c3..1cea69e 100644
--- a/src/types/timezone.jl
+++ b/src/types/timezone.jl
@@ -68,6 +68,7 @@ function TimeZone(str::AbstractString, mask::Class=Class(:DEFAULT))
     # Note: If the class `mask` does not match the time zone we'll still load the
     # information into the cache to ensure the result is consistent.
     tz, class = get!(_tz_cache(), str) do
+        Core.println("CONSTRUCTING $str")
         tz_path = joinpath(TZData.COMPILED_DIR, split(str, "/")...)

         if isfile(tz_path)
```

Before this commit, you can see that every thread constructs the object
twice - once before the clear and once after (total of 8 times):
```julia
julia> Threads.nthreads()
4

julia> TimeZones.TZData.compile();

julia> foo() = begin
           @sync for i in 1:20
               if (i == 10)
                   @info "---------clear-------"
                   TimeZones.TZData.compile()
               end
               Threads.@Spawn begin
                   TimeZone("US/Eastern", TimeZones.Class(:LEGACY))
               end
           end
           @info "done"
       end
foo (generic function with 1 method)

julia> @time foo()
[ Info: ---------clear-------
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
[ Info: done
  0.391298 seconds (1.51 M allocations: 64.544 MiB, 2.46% gc time, 0.00% compilation time)
```

After this commit, it's constructed only twice - once before the clear
and once after:
```julia
julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.414059 seconds (1.46 M allocations: 61.972 MiB, 4.55% gc time)
```

------------------------------------------------------------------

Finally, the other problem this avoids is if we ever accidentally
introduce a Task yield inside the constructor, which could happen if we
used `@info` instead of `Core.println()`, then without this PR, the old
code could potentially do _redundant work_ to construct the TimeZone
multiple times - even on the same thread - since each Task's constructor
would see that there's no TZ in the cache, start the work, and then
yield to the next Task, which would do the same, until finally they all
report their work into the cache, overwriting each other.

This is what happens if we use `@info` in the above diff, instead:

Before this commit - the results are nondeterministic, but in this run,
you can see it redundantly constructed the value all 20 times!:
```julia
julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.494492 seconds (1.55 M allocations: 66.754 MiB, 16.67% gc time)
```

After this commit, just the two we expect. 😊 :
```julia
julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.422677 seconds (1.47 M allocations: 62.228 MiB, 4.66% gc time)
```
@omus omus merged commit 31a9e26 into JuliaTime:master Aug 19, 2021
@omus
Copy link
Member

omus commented Aug 20, 2021

I was waiting to tag this due to #356 but I think I'll go ahead and make a release anyway. We can always make more :)

@NHDaly NHDaly deleted the patch-1 branch August 20, 2021 17:27
@NHDaly
Copy link
Contributor Author

NHDaly commented Aug 20, 2021

I was waiting to tag this due to #356 but I think I'll go ahead and make a release anyway. We can always make more :)

Yeah agreed, that sounds like the right move to me, too! :)

@NHDaly
Copy link
Contributor Author

NHDaly commented Aug 20, 2021

Thanks Curtis! :)

NHDaly added a commit to NHDaly/TimeZones.jl that referenced this pull request Aug 20, 2021
This commit improves the thread-local caching scheme introduced in
JuliaTime#344, by sharing TimeZones
across _all_ thread-local caches (so there's only ever one TimeZone
object instance created per process), and reduces redundant work caused
by multiple concurrent Tasks starting to construct the same TimeZone at
the same time.

Before this commit, multiple Tasks that try to construct the same
TimeZone on different threads would have constructed the same object
multiple times. Whereas now, they would share it, via the "Future".

------------------------------------------------------------

It's difficult to show this effect, but one way to show it is by logging
when a TimeZone is constructed, via this diff:
```julia
diff --git a/src/types/timezone.jl b/src/types/timezone.jl
index 25d36c3..1cea69e 100644
--- a/src/types/timezone.jl
+++ b/src/types/timezone.jl
@@ -68,6 +68,7 @@ function TimeZone(str::AbstractString, mask::Class=Class(:DEFAULT))
     # Note: If the class `mask` does not match the time zone we'll still load the
     # information into the cache to ensure the result is consistent.
     tz, class = get!(_tz_cache(), str) do
+        Core.println("CONSTRUCTING $str")
         tz_path = joinpath(TZData.COMPILED_DIR, split(str, "/")...)

         if isfile(tz_path)
```

Before this commit, you can see that every thread constructs the object
twice - once before the clear and once after (total of 8 times):
```julia
julia> Threads.nthreads()
4

julia> TimeZones.TZData.compile();

julia> foo() = begin
           @sync for i in 1:20
               if (i == 10)
                   @info "---------clear-------"
                   TimeZones.TZData.compile()
               end
               Threads.@Spawn begin
                   TimeZone("US/Eastern", TimeZones.Class(:LEGACY))
               end
           end
           @info "done"
       end
foo (generic function with 1 method)

julia> @time foo()
[ Info: ---------clear-------
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
[ Info: done
  0.391298 seconds (1.51 M allocations: 64.544 MiB, 2.46% gc time, 0.00% compilation time)
```

After this commit, it's constructed only twice - once before the clear
and once after:
```julia
julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.414059 seconds (1.46 M allocations: 61.972 MiB, 4.55% gc time)
```

------------------------------------------------------------------

Finally, the other problem this avoids is if we ever accidentally
introduce a Task yield inside the constructor, which could happen if we
used `@info` instead of `Core.println()`, then without this PR, the old
code could potentially do _redundant work_ to construct the TimeZone
multiple times - even on the same thread - since each Task's constructor
would see that there's no TZ in the cache, start the work, and then
yield to the next Task, which would do the same, until finally they all
report their work into the cache, overwriting each other.

This is what happens if we use `@info` in the above diff, instead:

Before this commit - the results are nondeterministic, but in this run,
you can see it redundantly constructed the value all 20 times!:
```julia
julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.494492 seconds (1.55 M allocations: 66.754 MiB, 16.67% gc time)
```

After this commit, just the two we expect. 😊 :
```julia
julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.422677 seconds (1.47 M allocations: 62.228 MiB, 4.66% gc time)
```
@Sacha0
Copy link

Sacha0 commented Aug 20, 2021

🎉 ! :)

kpamnany pushed a commit to RelationalAI-oss/TimeZones.jl that referenced this pull request May 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UndefRefError: Race Condition parsing TimeZone from multiple threads: needs coordination
4 participants