Skip to content

Commit

Permalink
Reduce memory & redundant work for concurrent TimeZones construction
Browse files Browse the repository at this point in the history
This commit improves the thread-local caching scheme introduced in
JuliaTime#344, by sharing TimeZones
across _all_ thread-local caches (so there's only ever one TimeZone
object instance created per process), and reduces redundant work caused
by multiple concurrent Tasks starting to construct the same TimeZone at
the same time.

Before this commit, multiple Tasks that try to construct the same
TimeZone on different threads would have constructed the same object
multiple times. Whereas now, they would share it, via the "Future".

------------------------------------------------------------

It's difficult to show this effect, but one way to show it is by logging
when a TimeZone is constructed, via this diff:
```julia
diff --git a/src/types/timezone.jl b/src/types/timezone.jl
index 25d36c3..1cea69e 100644
--- a/src/types/timezone.jl
+++ b/src/types/timezone.jl
@@ -68,6 +68,7 @@ function TimeZone(str::AbstractString, mask::Class=Class(:DEFAULT))
     # Note: If the class `mask` does not match the time zone we'll still load the
     # information into the cache to ensure the result is consistent.
     tz, class = get!(_tz_cache(), str) do
+        Core.println("CONSTRUCTING $str")
         tz_path = joinpath(TZData.COMPILED_DIR, split(str, "/")...)

         if isfile(tz_path)
```

Before this commit, you can see that Every thread constructs the object
twice - once before the clear and once after (total of 8 times):
```julia
julia> Threads.nthreads()
4

julia> TimeZones.TZData.compile();

julia> foo() = begin
           @sync for i in 1:20
               if (i == 10)
                   @info "---------clear-------"
                   TimeZones.TZData.compile()
               end
               Threads.@Spawn begin
                   TimeZone("US/Eastern", TimeZones.Class(:LEGACY))
               end
           end
           @info "done"
       end
foo (generic function with 1 method)

julia> @time foo()
[ Info: ---------clear-------
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
[ Info: done
  0.391298 seconds (1.51 M allocations: 64.544 MiB, 2.46% gc time, 0.00% compilation time)
```

After this commit, it's constructed only twice - once before the clear
and once after:
```julia
julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: ---------clear-------
[ Info: done
  0.414059 seconds (1.46 M allocations: 61.972 MiB, 4.55% gc time)
```

------------------------------------------------------------------

Finally, the other problem this avoids is if we ever accidentally
introduce a Task yield inside the constructor, which could happen if we
used `@info` instead of `Core.println()`, then without this PR, the old
code could potentially do _redundant work_ to construct the TimeZone
multiple times - even on the same thread - since each Task's constructor
would see that there's no TZ in the cache, start the work, and then
yield to the next Task, which would do the same, until finally they all
report their work into the cache, overwriting each other.

This is what happens if we use `@info` in the above diff, instead:

Before this commit - the results are nondeterministic, but in this run,
you can see it redundantly constructed the value all 20 times!:
```julia
julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.494492 seconds (1.55 M allocations: 66.754 MiB, 16.67% gc time)
```

After this commit, just the two we expect. 😊 :
```julia
julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.422677 seconds (1.47 M allocations: 62.228 MiB, 4.66% gc time)
```
  • Loading branch information
NHDaly committed Aug 19, 2021
1 parent 86fae9d commit c113e8c
Show file tree
Hide file tree
Showing 4 changed files with 58 additions and 24 deletions.
3 changes: 2 additions & 1 deletion src/TimeZones.jl
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
module TimeZones

using Base: @lock
using Dates
using Printf
using Serialization
Expand Down Expand Up @@ -39,7 +40,7 @@ abstract type Local <: TimeZone end

function __init__()
# Initialize the thread-local TimeZone cache (issue #342)
_reset_tz_cache()
_tz_cache_init()

# Base extension needs to happen everytime the module is loaded (issue #24)
Dates.CONVERSION_SPECIFIERS['z'] = TimeZone
Expand Down
68 changes: 53 additions & 15 deletions src/types/timezone.jl
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,12 @@
# to the cache, while still being thread-safe.
const THREAD_TZ_CACHES = Dict{String,Tuple{TimeZone,Class}}[]

# Holding a lock during construction of a specific TimeZone prevents multiple Tasks (on the
# same or different threads) from attempting to construct the same TimeZone object, and
# allows them all to share the result.
const tz_cache_mutex = ReentrantLock()
const TZ_CACHE_FUTURES = Dict{String,Channel{Tuple{TimeZone,Class}}}() # Guarded by: tz_cache_mutex

# Based upon the thread-safe Global RNG implementation in the Random stdlib:
# https://github.com/JuliaLang/julia/blob/e4fcdf5b04fd9751ce48b0afc700330475b42443/stdlib/Random/src/RNGs.jl#L369-L385
@inline _tz_cache() = _tz_cache(Threads.threadid())
Expand All @@ -19,10 +25,22 @@ const THREAD_TZ_CACHES = Dict{String,Tuple{TimeZone,Class}}[]
end
@noinline _tz_cache_length_assert() = @assert false "0 < tid <= length(THREAD_TZ_CACHES)"

function _reset_tz_cache()
# ensures that we didn't save a bad object
function _tz_cache_init()
resize!(empty!(THREAD_TZ_CACHES), Threads.nthreads())
end
# ensures that we didn't save a bad object
function _reset_tz_cache()
# Since we use thread-local caches, we spawn a task on _each thread_ to clear that
# thread's local cache.
Threads.@threads for i in 1:Threads.nthreads()
@assert Threads.threadid() === i "TimeZones.TZData.compile() must be called from the main, top-level Task."
empty!(_tz_cache())
end
@lock tz_cache_mutex begin
empty!(TZ_CACHE_FUTURES)
end
return nothing
end

"""
TimeZone(str::AbstractString) -> TimeZone
Expand Down Expand Up @@ -68,20 +86,40 @@ function TimeZone(str::AbstractString, mask::Class=Class(:DEFAULT))
# Note: If the class `mask` does not match the time zone we'll still load the
# information into the cache to ensure the result is consistent.
tz, class = get!(_tz_cache(), str) do
tz_path = joinpath(TZData.COMPILED_DIR, split(str, "/")...)

if isfile(tz_path)
open(deserialize, tz_path, "r")
elseif occursin(FIXED_TIME_ZONE_REGEX, str)
FixedTimeZone(str), Class(:FIXED)
elseif !isdir(TZData.COMPILED_DIR) || isempty(readdir(TZData.COMPILED_DIR))
# Note: Julia 1.0 supresses the build logs which can hide issues in time zone
# compliation which result in no tzdata time zones being available.
throw(ArgumentError(
"Unable to find time zone \"$str\". Try running `TimeZones.build()`."
))
# Even though we're using Thread-local caches, we still need to lock during
# construction to prevent multiple tasks redundantly constructing the same object,
# and potential thread safety violations due to Tasks migrating threads.
# NOTE that we only grab the lock if the TZ doesn't exist, so the mutex contention
# is not on the critical path for most constructors. :)
constructing = false
# We lock the mutex, but for only a short, *constant time* duration, to grab the
# future for this TimeZone, or create the future if it doesn't exist.
future = @lock tz_cache_mutex begin
get!(TZ_CACHE_FUTURES, str) do
constructing = true
Channel{Tuple{TimeZone,Class}}(1)
end
end
if constructing
tz_path = joinpath(TZData.COMPILED_DIR, split(str, "/")...)

tz = if isfile(tz_path)
open(deserialize, tz_path, "r")
elseif occursin(FIXED_TIME_ZONE_REGEX, str)
FixedTimeZone(str), Class(:FIXED)
elseif !isdir(TZData.COMPILED_DIR) || isempty(readdir(TZData.COMPILED_DIR))
# Note: Julia 1.0 supresses the build logs which can hide issues in time zone
# compliation which result in no tzdata time zones being available.
throw(ArgumentError(
"Unable to find time zone \"$str\". Try running `TimeZones.build()`."
))
else
throw(ArgumentError("Unknown time zone \"$str\""))
end

put!(future, tz)
else
throw(ArgumentError("Unknown time zone \"$str\""))
fetch(future)
end
end

Expand Down
2 changes: 1 addition & 1 deletion src/tzdata/TZData.jl
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
module TZData

using Printf
using ...TimeZones: DEPS_DIR
using ...TimeZones: DEPS_DIR, _reset_tz_cache

import Pkg

Expand Down
9 changes: 2 additions & 7 deletions src/tzdata/compile.jl
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ using Dates
using Serialization
using Dates: parse_components

using ...TimeZones: _tz_cache
using ...TimeZones: _reset_tz_cache
using ...TimeZones: TimeZones, TimeZone, FixedTimeZone, VariableTimeZone, Transition, Class
using ...TimeZones: rename
using ..TZData: TimeOffset, ZERO, MIN_GMT_OFFSET, MAX_GMT_OFFSET, MIN_SAVE, MAX_SAVE,
Expand Down Expand Up @@ -696,12 +696,7 @@ function compile(tz_source::TZSource, dest_dir::AbstractString; kwargs...)
isdir(dest_dir) || error("Destination directory doesn't exist")
# When we recompile the TimeZones from a new source, we clear all the existing cached
# TimeZone objects, so that newly constructed objects pick up the newly compiled rules.
# Since we use thread-local caches, we spawn a task on _each thread_ to clear that
# thread's local cache.
Threads.@threads for i in 1:Threads.nthreads()
@assert Threads.threadid() === i "TimeZones.TZData.compile() must be called from the main, top-level Task."
empty!(_tz_cache())
end
_reset_tz_cache()

for (tz, class) in results
parts = split(TimeZones.name(tz), '/')
Expand Down

0 comments on commit c113e8c

Please sign in to comment.