diff --git a/Project.toml b/Project.toml
index f5810b7..2ce51bf 100644
--- a/Project.toml
+++ b/Project.toml
@@ -1,7 +1,7 @@
 name = "F1Method"
 uuid = "d5605bae-6d76-11e9-2fd7-71bcf42edbaa"
 authors = ["Benoit Pasquier "]
-version = "0.4.1"
+version = "0.5.0"
 
 [deps]
 DiffEqBase = "2b5f629d-d688-5b77-993f-72d75c75574e"
diff --git a/README.md b/README.md
index c2ddf97..def5e54 100644
--- a/README.md
+++ b/README.md
@@ -57,15 +57,15 @@ Optimizing the model is then simply done by minimizing `objective(p)`.
 
 The F-1 algorithm is **easy** to use, gives **accurate** results, and is computationally **fast**:
 
-- **Easy** — The F-1 algorithm basically just needs the user to provide a solver (for finding the steady-state), the mismatch function, `f`, the state function, `F`, and their derivatives, `∇ₓf` and `∇ₓF` w.r.t. the state `x`.
-  (Note these derivatives can be computed numerically, via the [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) package for example.)
+- **Easy** — The F-1 algorithm basically just needs the user to provide a solver (for finding the steady-state), the mismatch function, `f`, an ODEFunction, `F`, with its Jacobian, and the derivative of `f` w.r.t. the state `x`, `∇ₓf`.
+  (Note these derivatives can be computed numerically, via the [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) package for example.)
 - **Accurate** — Thanks to ForwardDiff's nested dual numbers implementation, the accuracy of the gradient and Hessian, as computed by the F-1 algorithm, are close to machine precision.
 - **Fast** — The F-1 algorithm is as fast as if you derived analytical formulas for every first and second derivatives *and* used those in the most efficient way.
   This is because the bottleneck of such computations is the number of matrix factorizations, and the F-1 algorithm only requires a single one.
   In comparison, standard autodifferentiation methods that take the steady-state solver as a black box would require order `m` or `m^2` factorizations, where `m` is the number of parameters.
 
 ## What's needed?
 
-A requirement of the F-1 algorithm is that the Jacobian matrix `A = ∇ₓf` can be created, stored, and factorized.
+A requirement of the F-1 algorithm is that the Jacobian matrix `A = ∇ₓF` can be created, stored, and factorized.
 
 To use the F-1 algorithm, the user must:
@@ -81,16 +81,16 @@ Once initial values for the state, `x`, and parameters, `p`, are chosen, simply
 
 ```julia
 # Initialize the cache for storing reusable objects
-mem = initialize_mem(F, ∇ₓf, ∇ₓF, x, p, alg; options...)
+mem = initialize_mem(F, ∇ₓf, x, p, alg; options...)
 ```
 
 wrap the functions into functions of `p` only via
 
 ```julia
 # Wrap the objective, gradient, and Hessian functions
-objective(p) = F1Method.objective(f, F, ∇ₓF, mem, p, alg; options...)
-gradient(p) = F1Method.gradient(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
-hessian(p) = F1Method.hessian(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
+objective(p) = F1Method.objective(f, F, mem, p, alg; options...)
+gradient(p) = F1Method.gradient(f, F, ∇ₓf, mem, p, alg; options...)
+hessian(p) = F1Method.hessian(f, F, ∇ₓf, mem, p, alg; options...)
 ```
 
 and compute the objective, gradient, or Hessian via either of
@@ -112,3 +112,10 @@ Now you can test how fast and accurate it is!
 If you use this package, or implement your own package based on the F-1 algorithm please cite us.
 If you use the F-1 algorithm, please cite *[Pasquier and Primeau](https://www.bpasquier.com/publication/pasquier_primeau_sisc_2019/)* (in prep.). If you also use this package directly, please cite it!
 (Use [the Zenodo link](https://doi.org/10.5281/zenodo.2667835) or the [CITATION.bib file](./CITATION.bib), which contains a bibtex entry.)
+
+## Future
+
+This package is developed mainly for use with [AIBECS.jl](https://github.com/JuliaOcean/AIBECS.jl) and is likely not in its final form.
+The API was just changed in v0.5 (to match the API changes in AIBECS.jl v0.11).
+That being said, it would ultimately make sense for the shortcuts used here to be integrated into a package like ChainRules.jl.
+For the time being, AIBECS users can use F1Method.jl to speed up their optimizations.
\ No newline at end of file
diff --git a/docs/Project.toml b/docs/Project.toml
index c661589..9146592 100644
--- a/docs/Project.toml
+++ b/docs/Project.toml
@@ -4,4 +4,4 @@ Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
 
 [compat]
-Documenter = "~0.25"
+Documenter = "~0.27"
diff --git a/docs/src/index.md b/docs/src/index.md
index 01a6274..131efe3 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -7,7 +7,7 @@ This package provides an efficient tool to compute gradient and Hessian matrix o
 When using Newton-type algorithms for optimization, computing the gradient and Hessian can be computationally expensive.
 A typical scientific application is to optimize the parameters of a model which solves for a root through another iterative Newton-like algorithm.
 In this case, there are a number of shortcuts that can be leveraged.
-This was the motivation for the work of [Pasquier et al. (2019, in preparation)](), and for this package.
+
 
 ## Usage
 
@@ -29,17 +29,15 @@ As an example, we use a simple model with only two state variables and two param
 
 ```jldoctest usage
 # State function F
-F(x,p) = [
+statefun(x,p) = [
     -2 * (p[1] - x[1]) - 4 * p[2] * (x[2] - x[1]^2) * x[1]
     p[2] * (x[2] - x[1]^2)
 ]
-
-# Jacobian of F wrt p
-∇ₓF(x,p) = ForwardDiff.jacobian(x -> F(x,p), x)
+F = ODEFunction(statefun, jac = (x,p) -> ForwardDiff.jacobian(x -> statefun(x, p), x))
 
 # output
 
-∇ₓF (generic function with 1 method)
+(::ODEFunction{false, typeof(statefun), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}) (generic function with 7 methods)
 ```
 
 We also define a cost function `f(x,p)` (that we wish to minimize under the constraint that ``\boldsymbol{F}(\boldsymbol{x},\boldsymbol{p}) = 0``).
@@ -65,7 +63,7 @@ f(x,p) = state_mismatch(x) + parameter_mismatch(p)
 ```
 
 Once these are set up, we need to let the F-1 method know how to solve for the steady-state.
-We do this by using the [DiffEqBase](https://github.com/JuliaDiffEq/DiffEqBase.jl) API.
+We do this by using the [DiffEqBase](https://github.com/SciML/DiffEqBase.jl) API.
 For that, we first write a small Newton solver algorithm, we overload the `solve` function from DiffEqBase, and we overload the `SteadyStateProblem` constructor.
 
 ```jldoctest usage
@@ -85,11 +83,10 @@ function DiffEqBase.solve(prob::DiffEqBase.AbstractSteadyStateProblem,
                           Ftol=1e-10)
     # Define the functions according to DiffEqBase.SteadyStateProblem type
     p = prob.p
-    t = 0
     x0 = copy(prob.u0)
     dx, df = copy(x0), copy(x0)
-    F(x) = prob.f(dx, x, p, t)
-    ∇ₓF(x) = prob.f(df, dx, x, p, t)
+    F(x) = prob.f(x, p)
+    ∇ₓF(x) = prob.f.jac(x, p)
     # Compute `u_steady` and `resid` as per DiffEqBase using my algorithm
     x_steady = newton_solve(F, ∇ₓF, x0, Ftol=Ftol)
     resid = F(x_steady)
@@ -97,13 +94,6 @@ function DiffEqBase.solve(prob::DiffEqBase.AbstractSteadyStateProblem,
     DiffEqBase.build_solution(prob, alg, x_steady, resid; retcode=:Success)
 end
 
-# Overload DiffEqBase's SteadyStateProblem constructor
-function DiffEqBase.SteadyStateProblem(F, ∇ₓF, x, p)
-    f(dx, x, p, t) = F(x, p)
-    f(df, dx, x, p, t) = ∇ₓF(x, p)
-    return DiffEqBase.SteadyStateProblem(f, x, p)
-end
-
 # output
 
 
@@ -123,11 +113,11 @@ Finally, we wrap the objective, gradient, and Hessian functions defined by the F
 
 ```jldoctest usage
 # Initialize the cache for storing reusable objects
-mem = F1Method.initialize_mem(F, ∇ₓf, ∇ₓF, x₀, p₀, MyAlg())
+mem = F1Method.initialize_mem(F, ∇ₓf, x₀, p₀, MyAlg())
 
 # Define the functions via the F1 method
-F1_objective(p) = F1Method.objective(f, F, ∇ₓF, mem, p, MyAlg())
-F1_gradient(p) = F1Method.gradient(f, F, ∇ₓf, ∇ₓF, mem, p, MyAlg())
-F1_Hessian(p) = F1Method.hessian(f, F, ∇ₓf, ∇ₓF, mem, p, MyAlg())
+F1_objective(p) = F1Method.objective(f, F, mem, p, MyAlg())
+F1_gradient(p) = F1Method.gradient(f, F, ∇ₓf, mem, p, MyAlg())
+F1_Hessian(p) = F1Method.hessian(f, F, ∇ₓf, mem, p, MyAlg())
 
 # output
 
@@ -164,3 +154,9 @@ F1_Hessian(p₀)
   0.0     -0.0241434
 ```
 
+## Future
+
+This package is likely not in its final form.
+The API was just changed in v0.5 (to match the API changes in AIBECS.jl v0.11).
+That being said, it would ultimately make sense for the shortcuts used here to be integrated into a package like ChainRules.jl.
+For the time being, AIBECS users can use F1Method.jl to speed up their optimizations.
\ No newline at end of file
diff --git a/src/F1Method.jl b/src/F1Method.jl
index 9f85e80..7183c4a 100644
--- a/src/F1Method.jl
+++ b/src/F1Method.jl
@@ -31,60 +31,60 @@ mutable struct Mem{Ts, TA, T∇s, T∇ₓf, Tp}
 end
 
-function update_mem!(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
+function update_mem!(f, F, ∇ₓf, mem, p, alg; options...)
     if p ≠ mem.p   # only update mem if 𝒑 has changed
-        update_solution!(F, ∇ₓF, mem, p, alg; options...)
+        update_solution!(F, mem, p, alg; options...)
        ∇ₚF = ForwardDiff.jacobian(p -> F(mem.s, p), p)
-        mem.A = factorize(∇ₓF(mem.s, p))   # update factors of ∇ₓ𝑭(𝒔,𝒑)
+        mem.A = factorize(F.jac(mem.s, p)) # update factors of ∇ₓ𝑭(𝒔,𝒑)
         mem.∇s .= mem.A \ -∇ₚF             # update ∇𝒔
         mem.∇ₓf .= ∇ₓf(mem.s, p)           # update ∇ₓ𝑓(𝒔,𝒑)
         mem.p .= p                         # update 𝒑 for the variables above
     end
 end
 
-function update_solution!(F, ∇ₓF, mem, p, alg; options...)
+function update_solution!(F, mem, p, alg; options...)
     if p ≠ mem.psol
-        prob = SteadyStateProblem(F, ∇ₓF, mem.s, p) # define problem
+        prob = SteadyStateProblem(F, mem.s, p)      # define problem
         mem.s .= solve(prob, alg; options...).u     # update 𝒔
         mem.psol .= p                               # update 𝒑 for 𝒔
     end
 end
 
 """
-    objective(f, F, ∇ₓF, mem, p, alg; options...)
+    objective(f, F, mem, p, alg; options...)
 
 Returns `f(x,p)` such that `F(x,p)=0` using the F-1 method.
-Specifically, `objective(f, F, ∇ₓF, mem, p, alg; options...)`
+Specifically, `objective(f, F, mem, p, alg; options...)`
 evaluates the objective function defined by `f̂(p) = f(s(p),p)`, where `s(p)`,
 which is the steady-state solution (i.e., such that `F(s(p),p)=0`)
 is computed by the iterative Newton-type solver `alg`.
-The Jacobian, `∇ₓF`, and the memory cache `mem` must be supplied.
+The memory cache `mem` must be supplied.
 """
-function objective(f, F, ∇ₓF, mem, p, alg; options...)
-    update_solution!(F, ∇ₓF, mem, p, alg; options...)
+function objective(f, F, mem, p, alg; options...)
+    update_solution!(F, mem, p, alg; options...)
     return f(mem.s, p)
 end
 
 """
-    gradient(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
+    gradient(f, F, ∇ₓf, mem, p, alg; options...)
 
 Returns the gradient of the `objective` function using the F-1 method.
 """
-function gradient(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
-    update_mem!(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
+function gradient(f, F, ∇ₓf, mem, p, alg; options...)
+    update_mem!(f, F, ∇ₓf, mem, p, alg; options...)
     s, ∇s, m = mem.s, mem.∇s, length(p)
     ∇ₚf = ForwardDiff.jacobian(p -> [f(s,p)], p)
     return mem.∇ₓf * ∇s + ∇ₚf
 end
 
 """
-    hessian(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
+    hessian(f, F, ∇ₓf, mem, p, alg; options...)
 
 Returns the Hessian of the `objective` function using the F-1 method.
 """
-function hessian(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
-    update_mem!(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
+function hessian(f, F, ∇ₓf, mem, p, alg; options...)
+    update_mem!(f, F, ∇ₓf, mem, p, alg; options...)
     s, A, ∇s, m = mem.s, mem.A, mem.∇s, length(p)
     A⁻ᵀ∇ₓfᵀ = vec(A' \ mem.∇ₓf') # independent of (𝑗,𝑘)
     H(λ) = f(s+∇s*λ, p+λ) - F(s+∇s*λ, p+λ)' * A⁻ᵀ∇ₓfᵀ
@@ -96,13 +96,13 @@ end
 
 Initializes the memory cache for the F-1 method.
 """
-function initialize_mem(F, ∇ₓf, ∇ₓF, x, p, alg; options...)
+function initialize_mem(F, ∇ₓf, x, p, alg; options...)
     x = copy(x)
     p = copy(p)
     psol = copy(p)
-    prob = SteadyStateProblem(F, ∇ₓF, x, p)
+    prob = SteadyStateProblem(F, x, p)
     s = solve(prob, alg; options...).u
-    A = factorize(∇ₓF(s, p))
+    A = factorize(F.jac(s, p))
     ∇ₚF = ForwardDiff.jacobian(p -> F(s, p), p)
     return Mem(s, A, A \ -∇ₚF, ∇ₓf(s, p), p, psol)
 end
diff --git a/test/AIBECS_test.jl b/test/AIBECS_test.jl
index 440ea1b..70642d9 100644
--- a/test/AIBECS_test.jl
+++ b/test/AIBECS_test.jl
@@ -78,10 +78,10 @@ prior(::T) where {T<:AbstractParameters} = prior(T)
 p = PmodelParameters()
 λ = p2λ(p)
 nb = sum(iswet(grd))
-F, ∇ₓF = F_and_∇ₓF((T_DIP, T_POP), (G_DIP, G_POP), nb, PmodelParameters)
+F = AIBECSFunction((T_DIP, T_POP), (G_DIP, G_POP), nb, PmodelParameters)
 @unpack DIP_geo = p
 x = DIP_geo * ones(2nb) # initial guess
-prob = SteadyStateProblem(F, ∇ₓF, x, p)
+prob = SteadyStateProblem(F, x, p)
 sol = solve(prob, CTKAlg())
 
 # Observations from World Ocean Atlas used to evaluate the
@@ -100,10 +100,10 @@ f, ∇ₓf = f_and_∇ₓf(ωs, ωp, grd, modify, obs, PmodelParameters)
 
 # Now we apply the F1 method
 τ = ustrip(u"s", 1e3u"Myr")
-mem = F1Method.initialize_mem(F, ∇ₓf, ∇ₓF, x, λ, CTKAlg(), τstop=τ)
-objective(λ) = F1Method.objective(f, F, ∇ₓF, mem, λ, CTKAlg(), τstop=τ)
-gradient(λ) = F1Method.gradient(f, F, ∇ₓf, ∇ₓF, mem, λ, CTKAlg(), τstop=τ)
-hessian(λ) = F1Method.hessian(f, F, ∇ₓf, ∇ₓF, mem, λ, CTKAlg(), τstop=τ)
+mem = F1Method.initialize_mem(F, ∇ₓf, x, λ, CTKAlg(), τstop=τ)
+objective(λ) = F1Method.objective(f, F, mem, λ, CTKAlg(), τstop=τ)
+gradient(λ) = F1Method.gradient(f, F, ∇ₓf, mem, λ, CTKAlg(), τstop=τ)
+hessian(λ) = F1Method.hessian(f, F, ∇ₓf, mem, λ, CTKAlg(), τstop=τ)
 
 # Finally we test the result with the "reliable" FiniteDiff :)
 @test FiniteDiff.finite_difference_gradient(objective, 2λ) ≈ gradient(2λ)' rtol=1e-3
diff --git a/test/diffusion_problem.jl b/test/diffusion_problem.jl
index c69d2cb..e7ed824 100644
--- a/test/diffusion_problem.jl
+++ b/test/diffusion_problem.jl
@@ -52,11 +52,11 @@ using Test, FormulaOneMethod
     ∇sink(x,p) = p[4] * I / n
 
     # Define state function F(x,p) and Jacobian ∇ₓF(x,p)
-    F(x,p) = -T(p) * x + source(x,p) - sink(x,p)
-    ∇ₓF(x,p) = -T(p) + ∇source(x,p) - ∇sink(x,p)
+    F(x,p) = ODEFunction((x,p) -> -T(p) * x + source(x,p) - sink(x,p),
+                         jac = (x,p) -> -T(p) + ∇source(x,p) - ∇sink(x,p))
 
     # Basic Newton solver
-    function newton_solve(F, ∇ₓF, x; Ftol=1e-10)
+    function newton_solve(F, x; Ftol=1e-10)
         while norm(F(x)) ≥ Ftol
             x .-= ∇ₓF(x) \ F(x)
         end
@@ -78,14 +78,14 @@ using Test, FormulaOneMethod
         F(x) = prob.f(dx, x, p, t)
         ∇ₓF(x) = prob.f(df, dx, x, p, t)
         # Compute `u_steady` and `resid` as per DiffEqBase using my algorithm
-        x_steady = newton_solve(F, ∇ₓF, x0, Ftol=Ftol)
+        x_steady = newton_solve(F, x0, Ftol=Ftol)
         resid = F(x_steady)
         # Return the common DiffEqBase solution type
        DiffEqBase.build_solution(prob, alg, x_steady, resid; retcode=:Success)
     end
 
     # Overload DiffEqBase's SteadyStateProblem constructor
-    function DiffEqBase.SteadyStateProblem(F, ∇ₓF, x, p)
+    function DiffEqBase.SteadyStateProblem(F, x, p)
         f(dx, x, p, t) = F(x, p)
         f(df, dx, x, p, t) = ∇ₓF(x, p)
         return DiffEqBase.SteadyStateProblem(f, x, p)
@@ -114,15 +114,15 @@ using Test, FormulaOneMethod
     mem = F1.initialize_mem(x₀, p₀)
 
     # Compute the objective function, 𝑓̂(𝒑)
-    objective(p) = F1.f̂(f, F, ∇ₓF, mem, p, MyAlg())
+    objective(p) = F1.f̂(f, F, mem, p, MyAlg())
     objective(p₀)
 
     # Compute the gradient, ∇𝑓̂(𝒑)
-    gradient(p) = F1.∇f̂(f, F, ∇ₓf, ∇ₓF, mem, p, MyAlg())
+    gradient(p) = F1.∇f̂(f, F, ∇ₓf, mem, p, MyAlg())
     gradient(p₀)
 
     # Compute the Hessian matrix, ∇²𝑓̂(𝒑)
-    Hessian(p) = F1.∇²f̂(f, F, ∇ₓf, ∇ₓF, mem, p, MyAlg())
+    Hessian(p) = F1.∇²f̂(f, F, ∇ₓf, mem, p, MyAlg())
     Hessian(p₀)
 
 #end
diff --git a/test/rosenbrock.jl b/test/rosenbrock.jl
index 74679a8..c5fe75c 100644
--- a/test/rosenbrock.jl
+++ b/test/rosenbrock.jl
@@ -30,11 +30,10 @@ function DiffEqBase.solve(prob::DiffEqBase.AbstractSteadyStateProblem,
                           Ftol=1e-10)
     # Define the functions according to DiffEqBase.SteadyStateProblem type
     p = prob.p
-    t = 0
     x0 = copy(prob.u0)
     dx, df = copy(x0), copy(x0)
-    F(x) = prob.f(dx, x, p, t)
-    ∇ₓF(x) = prob.f(df, dx, x, p, t)
+    F(x) = prob.f(x, p)
+    ∇ₓF(x) = prob.f.jac(x, p)
     # Compute `u_steady` and `resid` as per DiffEqBase using my algorithm
     x_steady = newton_solve(F, ∇ₓF, x0, Ftol=Ftol)
     resid = F(x_steady)
@@ -42,12 +41,7 @@ function DiffEqBase.solve(prob::DiffEqBase.AbstractSteadyStateProblem,
     DiffEqBase.build_solution(prob, alg, x_steady, resid; retcode=:Success)
 end
 
-# Overload DiffEqBase's SteadyStateProblem constructor
-function DiffEqBase.SteadyStateProblem(F, ∇ₓF, x, p)
-    f(dx, x, p, t) = F(x, p)
-    f(df, dx, x, p, t) = ∇ₓF(x, p)
-    return DiffEqBase.SteadyStateProblem(f, x, p)
-end
+
 
 
 # This is quasi derived from the Rosenbrock function
@@ -62,13 +56,11 @@ end
 # Specifically, if r(x,y) denotes the rosenbrock function,
 # F₁([x, y], p) = ∂r/∂x
 # F₂([x, y], p) = 0.5 * ∂r/∂y
-F(x,p) = [
+statefun(x,p) = [
     -2 * (p[1] - x[1]) - 4 * p[2] * (x[2] - x[1]^2) * x[1]
     p[2] * (x[2] - x[1]^2)
 ]
-
-# Jacobian function of F w.r.t. p
-∇ₓF(x,p) = ForwardDiff.jacobian(x -> F(x,p), x)
+F = ODEFunction(statefun, jac = (x,p) -> ForwardDiff.jacobian(x -> statefun(x, p), x))
 
 # Define mismatch function f(x,p) and its derivative ∇ₓf(x,p)
 # (Note ∇ₓF and ∇ₓf are required by the F1 method)
@@ -87,12 +79,12 @@ f(x,p) = state_mismatch(x) + parameter_mismatch(p)
 x₀ = rand(2)
 p₀ = rand(2)
 # Initialize the memory cache for storing reusable objects
-mem = F1Method.initialize_mem(F, ∇ₓf, ∇ₓF, x₀, p₀, MyAlg())
+mem = F1Method.initialize_mem(F, ∇ₓf, x₀, p₀, MyAlg())
 
 # Define the functions via the F1 method
-F1_objective(p) = F1Method.objective(f, F, ∇ₓF, mem, p, MyAlg())
-F1_gradient(p) = F1Method.gradient(f, F, ∇ₓf, ∇ₓF, mem, p, MyAlg())
-F1_Hessian(p) = F1Method.hessian(f, F, ∇ₓf, ∇ₓF, mem, p, MyAlg())
+F1_objective(p) = F1Method.objective(f, F, mem, p, MyAlg())
+F1_gradient(p) = F1Method.gradient(f, F, ∇ₓf, mem, p, MyAlg())
+F1_Hessian(p) = F1Method.hessian(f, F, ∇ₓf, mem, p, MyAlg())
 
 # Define the exact solution and objective for comparison
 exact_solution(p) = [p[1], p[1]^2]