Update to AIBECS v0.11 API (#12)
* Update to AIBECS v0.11 API

* Update documenter version

* Add ref to AIBECS in ReadMe
briochemc authored Dec 8, 2021
1 parent 6be396a commit 3c41182
Showing 8 changed files with 75 additions and 80 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "F1Method"
uuid = "d5605bae-6d76-11e9-2fd7-71bcf42edbaa"
authors = ["Benoit Pasquier <[email protected]>"]
version = "0.4.1"
version = "0.5.0"

[deps]
DiffEqBase = "2b5f629d-d688-5b77-993f-72d75c75574e"
21 changes: 14 additions & 7 deletions README.md
@@ -57,15 +57,15 @@ Optimizing the model is then simply done by minimizing `objective(p)`.
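
For instance, a minimal sketch of such an optimization with [Optim.jl](https://github.com/JuliaNLSolvers/Optim.jl) (not a dependency of this package; `p₀` is a hypothetical vector of starting parameter values, and `objective`, `gradient`, and `hessian` are the wrapped functions defined further down) could look like

```julia
using Optim

# In-place wrappers around the F-1 gradient (a 1×m row vector) and Hessian
g!(G, p) = G .= vec(gradient(p))
h!(H, p) = H .= hessian(p)

# Newton-type optimization that reuses the F-1 derivatives
results = optimize(objective, g!, h!, p₀, NewtonTrustRegion())
```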

The F-1 algorithm is **easy** to use, gives **accurate** results, and is computationally **fast**:

- **Easy** — The F-1 algorithm basically just needs the user to provide a solver (for finding the steady-state), the mismatch function, `f`, the state function, `F`, and their derivatives, `∇ₓf` and `∇ₓF` w.r.t. the state `x`.
(Note these derivatives can be computed numerically, via the [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) package for example.)
- **Easy** — The F-1 algorithm essentially only needs the user to provide a solver (for finding the steady-state), the mismatch function `f`, an `ODEFunction` `F` with its Jacobian, and the gradient of `f` w.r.t. the state, `∇ₓf` (see the sketch just after this list).
(Note these derivatives can be computed numerically, via the [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) package for example.)
- **Accurate** — Thanks to ForwardDiff's nested dual-numbers implementation, the accuracy of the gradient and Hessian, as computed by the F-1 algorithm, is close to machine precision.
- **Fast** — The F-1 algorithm is as fast as if you had derived analytical formulas for every first and second derivative *and* used them in the most efficient way.
This is because the bottleneck of such computations is the number of matrix factorizations, and the F-1 algorithm requires only a single one. In comparison, standard autodifferentiation methods that treat the steady-state solver as a black box would require on the order of `m` or `m^2` factorizations, where `m` is the number of parameters.
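
For example, the pieces mentioned in the **Easy** bullet above could be supplied as in the following sketch (the two-box model and quadratic mismatch are made up for illustration; `ODEFunction` comes from the SciML/DiffEqBase ecosystem):

```julia
using DiffEqBase, ForwardDiff

# A made-up two-box model: the state function and its Jacobian w.r.t. the state x
statefun(x, p) = [-p[1] * x[1] + x[2], x[1] - p[2] * x[2]]
F = ODEFunction(statefun, jac = (x, p) -> ForwardDiff.jacobian(x -> statefun(x, p), x))

# A made-up quadratic mismatch and its derivative w.r.t. the state x (a row vector)
f(x, p) = sum(abs2, x) / 2
∇ₓf(x, p) = ForwardDiff.jacobian(x -> [f(x, p)], x)
```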

## What's needed?

A requirement of the F-1 algorithm is that the Jacobian matrix `A = ∇ₓf` can be created, stored, and factorized.
A requirement of the F-1 algorithm is that the Jacobian matrix `A = ∇ₓF` can be created, stored, and factorized.
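
In practice, this simply means that code like the following sketch (where `x` and `p` stand for hypothetical state and parameter vectors) must run and fit in memory:

```julia
using LinearAlgebra

A = F.jac(x, p)          # the Jacobian ∇ₓF evaluated at (x, p)
Afactors = factorize(A)  # the single factorization that the F-1 method reuses
```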

To use the F-1 algorithm, the user must:

@@ -81,16 +81,16 @@ Once initial values for the state, `x`, and parameters, `p`, are chosen, simply

```julia
# Initialize the cache for storing reusable objects
mem = initialize_mem(F, ∇ₓf, ∇ₓF, x, p, alg; options...)
mem = initialize_mem(F, ∇ₓf, x, p, alg; options...)
```

wrap the functions into functions of `p` only via

```julia
# Wrap the objective, gradient, and Hessian functions
objective(p) = F1Method.objective(f, F, ∇ₓF, mem, p, alg; options...)
gradient(p) = F1Method.gradient(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
hessian(p) = F1Method.hessian(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
objective(p) = F1Method.objective(f, F, mem, p, alg; options...)
gradient(p) = F1Method.gradient(f, F, ∇ₓf, mem, p, alg; options...)
hessian(p) = F1Method.hessian(f, F, ∇ₓf, mem, p, alg; options...)
```

and compute the objective, gradient, or Hessian via either of
@@ -112,3 +112,10 @@ Now you can test how fast and accurate it is!
If you use this package, or implement your own package based on the F-1 algorithm, please cite us.
If you use the F-1 algorithm, please cite *[Pasquier and Primeau](https://www.bpasquier.com/publication/pasquier_primeau_sisc_2019/)* (in prep.).
If you also use this package directly, please cite it! (Use [the Zenodo link](https://doi.org/10.5281/zenodo.2667835) or the [CITATION.bib file](./CITATION.bib), which contains a bibtex entry.)

# Future

This package is developed mainly for use with [AIBECS.jl](https://github.com/JuliaOcean/AIBECS.jl) and is likely not in its final form.
The API was just changed in v0.5 (to match the API changes in AIBECS.jl v0.11).
That being said, ultimately, it would make sense for the shortcuts used here to be integrated into a package like ChainRules.jl.
For the time being, AIBECS users can use F1Method.jl to speed up their optimizations.
2 changes: 1 addition & 1 deletion docs/Project.toml
@@ -4,4 +4,4 @@ Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"

[compat]
Documenter = "~0.25"
Documenter = "~0.27"
38 changes: 17 additions & 21 deletions docs/src/index.md
@@ -7,7 +7,7 @@ This package provides an efficient tool to compute gradient and Hessian matrix o
When using Newton-type algorithms for optimization, computing the gradient and Hessian can be computationally expensive.
A typical scientific application is to optimize the parameters of a model which solves for a root through another iterative Newton-like algorithm.
In this case, there are a number of shortcuts that can be leveraged.
This was the motivation for the work of [Pasquier et al. (2019, in preparation)](), and for this package.
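
The main shortcut can be summarized by the following formula for the gradient (a sketch based on the implicit-function theorem; the Hessian is obtained with a similar trick), where ``\hat{f}(\boldsymbol{p}) = f(\boldsymbol{s}(\boldsymbol{p}), \boldsymbol{p})`` and ``\boldsymbol{s}(\boldsymbol{p})`` denotes the steady-state solution:

```math
\nabla \hat{f}(\boldsymbol{p}) = \nabla_{\boldsymbol{x}} f \, \nabla \boldsymbol{s} + \nabla_{\boldsymbol{p}} f,
\qquad
\nabla \boldsymbol{s} = -\left[\nabla_{\boldsymbol{x}} \boldsymbol{F}(\boldsymbol{s},\boldsymbol{p})\right]^{-1} \nabla_{\boldsymbol{p}} \boldsymbol{F}(\boldsymbol{s},\boldsymbol{p}),
```

so that a single factorization of the steady-state Jacobian ``\nabla_{\boldsymbol{x}} \boldsymbol{F}`` suffices to assemble the whole gradient.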


## Usage

@@ -29,17 +29,15 @@ As an example, we use a simple model with only two state variables and two parameters

```jldoctest usage
# State function F
F(x,p) = [
statefun(x,p) = [
-2 * (p[1] - x[1]) - 4 * p[2] * (x[2] - x[1]^2) * x[1]
p[2] * (x[2] - x[1]^2)
]
# Jacobian of F wrt p
∇ₓF(x,p) = ForwardDiff.jacobian(x -> F(x,p), x)
F = ODEFunction(statefun, jac = (x,p) -> ForwardDiff.jacobian(x -> statefun(x, p), x))
# output
∇ₓF (generic function with 1 method)
(::ODEFunction{false, typeof(statefun), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}) (generic function with 7 methods)
```

We also define a cost function `f(x,p)` (that we wish to minimize under the constraint that ``\boldsymbol{F}(\boldsymbol{x},\boldsymbol{p}) = 0``).
@@ -65,7 +63,7 @@ f(x,p) = state_mismatch(x) + parameter_mismatch(p)
```

Once these are set up, we need to let the F-1 method know how to solve for the steady-state.
We do this by using the [DiffEqBase](https://github.com/JuliaDiffEq/DiffEqBase.jl) API.
We do this by using the [DiffEqBase](https://github.com/SciML/DiffEqBase.jl) API.
For that, we first write a small Newton solver algorithm and overload the `solve` function from DiffEqBase.

```jldoctest usage
@@ -85,25 +83,17 @@ function DiffEqBase.solve(prob::DiffEqBase.AbstractSteadyStateProblem,
Ftol=1e-10)
# Define the functions according to DiffEqBase.SteadyStateProblem type
p = prob.p
t = 0
x0 = copy(prob.u0)
dx, df = copy(x0), copy(x0)
F(x) = prob.f(dx, x, p, t)
∇ₓF(x) = prob.f(df, dx, x, p, t)
F(x) = prob.f(x, p)
∇ₓF(x) = prob.f.jac(x, p)
# Compute `u_steady` and `resid` as per DiffEqBase using my algorithm
x_steady = newton_solve(F, ∇ₓF, x0, Ftol=Ftol)
resid = F(x_steady)
# Return the common DiffEqBase solution type
DiffEqBase.build_solution(prob, alg, x_steady, resid; retcode=:Success)
end
# Overload DiffEqBase's SteadyStateProblem constructor
function DiffEqBase.SteadyStateProblem(F, ∇ₓF, x, p)
f(dx, x, p, t) = F(x, p)
f(df, dx, x, p, t) = ∇ₓF(x, p)
return DiffEqBase.SteadyStateProblem(f, x, p)
end
# output
@@ -123,11 +113,11 @@ Finally, we wrap the objective, gradient, and Hessian functions defined by the F

```jldoctest usage
# Initialize the cache for storing reusable objects
mem = F1Method.initialize_mem(F, ∇ₓf, ∇ₓF, x₀, p₀, MyAlg())
mem = F1Method.initialize_mem(F, ∇ₓf, x₀, p₀, MyAlg())
# Define the functions via the F1 method
F1_objective(p) = F1Method.objective(f, F, ∇ₓF, mem, p, MyAlg())
F1_gradient(p) = F1Method.gradient(f, F, ∇ₓf, ∇ₓF, mem, p, MyAlg())
F1_Hessian(p) = F1Method.hessian(f, F, ∇ₓf, ∇ₓF, mem, p, MyAlg())
F1_objective(p) = F1Method.objective(f, F, mem, p, MyAlg())
F1_gradient(p) = F1Method.gradient(f, F, ∇ₓf, mem, p, MyAlg())
F1_Hessian(p) = F1Method.hessian(f, F, ∇ₓf, mem, p, MyAlg())
# output
@@ -164,3 +154,9 @@ F1_Hessian(p₀)
0.0 -0.0241434
```

# Future

This package is likely not in its final form.
The API was just changed in v0.5 (to match the API changes in AIBECS.jl v0.11).
That being said, ultimately, it would make sense for the shortcuts used here to be integrated into a package like ChainRules.jl.
For the time being, AIBECS users can use F1Method.jl to speed up their optimizations.
38 changes: 19 additions & 19 deletions src/F1Method.jl
@@ -31,60 +31,60 @@ mutable struct Mem{Ts, TA, T∇s, T∇ₓf, Tp}
end


function update_mem!(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
function update_mem!(f, F, ∇ₓf, mem, p, alg; options...)
if p ≠ mem.p # only update mem if 𝒑 has changed
update_solution!(F, ∇ₓF, mem, p, alg; options...)
update_solution!(F, mem, p, alg; options...)
∇ₚF = ForwardDiff.jacobian(p -> F(mem.s, p), p)
mem.A = factorize(∇ₓF(mem.s, p)) # update factors of ∇ₓ𝑭(𝒔,𝒑)
mem.A = factorize(F.jac(mem.s, p)) # update factors of ∇ₓ𝑭(𝒔,𝒑)
mem.∇s .= mem.A \ -∇ₚF # update ∇𝒔
mem.∇ₓf .= ∇ₓf(mem.s, p) # update ∇ₓ𝑓(𝒔,𝒑)
mem.p .= p # update 𝒑 for the variables above
end
end

function update_solution!(F, ∇ₓF, mem, p, alg; options...)
function update_solution!(F, mem, p, alg; options...)
if p ≠ mem.psol
prob = SteadyStateProblem(F, ∇ₓF, mem.s, p) # define problem
prob = SteadyStateProblem(F, mem.s, p) # define problem
mem.s .= solve(prob, alg; options...).u # update 𝒔
mem.psol .= p # update 𝒑 for 𝒔
end
end

"""
objective(f, F, ∇ₓF, mem, p, alg; options...)
objective(f, F, mem, p, alg; options...)
Returns `f(x,p)` such that `F(x,p)=0` using the F-1 method.
Specifically, `objective(f, F, ∇ₓF, mem, p, alg; options...)`
Specifically, `objective(f, F, mem, p, alg; options...)`
evaluates the objective function defined by `f̂(p) = f(s(p),p)`, where
`s(p)`, which is the steady-state solution (i.e., such that `F(s(p),p)=0`)
is computed by the iterative Newton-type solver `alg`.
The Jacobian, `∇ₓF`, and the memory cache `mem` must be supplied.
The memory cache `mem` must be supplied.
"""
function objective(f, F, ∇ₓF, mem, p, alg; options...)
update_solution!(F, ∇ₓF, mem, p, alg; options...)
function objective(f, F, mem, p, alg; options...)
update_solution!(F, mem, p, alg; options...)
return f(mem.s, p)
end

"""
gradient(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
gradient(f, F, ∇ₓf, mem, p, alg; options...)
Returns the gradient of the `objective` function using the F-1 method.
"""
function gradient(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
update_mem!(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
function gradient(f, F, ∇ₓf, mem, p, alg; options...)
update_mem!(f, F, ∇ₓf, mem, p, alg; options...)
s, ∇s, m = mem.s, mem.∇s, length(p)
∇ₚf = ForwardDiff.jacobian(p -> [f(s,p)], p)
return mem.∇ₓf * ∇s + ∇ₚf
end

"""
hessian(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
hessian(f, F, ∇ₓf, mem, p, alg; options...)
Returns the Hessian of the `objective` function using the F-1 method.
"""
function hessian(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
update_mem!(f, F, ∇ₓf, ∇ₓF, mem, p, alg; options...)
function hessian(f, F, ∇ₓf, mem, p, alg; options...)
update_mem!(f, F, ∇ₓf, mem, p, alg; options...)
s, A, ∇s, m = mem.s, mem.A, mem.∇s, length(p)
A⁻ᵀ∇ₓfᵀ = vec(A' \ mem.∇ₓf') # independent of (𝑗,𝑘)
H(λ) = f(s+∇s*λ, p+λ) - F(s+∇s*λ, p+λ)' * A⁻ᵀ∇ₓfᵀ
@@ -96,13 +96,13 @@ end
Initializes the memory cache for the F-1 method.
"""
function initialize_mem(F, ∇ₓf, ∇ₓF, x, p, alg; options...)
function initialize_mem(F, ∇ₓf, x, p, alg; options...)
x = copy(x)
p = copy(p)
psol = copy(p)
prob = SteadyStateProblem(F, ∇ₓF, x, p)
prob = SteadyStateProblem(F, x, p)
s = solve(prob, alg; options...).u
A = factorize(∇ₓF(s, p))
A = factorize(F.jac(s, p))
∇ₚF = ForwardDiff.jacobian(p -> F(s, p), p)
return Mem(s, A, A \ -∇ₚF, ∇ₓf(s, p), p, psol)
end
12 changes: 6 additions & 6 deletions test/AIBECS_test.jl
@@ -78,10 +78,10 @@ prior(::T) where {T<:AbstractParameters} = prior(T)
p = PmodelParameters()
λ = p2λ(p)
nb = sum(iswet(grd))
F, ∇ₓF = F_and_∇ₓF((T_DIP, T_POP), (G_DIP, G_POP), nb, PmodelParameters)
F = AIBECSFunction((T_DIP, T_POP), (G_DIP, G_POP), nb, PmodelParameters)
@unpack DIP_geo = p
x = DIP_geo * ones(2nb) # initial guess
prob = SteadyStateProblem(F, ∇ₓF, x, p)
prob = SteadyStateProblem(F, x, p)
sol = solve(prob, CTKAlg())

# Observations from World Ocean Atlas used to evaluate the
@@ -100,10 +100,10 @@ f, ∇ₓf = f_and_∇ₓf(ωs, ωp, grd, modify, obs, PmodelParameters)

# Now we apply the F1 method
τ = ustrip(u"s", 1e3u"Myr")
mem = F1Method.initialize_mem(F, ∇ₓf, ∇ₓF, x, λ, CTKAlg(), τstop=τ)
objective(λ) = F1Method.objective(f, F, ∇ₓF, mem, λ, CTKAlg(), τstop=τ)
gradient(λ) = F1Method.gradient(f, F, ∇ₓf, ∇ₓF, mem, λ, CTKAlg(), τstop=τ)
hessian(λ) = F1Method.hessian(f, F, ∇ₓf, ∇ₓF, mem, λ, CTKAlg(), τstop=τ)
mem = F1Method.initialize_mem(F, ∇ₓf, x, λ, CTKAlg(), τstop=τ)
objective(λ) = F1Method.objective(f, F, mem, λ, CTKAlg(), τstop=τ)
gradient(λ) = F1Method.gradient(f, F, ∇ₓf, mem, λ, CTKAlg(), τstop=τ)
hessian(λ) = F1Method.hessian(f, F, ∇ₓf, mem, λ, CTKAlg(), τstop=τ)

# Finally we test the result with the "reliable" FiniteDiff :)
@test FiniteDiff.finite_difference_gradient(objective, 2λ) ≈ gradient(2λ)' rtol=1e-3
16 changes: 8 additions & 8 deletions test/diffusion_problem.jl
@@ -52,11 +52,11 @@ using Test, FormulaOneMethod
∇sink(x,p) = p[4] * I / n

# Define state function F(x,p) and Jacobian ∇ₓF(x,p)
F(x,p) = -T(p) * x + source(x,p) - sink(x,p)
∇ₓF(x,p) = -T(p) + ∇source(x,p) - ∇sink(x,p)
F = ODEFunction((x,p) -> -T(p) * x + source(x,p) - sink(x,p),
jac = (x,p) -> -T(p) + ∇source(x,p) - ∇sink(x,p))

# Basic Newton solver
function newton_solve(F, ∇ₓF, x; Ftol=1e-10)
function newton_solve(F, x; Ftol=1e-10)
while norm(F(x)) ≥ Ftol
x .-= ∇ₓF(x) \ F(x)
end
@@ -78,14 +78,14 @@ using Test, FormulaOneMethod
F(x) = prob.f(dx, x, p, t)
∇ₓF(x) = prob.f(df, dx, x, p, t)
# Compute `u_steady` and `resid` as per DiffEqBase using my algorithm
x_steady = newton_solve(F, ∇ₓF, x0, Ftol=Ftol)
x_steady = newton_solve(F, x0, Ftol=Ftol)
resid = F(x_steady)
# Return the common DiffEqBase solution type
DiffEqBase.build_solution(prob, alg, x_steady, resid; retcode=:Success)
end

# Overload DiffEqBase's SteadyStateProblem constructor
function DiffEqBase.SteadyStateProblem(F, ∇ₓF, x, p)
function DiffEqBase.SteadyStateProblem(F, x, p)
f(dx, x, p, t) = F(x, p)
f(df, dx, x, p, t) = ∇ₓF(x, p)
return DiffEqBase.SteadyStateProblem(f, x, p)
@@ -114,15 +114,15 @@ using Test, FormulaOneMethod
mem = F1.initialize_mem(x₀, p₀)

# Compute the objective function, 𝑓̂(𝒑)
objective(p) = F1.f̂(f, F, ∇ₓF, mem, p, MyAlg())
objective(p) = F1.f̂(f, F, mem, p, MyAlg())
objective(p₀)

# Compute the gradient, ∇𝑓̂(𝒑)
gradient(p) = F1.∇f̂(f, F, ∇ₓf, ∇ₓF, mem, p, MyAlg())
gradient(p) = F1.∇f̂(f, F, ∇ₓf, mem, p, MyAlg())
gradient(p₀)

# Compute the Hessian matrix, ∇²𝑓̂(𝒑)
Hessian(p) = F1.∇²f̂(f, F, ∇ₓf, ∇ₓF, mem, p, MyAlg())
Hessian(p) = F1.∇²f̂(f, F, ∇ₓf, mem, p, MyAlg())
Hessian(p₀)

#end
26 changes: 9 additions & 17 deletions test/rosenbrock.jl
@@ -30,24 +30,18 @@ function DiffEqBase.solve(prob::DiffEqBase.AbstractSteadyStateProblem,
Ftol=1e-10)
# Define the functions according to DiffEqBase.SteadyStateProblem type
p = prob.p
t = 0
x0 = copy(prob.u0)
dx, df = copy(x0), copy(x0)
F(x) = prob.f(dx, x, p, t)
∇ₓF(x) = prob.f(df, dx, x, p, t)
F(x) = prob.f(x, p)
∇ₓF(x) = prob.f.jac(x, p)
# Compute `u_steady` and `resid` as per DiffEqBase using my algorithm
x_steady = newton_solve(F, ∇ₓF, x0, Ftol=Ftol)
resid = F(x_steady)
# Return the common DiffEqBase solution type
DiffEqBase.build_solution(prob, alg, x_steady, resid; retcode=:Success)
end

# Overload DiffEqBase's SteadyStateProblem constructor
function DiffEqBase.SteadyStateProblem(F, ∇ₓF, x, p)
f(dx, x, p, t) = F(x, p)
f(df, dx, x, p, t) = ∇ₓF(x, p)
return DiffEqBase.SteadyStateProblem(f, x, p)
end



# This is quasi derived from the Rosenbrock function
@@ -62,13 +56,11 @@ end
# Specifically, if r(x,y) denotes the rosenbrock function,
# F₁([x, y], p) = ∂r/∂x
# F₂([x, y], p) = 0.5 * ∂r/∂y
F(x,p) = [
statefun(x,p) = [
-2 * (p[1] - x[1]) - 4 * p[2] * (x[2] - x[1]^2) * x[1]
p[2] * (x[2] - x[1]^2)
]

# Jacobian function of F w.r.t. p
∇ₓF(x,p) = ForwardDiff.jacobian(x -> F(x,p), x)
F = ODEFunction(statefun, jac = (x,p) -> ForwardDiff.jacobian(x -> statefun(x, p), x))

# Define mismatch function f(x,p) and its derivative ∇ₓf(x,p)
# (Note ∇ₓF and ∇ₓf are required by the F1 method)
@@ -87,12 +79,12 @@ f(x,p) = state_mismatch(x) + parameter_mismatch(p)
x₀ = rand(2)
p₀ = rand(2)
# Initialize the memory cache for storing reusable objects
mem = F1Method.initialize_mem(F, ∇ₓf, ∇ₓF, x₀, p₀, MyAlg())
mem = F1Method.initialize_mem(F, ∇ₓf, x₀, p₀, MyAlg())

# Define the functions via the F1 method
F1_objective(p) = F1Method.objective(f, F, ∇ₓF, mem, p, MyAlg())
F1_gradient(p) = F1Method.gradient(f, F, ∇ₓf, ∇ₓF, mem, p, MyAlg())
F1_Hessian(p) = F1Method.hessian(f, F, ∇ₓf, ∇ₓF, mem, p, MyAlg())
F1_objective(p) = F1Method.objective(f, F, mem, p, MyAlg())
F1_gradient(p) = F1Method.gradient(f, F, ∇ₓf, mem, p, MyAlg())
F1_Hessian(p) = F1Method.hessian(f, F, ∇ₓf, mem, p, MyAlg())

# Define the exact solution and objective for comparison
exact_solution(p) = [p[1], p[1]^2]

2 comments on commit 3c41182

@briochemc

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/50143

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.5.0 -m "<description of version>" 3c411823ffb8fa6d0878170eb2685bc01cfb18b3
git push origin v0.5.0
