This project implements nonlinear optimisation methods, combining analytical and numerical techniques to solve unconstrained optimisation problems. It prioritises mathematical reasoning while ensuring computational efficiency and ease of use.
The solvers:
- Use symbolic computation where possible to maintain precision.
- Transition to numerical computation for tasks like solving linear systems and matrix decompositions.
- Provide outputs consistent with established libraries, such as SciPy, for benchmarking and integration.
Developed as an educational exercise, the project aims to deepen understanding of the mathematical foundations of optimisation while creating robust, reusable tools.
- Hybrid Analytical-Numerical Approach:
  - Symbolic gradients and Hessians are used to reduce numerical inaccuracies.
  - Numerical techniques ensure efficiency in solving linear systems and handling decompositions.
- Custom Solver Classes:
  - Includes Levenberg-Marquardt and Newton-Raphson methods, among others.
  - Provides results compatible with SciPy's optimisation routines for easy comparison.
- Benchmarking with Test Functions:
  - Tested against well-known problems, such as Himmelblau's and Rastrigin's functions.
  - Evaluates performance metrics like convergence rate, accuracy, and robustness.
- Reproducibility:
  - Features clearly documented code and examples to facilitate exploration and further development.
Scalar Fields, Functions of Two Variables
Let us consider the case where $f: \mathbb{R}^2 \to \mathbb{R}$ is a twice-differentiable scalar field, $f = f(x, y)$.

Here, the gradient is:

$$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \end{bmatrix}$$

and the Hessian matrix is given by:

$$\mathbf{H} = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\ \dfrac{\partial^2 f}{\partial y \partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix}$$

Now consider the second-order Taylor polynomial approximation of $f$ about a point $\mathbf{a}$, with $\mathbf{x} = (x, y)$. We define:

$$p_2(x, y) = f(\mathbf{a}) + \nabla f(\mathbf{a})^T (\mathbf{x} - \mathbf{a}) + \tfrac{1}{2} (\mathbf{x} - \mathbf{a})^T \mathbf{H}(\mathbf{a}) (\mathbf{x} - \mathbf{a})$$

If $\nabla f(\mathbf{a}) = \mathbf{0}$, then $\mathbf{a}$ is a critical point. Here, the behaviour of $p_2$ near $\mathbf{a}$ is determined by the definiteness of $\mathbf{H}(\mathbf{a})$:

- If $\mathbf{H}(\mathbf{a})$ is positive definite, then $p_2(x, y)$ achieves a local minimum at $\mathbf{a}$.
- If $\mathbf{H}(\mathbf{a})$ is negative definite, then $p_2(x, y)$ achieves a local maximum at $\mathbf{a}$.
- If $\mathbf{H}(\mathbf{a})$ is indefinite, $\mathbf{a}$ is a saddle point.

To analyse the definiteness of $\mathbf{H}$, we use the $\mathbf{LDL}^T$ decomposition:

$$\mathbf{H} = \mathbf{L}\mathbf{D}\mathbf{L}^T$$

where:

- $\mathbf{L}$ is a lower triangular matrix with ones on the diagonal.
- $\mathbf{D}$ is a diagonal matrix containing the eigenvalues (or modified pivot values) of $\mathbf{H}$.

The signs of the entries of $\mathbf{D}$ determine the definiteness of $\mathbf{H}$.
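As a quick illustration of this test (a minimal sketch using `scipy.linalg.ldl` rather than the project's custom routine), the signs of the diagonal of $\mathbf{D}$ classify a critical point:

```python
import numpy as np
from scipy.linalg import ldl

def classify_critical_point(H):
    """Classify a critical point from the signs of the LDL^T pivots of its Hessian."""
    _, D, _ = ldl(H)          # H = L D L^T (D may contain 2x2 blocks in general)
    pivots = np.diag(D)       # fine here: no 2x2 blocks arise for these small examples
    if np.all(pivots > 0):
        return "local minimum"
    if np.all(pivots < 0):
        return "local maximum"
    return "saddle point"

# f(x, y) = x**2 + y**2 has Hessian diag(2, 2) -> positive definite
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, 2.0]])))   # local minimum
# f(x, y) = x**2 - y**2 has Hessian diag(2, -2) -> indefinite
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]])))  # saddle point
```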
The Custom $\mathbf{LDL}^T$ Solver

Both NR and LM methods rely on a custom $\mathbf{LDL}^T$ solver, designed to:

- Match the efficiency of `scipy.linalg.ldl` while applying partial pivoting only when necessary.
- Solve triangular systems using forward and backward substitution.
- Partial Pivoting: Applied conditionally to ensure numerical stability for small or negative pivots.
- Symmetry Preservation: Maintains matrix symmetry during decomposition.
- Learning-Oriented: Built to explore the fundamentals of numerical linear algebra.
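A minimal sketch of the idea (unpivoted, assuming a well-behaved symmetric matrix; the project's actual solver adds the conditional partial pivoting described above):

```python
import numpy as np

def ldl_decompose(A):
    """Unpivoted LDL^T of a symmetric matrix A: returns (L, d) with A = L @ diag(d) @ L.T."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - L[j, :j] ** 2 @ d[:j]
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - L[i, :j] * L[j, :j] @ d[:j]) / d[j]
    return L, d

def ldl_solve(L, d, b):
    """Solve (L diag(d) L^T) x = b by forward substitution, scaling, then back substitution."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):                      # forward: L y = b
        y[i] = b[i] - L[i, :i] @ y[:i]
    z = y / d                               # diagonal: D z = y
    x = np.zeros(n)
    for i in reversed(range(n)):            # backward: L^T x = z
        x[i] = z[i] - L[i + 1:, i] @ x[i + 1:]
    return x

# Example: a Newton-type step H d = -g
H = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
L, d = ldl_decompose(H)
step = ldl_solve(L, d, -g)
print(np.allclose(H @ step, -g))  # True
```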
This custom decomposition forms the computational core of both the Newton-Raphson and Levenberg-Marquardt solvers described below.

The custom Newton-Raphson implementation leverages the full second-order information of the objective function by directly solving the system of equations formed by the gradient and Hessian. It employs the custom $\mathbf{LDL}^T$ decomposition for stability and efficiency.
- Second-Order Optimisation:
  - Solves the Newton system $\mathbf{H} \mathbf{d} = -\nabla f$, where $\mathbf{H}$ is the Hessian and $\mathbf{d}$ is the search direction.
- Stability with $\mathbf{LDL}^T$ Decomposition:
  - Utilises a custom $\mathbf{LDL}^T$ decomposition with conditional pivoting to handle poorly conditioned Hessians.
  - If $\mathbf{H}$ is not positive definite, the Hessian is augmented as $\mathbf{H} + a\mathbf{I}$, where $a$ is proportional to the largest entry of $\mathbf{H}$.
- Convergence Criteria:
  - Terminates when:
    - Gradient norm: $||\nabla f|| < \eta$,
    - Step size: $||\Delta \mathbf{x}|| < \epsilon$,
    - Function value change: $|f(\mathbf{x}_{k+1}) - f(\mathbf{x}_k)| < \delta$.
- Symbolic Derivatives:
  - Computes gradients and Hessians symbolically (via SymPy) to minimise numerical errors in small-scale problems (see the sketch after this list).
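As a minimal sketch of how the symbolic derivatives might be prepared (Himmelblau's function shown; the names are illustrative, not the project's API):

```python
import sympy as sp

x, y = sp.symbols("x y")
f = (x**2 + y - 11)**2 + (x + y**2 - 7)**2   # Himmelblau's function

grad = sp.Matrix([sp.diff(f, v) for v in (x, y)])   # symbolic gradient
hess = sp.hessian(f, (x, y))                        # symbolic Hessian

# Lambdify once, then evaluate numerically inside the solver loop.
grad_fn = sp.lambdify((x, y), grad, "numpy")
hess_fn = sp.lambdify((x, y), hess, "numpy")

print(grad_fn(3.0, 2.0))   # ~zero at the minimiser (3, 2)
print(hess_fn(3.0, 2.0))
```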
The algorithm proceeds as follows:

- Evaluate the gradient ($\nabla f$) and Hessian ($\mathbf{H}$) at the current iterate.
- Decompose $\mathbf{H}$ using $\mathbf{LDL}^T$.
  - If the Hessian is not positive definite, modify it with $\mathbf{H} + a\mathbf{I}$ and repeat the decomposition.
- Solve for the search direction $\mathbf{d}$ using the $\mathbf{LDL}^T$ factors.
- Update the current point $\mathbf{x}$ and evaluate the new function value $f(\mathbf{x})$.
- Check for convergence or continue iterating up to the maximum allowed iterations.
Strengths:

- Highly efficient when $\mathbf{H}$ is well-conditioned and the initial guess is close to the solution.
- Exploits second-order information to achieve rapid convergence in well-behaved regions.

Limitations:

- Sensitive to the quality of the initial guess and the conditioning of $\mathbf{H}$.
- Diverges if starting far from the solution or in the presence of saddle points.
This implementation prioritises numerical stability and precision while retaining the efficiency of second-order optimisation techniques.
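A condensed, illustrative sketch of the loop described above (using `numpy.linalg.solve` and a Cholesky-based positive-definiteness check in place of the custom $\mathbf{LDL}^T$ routine; the callables are assumed to return NumPy arrays, e.g. lambdified as in the SymPy sketch above):

```python
import numpy as np

def newton_raphson(f, grad, hess, x0, eta=1e-8, eps=1e-10, delta=1e-12, max_iter=100):
    """Newton iteration: solve H d = -grad f, shifting H by a*I if it is not positive definite."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) < eta:                      # gradient-norm test
            break
        try:
            np.linalg.cholesky(H)                        # cheap positive-definiteness check
        except np.linalg.LinAlgError:
            H = H + np.max(np.abs(H)) * np.eye(len(x))   # H + a*I, a ~ largest entry of H
        d = np.linalg.solve(H, -g)                       # Newton direction
        x_new = x + d
        f_new = f(x_new)
        if np.linalg.norm(d) < eps or abs(f_new - fx) < delta:   # step-size / f-change tests
            x, fx = x_new, f_new
            break
        x, fx = x_new, f_new
    return x, fx

# Example: quadratic bowl f(x) = x0^2 + 2*x1^2
f = lambda v: v[0]**2 + 2*v[1]**2
g = lambda v: np.array([2*v[0], 4*v[1]])
h = lambda v: np.array([[2.0, 0.0], [0.0, 4.0]])
print(newton_raphson(f, g, h, [3.0, -2.0]))   # converges to (0, 0) in a single Newton step
```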
The custom Levenberg-Marquardt implementation adopts a trust-region approach, blending the steepest descent and Newton directions by augmenting the Hessian matrix with a damping parameter $\lambda$.
- Augmented Hessian:
  - The Hessian is modified as $\mathbf{H}_{\text{aug}} = \mathbf{H} + \lambda \mathbf{I}$ to stabilise the Newton step and control step size.
  - The damping parameter $\lambda$ is adjusted iteratively to balance between the Gauss-Newton and gradient descent directions.
- Trust Region Management:
  - $\lambda$ is increased if the step is rejected ($\rho < 0.25$) and decreased if the step performs well ($\rho > 0.75$).
  - $\rho$: ratio of actual reduction to predicted reduction.
- Efficient Linear Solves:
  - Utilises the custom $\mathbf{LDL}^T$ decomposition to solve the linear system $\mathbf{H}_{\text{aug}} \mathbf{d} = -\nabla f$.
- Convergence Criteria:
  - The solver terminates when:
    - Gradient norm: $||\nabla f|| < \eta$,
    - Step size: $||\Delta \mathbf{x}|| < \varepsilon$,
    - Function value change: $|f(\mathbf{x}_{k+1}) - f(\mathbf{x}_k)| < \delta$.
- Symbolic Derivatives:
  - Symbolic gradient and Hessian calculations (via SymPy) ensure precision for small-scale problems.
The algorithm proceeds as follows:

- Initialise $\lambda$ and evaluate the gradient ($\nabla f$) and Hessian ($\mathbf{H}$).
- Augment the Hessian ($\mathbf{H}_{\text{aug}}$) and solve for the step $\mathbf{d}$ using $\mathbf{LDL}^T$.
- Compute the actual and predicted reductions, update $\lambda$, and decide to accept or reject the step.
- Repeat until convergence or the maximum number of iterations is reached.
- Robust for poorly conditioned problems due to adaptive damping.
- Handles flat regions effectively, avoiding divergence.
- Tracks intermediate results for analysis, including $\lambda$, $\nabla f$, and $f(\mathbf{x})$.
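A stripped-down sketch of the damping update described above (thresholds 0.25/0.75 as in the list; `numpy.linalg.solve` again stands in for the custom $\mathbf{LDL}^T$ routine, and the function/derivative callables are illustrative):

```python
import numpy as np

def levenberg_marquardt(f, grad, hess, x0, lam=1e-3, eta=1e-8, max_iter=200):
    """LM iteration: solve (H + lam*I) d = -g and adapt lam from the gain ratio rho."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) < eta:
            break
        d = np.linalg.solve(H + lam * np.eye(len(x)), -g)    # augmented Newton step
        predicted = -(g @ d + 0.5 * d @ H @ d)               # reduction predicted by the quadratic model
        actual = f(x) - f(x + d)                             # actual reduction in f
        rho = actual / predicted if predicted > 0 else -1.0  # gain ratio
        if rho > 0.75:
            lam *= 0.5            # good step: trust the quadratic model more
        elif rho < 0.25:
            lam *= 2.0            # poor step: lean towards gradient descent
        if rho > 0:               # accept only steps that actually reduce f
            x = x + d
    return x, f(x), lam           # final lam reported, as in the solver's tracked output

# Example: Himmelblau's function, starting near the (3, 2) minimum
f = lambda v: (v[0]**2 + v[1] - 11)**2 + (v[0] + v[1]**2 - 7)**2
g = lambda v: np.array([4*v[0]*(v[0]**2 + v[1] - 11) + 2*(v[0] + v[1]**2 - 7),
                        2*(v[0]**2 + v[1] - 11) + 4*v[1]*(v[0] + v[1]**2 - 7)])
h = lambda v: np.array([[12*v[0]**2 + 4*v[1] - 42, 4*v[0] + 4*v[1]],
                        [4*v[0] + 4*v[1], 12*v[1]**2 + 4*v[0] - 26]])
print(levenberg_marquardt(f, g, h, [4.0, 3.0]))
```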
Notebook: Testing with Challenging Functions
We now test the custom LM and NR implementations side-by-side with `scipy.optimize.minimize` (`method='BFGS'`), comparing performance on a few "challenging" functions.
Benchmarking Plan
- Examine the contour plot to find a potential $\mathbf{x}_0$.
- Make a suitable initial guess (by inspection).
- Use NR, LM, and BFGS methods to test convergence and robustness.
- Compare:
  - Number of iterations.
  - Accuracy of the minimiser.
  - Stability (e.g., did the method fail?).
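For reference, the BFGS side of each comparison is a call like the following (Himmelblau's function shown; the starting point here is purely illustrative, and the custom NR/LM classes are assumed to report `nit`/`nfev` in a comparable way):

```python
import numpy as np
from scipy.optimize import minimize

def himmelblau(v):
    x, y = v
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

x0 = np.array([4.0, 3.0])                  # illustrative initial guess
res = minimize(himmelblau, x0, method="BFGS")
print(res.x, res.fun, res.nit, res.nfev)   # minimiser, function value, iterations, evaluations
```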
We first test Himmelblau's function, starting from an initial guess chosen by inspecting the contour plot:
| Method | Converged | Minimiser | $f(\mathbf{x})$ | `nit` | `nfev` |
|---|---|---|---|---|---|
| LM | Yes | [3.00000002, 1.99999994] | | 9 | 10 |
| NR | Yes | [3.0, 2.0] | 0.0 | 12 | 13 |
| BFGS | Yes | [2.99999994, 1.99999999] | | 10 | 48 |
- Levenberg-Marquardt:
  - Minimal function evaluations with `nfev = 10`.
  - Final damping parameter $\lambda = 0.00215$.
- BFGS:
  - Computationally expensive with `nfev = 48`.
- Sensitivity to Starting Point:
  - Himmelblau's function has multiple minima, making the choice of $\mathbf{x}_0$ critical for successful convergence to the intended minimum.
  - Robust methods like Levenberg-Marquardt and BFGS are better suited to handle poor initial guesses compared to Newton-Raphson.
- Challenges with Himmelblau's Function:
  - The Hessian can become poorly conditioned near saddle points, especially for Newton-Raphson, requiring careful handling of numerical stability.
  - Flat regions in the function surface can also delay convergence, particularly for gradient-based methods.
We next test Powell's Singular Function, whose minimiser is $\mathbf{x} = (0, 0, 0, 0)$ with $f(\mathbf{x}) = 0$.
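For reference, the standard form of Powell's singular function (assumed to be the variant used here) is:

```python
import numpy as np

def powell_singular(v):
    """Standard Powell singular function; minimum f(0, 0, 0, 0) = 0."""
    x1, x2, x3, x4 = v
    return ((x1 + 10*x2)**2 + 5*(x3 - x4)**2
            + (x2 - 2*x3)**4 + 10*(x1 - x4)**4)

print(powell_singular([0.0, 0.0, 0.0, 0.0]))   # 0.0
```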
Starting from the chosen initial point:

| Method | Converged | Minimiser | $f(\mathbf{x})$ | `nit` | `nfev` |
|---|---|---|---|---|---|
| LM | Yes | [9.789e-6, -9.789e-7, 4.878e-6, 4.878e-6] | | 40 | 41 |
| NR | Yes | [8.278e-6, -8.278e-7, 1.324e-6, 1.324e-6] | | 31 | 32 |
| BFGS | Yes | [7.615e-4, -7.616e-5, -7.624e-4, -7.624e-4] | | 43 | 240 |
- Newton-Raphson:
  - Best overall performer with the fewest iterations (`nit = 31`) and function evaluations (`nfev = 32`).
  - Efficiently handled flat regions, achieving high accuracy.
- Levenberg-Marquardt:
  - Performed reliably with slightly more iterations (`nit = 40`) and comparable function value accuracy.
- BFGS:
  - Converged, albeit not to the 5 d.p. accuracy required.
  - Significantly more function evaluations (`nfev = 240`) with a higher final function value.
- Newton-Raphson was the most efficient and accurate method for Powell's Singular Function, despite its reliance on gradients and Hessians.
- The function is convex but has flat regions in the search space, causing slow convergence.
The minimiser of this function is known to be $\mathbf{x}^* = (-0.54720, -1.54720)$, with $f(\mathbf{x}^*) = -1.91322$.

Testing from three different starting points:
| Starting Point | Method | Converged | Minimiser | $f(\mathbf{x})$ | `nit` | `nfev` |
|---|---|---|---|---|---|---|
| [-0.5, -1.5] | LM | Yes | [-0.54720, -1.54720] | -1.91322 | 8 | 9 |
| | NR | Yes | [-0.54720, -1.54720] | -1.91322 | 3 | 4 |
| | BFGS | Yes | [-0.54720, -1.54720] | -1.91322 | 5 | 18 |
| [1.0, 0.5] | LM | Yes | [-0.54720, -1.54720] | -1.91322 | 9 | 10 |
| | NR | Yes | [-3.68879, -4.68879] | -5.05482 | 19 | 20 |
| | BFGS | Yes | [-0.54720, -1.54720] | -1.91322 | 9 | 33 |
| [1.5, 1.5] | LM | Yes | [2.59440, 1.59440] | 1.22837 | 9 | 10 |
| | NR | Yes | [5.73599, 4.73599] | 4.36996 | 35 | 36 |
| | BFGS | Yes | [2.59440, 1.59440] | 1.22837 | 7 | 24 |
- Easy Case ($\mathbf{x}_0 = [-0.5, -1.5]$):
  - Newton-Raphson was the fastest, converging in 3 iterations with 4 function evaluations.
  - Levenberg-Marquardt and BFGS performed well, converging reliably to the global minimum.
- Medium Case ($\mathbf{x}_0 = [1.0, 0.5]$):
  - Newton-Raphson failed to locate the global minimum, instead converging to a spurious point with $f(x, y) = -5.05482$.
  - Levenberg-Marquardt and BFGS successfully reached the global minimum, though BFGS required more evaluations.
- Hard Case ($\mathbf{x}_0 = [1.5, 1.5]$):
  - Newton-Raphson converged to a distant local minimum at $f(x, y) = 4.36996$ after 35 iterations.
  - Levenberg-Marquardt and BFGS both converged to the same, closer local minimum at $f(x, y) = 1.22837$.
- Newton-Raphson was the fastest for the easy case but failed to find the global minimum in the medium and hard cases.
- Levenberg-Marquardt showed robust and reliable performance for the easy and medium cases.
For the next test function, we again test from three different starting points:
| Starting Point | Method | Converged | Minimiser | $f(\mathbf{x})$ | `nit` | `nfev` |
|---|---|---|---|---|---|---|
| [3, 0.5] | LM | No | [3.00000, 0.50000] | 0.00000 | 100 | 101 |
| | NR | Yes | [3.00000, 0.50000] | 0.00000 | 1 | 2 |
| | BFGS | Yes | [3.00000, 0.50000] | 0.00000 | 0 | 3 |
| [0, -1] | LM | Yes | [3.00000, 0.50000] | | 15 | 16 |
| | NR | No | [186.27909, 0.99465] | 0.44380 | 100 | 101 |
| | BFGS | Yes | [3.00000, 0.50000] | | 15 | 54 |
| [-2, 2] | LM | No | [-351.66015, 1.00281] | 0.45635 | 100 | 101 |
| | NR | No | [-4260.20086, 0.99982] | 43.49277 | 100 | 101 |
| | BFGS | No | [-637.59666, 1.00155] | 0.45440 | 166 | 845 |
- Easy Case ($\mathbf{x}_0 = [3, 0.5]$):
  - Newton-Raphson and BFGS converged rapidly to the exact global minimum.
  - Levenberg-Marquardt failed to converge within the iteration limit, although by inspection it is very nearly there.
- Medium Case ($\mathbf{x}_0 = [0, -1]$):
  - Levenberg-Marquardt and BFGS both converged to near-zero function values, with BFGS requiring more evaluations.
  - Newton-Raphson failed, diverging to an incorrect point.
- Hard Case ($\mathbf{x}_0 = [-2, 2]$):
  - All methods struggled, with none converging to the global minimum.
  - BFGS showed poor efficiency, requiring 845 function evaluations and failing to achieve the desired tolerance.
- Newton-Raphson performs well in well-behaved regions but fails in more complex scenarios.
- Levenberg-Marquardt demonstrates robustness for moderately difficult starting points but struggles in extreme cases.
- BFGS is consistent in finding minimisers but can be computationally expensive in challenging regions.
Rastrigin's function is defined as $f(\mathbf{x}) = A n + \sum_{i=1}^{n} \left[ x_i^2 - A \cos(2\pi x_i) \right]$ for an $n$-dimensional input $\mathbf{x}$. We set $A = 10$ and $n = 2$; the global minimum is at the origin, with $f(\mathbf{0}) = 0$.
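For completeness, a minimal sketch of the 2-D Rastrigin function as used here:

```python
import numpy as np

def rastrigin(v, A=10.0):
    """Rastrigin's function: A*n + sum(x_i^2 - A*cos(2*pi*x_i)); global minimum f(0) = 0."""
    v = np.asarray(v, dtype=float)
    return A * v.size + np.sum(v**2 - A * np.cos(2 * np.pi * v))

print(rastrigin([0.0, 0.0]))              # 0.0 at the global minimum
print(rastrigin([7.95923, 7.95923]))      # ~127.35, the point NR converges to below
```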
Testing from a starting point relatively close to the global minimum:

| Method | Converged | Minimiser | $f(\mathbf{x})$ | `nit` | `nfev` |
|---|---|---|---|---|---|
| LM | Yes | [-1.77210e-10, -1.77210e-10] | 0.00000 | 11 | 12 |
| NR | Yes | [7.95923, 7.95923] | 127.35131 | 11 | 12 |
| BFGS | Yes | [-7.69975e-09, -7.69975e-09] | 2.487e-14 | 3 | 30 |
- Levenberg-Marquardt (LM):
  - Successfully converged to the global minimum from a relatively close starting point.
  - Required 12 function evaluations.
- Newton-Raphson (NR):
  - Converged instead to a distant local minimum ($f(\mathbf{x}) = 127.35131$), highlighting its susceptibility to poor starting points.
  - Equal function evaluations to LM but failed to locate the global minimum.
- BFGS:
  - Converged efficiently to the global minimum with excellent accuracy ($f(\mathbf{x}) \approx 2.487 \times 10^{-14}$).
  - Required more function evaluations than LM but fewer iterations overall.
- Newton-Raphson struggles with Rastrigin's Function due to the abundance of local minima and saddle points.
- Both Levenberg-Marquardt and BFGS demonstrated robustness, with BFGS achieving the best accuracy.
- For functions with many local minima, global optimisation techniques may be more appropriate.
Constrained Optimisation:
- Extend the current solvers to handle problems with equality and inequality constraints, incorporating techniques like Lagrange multipliers or barrier methods.
Large-Scale Problems:
- Adapt the solvers to efficiently handle high-dimensional problems, leveraging sparse matrix techniques and iterative methods for linear systems.