Advanced Optimization: Newton's Method and BFGS
Newton's Method for Optimization
Newton's method seeks a minimizer by using second-order (curvature) information, updating the current estimate as:
\[ x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k) \]
Where:
- \( x_k \) is the current estimate
- \( \nabla^2 f(x_k) \) is the Hessian matrix (second derivatives)
- \( \nabla f(x_k) \) is the gradient vector
Challenge: Computing and factorizing the Hessian is expensive for large problems; a dense Hessian for \( n \) parameters requires \( O(n^2) \) storage, and solving the resulting Newton system costs \( O(n^3) \) work per iteration.
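To make the iteration concrete, here is a minimal Newton sketch in Python with NumPy. The quadratic test function, its gradient, and its Hessian are illustrative assumptions, not a problem from this module; note that the code solves the Newton system \( \nabla^2 f(x_k)\, \Delta x = -\nabla f(x_k) \) rather than forming the inverse explicitly.

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-8, max_iter=50):
    """Basic Newton iteration: solve H dx = -g instead of inverting H."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        dx = np.linalg.solve(hess(x), -g)  # Newton step from the linear system
        x = x + dx
    return x

# Illustrative quadratic: f(x, y) = (x - 1)^2 + 10 (y + 2)^2, minimum at (1, -2)
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton_minimize(grad, hess, x0=[5.0, 5.0]))
```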
BFGS (Broyden-Fletcher-Goldfarb-Shanno)
BFGS approximates the inverse Hessian iteratively:
\[ H_{k+1} = (I - \rho_k s_k y_k^T) H_k (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T \]
Where:
- \( s_k = x_{k+1} - x_k \) (step difference)
- \( y_k = \nabla f(x_{k+1}) - \nabla f(x_k) \) (gradient difference)
- \( \rho_k = \frac{1}{y_k^T s_k} \)
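The update is short to write in code. This is a minimal sketch assuming NumPy vectors `s` and `y` and the current inverse-Hessian approximation `H`; it is not part of any library:

```python
import numpy as np

def bfgs_update(H, s, y):
    """One BFGS update of the inverse-Hessian approximation H_k -> H_{k+1}."""
    rho = 1.0 / (y @ s)                        # rho_k = 1 / (y_k^T s_k)
    V = np.eye(len(s)) - rho * np.outer(s, y)  # I - rho_k s_k y_k^T
    return V @ H @ V.T + rho * np.outer(s, s)
```

The update is only applied when the curvature condition \( y_k^T s_k > 0 \) holds (a Wolfe line search guarantees this), which keeps \( H_{k+1} \) positive definite.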
L-BFGS (Limited-memory BFGS) never stores \( H_k \) explicitly; it keeps only the \( m \) most recent \( (s_k, y_k) \) pairs (typically \( m \approx 5 \)–\( 20 \)) and rebuilds the search direction from them, making it suitable for large problems.
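In practice L-BFGS is rarely hand-coded; for example, SciPy exposes it through `scipy.optimize.minimize` with `method="L-BFGS-B"`. A minimal usage sketch on SciPy's built-in Rosenbrock test function (not the wave-equation problem discussed later):

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.zeros(1000)                        # scales to many parameters
result = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B",
                  options={"maxcor": 10})  # maxcor = number of stored (s, y) pairs
print(result.nit, result.fun)
```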
Adam vs L-BFGS Comparison
| Aspect | Adam | L-BFGS |
|---|---|---|
| Type | First-order (gradients only) | Quasi-Newton (approximate second-order) |
| Memory | O(1) extra state per parameter (two moment vectors) | O(mn): m stored \( (s_k, y_k) \) pairs of length n |
| Convergence | Robust, good for noisy functions | Superlinear for smooth functions |
| Best for | Large-scale, noisy problems | Smooth, deterministic functions |
Our Results: L-BFGS achieved significantly better accuracy for the smooth wave equation optimization, demonstrating the power of second-order methods for well-conditioned problems.
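The qualitative contrast in the table can be reproduced on a small smooth test problem. The sketch below is illustrative only: a hand-rolled Adam loop and SciPy's L-BFGS-B applied to the Rosenbrock function, not the wave-equation setup from this module.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

def adam(grad, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Plain Adam on a deterministic gradient, for comparison purposes."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g**2       # second-moment estimate
        m_hat = m / (1 - beta1**t)               # bias correction
        v_hat = v / (1 - beta2**t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

x0 = np.full(10, -1.0)                           # Rosenbrock minimum is at all ones
x_adam = adam(rosen_der, x0)
x_lbfgs = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B").x
print("Adam   error:", np.linalg.norm(x_adam - 1.0))
print("L-BFGS error:", np.linalg.norm(x_lbfgs - 1.0))
```

On smooth, deterministic objectives like this, the quasi-Newton method typically reaches much lower error in far fewer iterations, while Adam's strengths show up when gradients are noisy or the problem is very large.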
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)