Newton's Method for Optimization

Newton's method finds minima by using second-order information:

\[ x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k) \]

Where:

  • \( x_k \) is the current estimate
  • \( \nabla^2 f(x_k) \) is the Hessian matrix (second derivatives)
  • \( \nabla f(x_k) \) is the gradient vector

Challenge: For a problem with \( n \) parameters, the Hessian has \( n^2 \) entries and solving the Newton system costs \( O(n^3) \) work per iteration, which quickly becomes prohibitive for large \( n \).
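
The sketch below, assuming NumPy is available, illustrates one way to code the Newton iteration; the quartic test problem and the function names are hypothetical, chosen only to make the loop runnable. Note that it solves the Newton system rather than forming the inverse Hessian explicitly.

    import numpy as np

    def newton_minimize(grad, hess, x0, tol=1e-8, max_iter=50):
        """Minimize a smooth function given callables for its gradient and Hessian."""
        x = x0.astype(float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:
                break
            # Solve  H p = g  instead of forming the inverse explicitly
            p = np.linalg.solve(hess(x), g)
            x = x - p
        return x

    # Hypothetical test problem: f(x) = sum(x_i^4) + 0.5 * ||x||^2, minimized at the origin
    grad = lambda x: 4 * x**3 + x
    hess = lambda x: np.diag(12 * x**2 + 1)
    print(newton_minimize(grad, hess, np.array([2.0, -1.5])))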

BFGS (Broyden-Fletcher-Goldfarb-Shanno)

BFGS approximates the inverse Hessian iteratively:

\[ H_{k+1} = (I - \rho_k s_k y_k^T) H_k (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T \]

Where:

  • \( s_k = x_{k+1} - x_k \) (step difference)
  • \( y_k = \nabla f(x_{k+1}) - \nabla f(x_k) \) (gradient difference)
  • \( \rho_k = \frac{1}{y_k^T s_k} \)

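A direct NumPy transcription of this update is sketched below; the variable names mirror the symbols above, and the surrounding loop (line search, convergence test, choice of the initial \( H_0 \)) is omitted.

    import numpy as np

    def bfgs_update(H, s, y):
        """Apply one BFGS update to the inverse-Hessian approximation H.

        s -- step difference x_{k+1} - x_k
        y -- gradient difference grad f(x_{k+1}) - grad f(x_k)
        """
        rho = 1.0 / (y @ s)            # assumes the curvature condition y^T s > 0
        I = np.eye(len(s))
        V = I - rho * np.outer(s, y)   # (I - rho_k s_k y_k^T)
        return V @ H @ V.T + rho * np.outer(s, s)
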
L-BFGS (Limited-memory BFGS) never forms the dense matrix \( H_k \); instead it stores only the \( m \) most recent \( (s_k, y_k) \) pairs and rebuilds the product \( H_k \nabla f(x_k) \) from them, making it suitable for large problems.
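
In practice L-BFGS is usually called through an existing library rather than written from scratch. A minimal sketch using SciPy's L-BFGS-B routine is shown below; the 2-D Rosenbrock function is just an illustrative smooth test problem.

    import numpy as np
    from scipy.optimize import minimize

    # Illustrative smooth test problem: the 2-D Rosenbrock function
    def f(x):
        return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

    def grad_f(x):
        return np.array([
            -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
            200 * (x[1] - x[0]**2),
        ])

    result = minimize(f, x0=np.array([-1.2, 1.0]), jac=grad_f,
                      method="L-BFGS-B",
                      options={"maxcor": 10})  # maxcor = number of stored (s, y) pairs
    print(result.x)                            # approaches the minimizer (1, 1)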

Adam vs L-BFGS Comparison

  Aspect       Adam                               L-BFGS
  -----------  --------------------------------   ---------------------------------------
  Type         First-order (gradients only)       Quasi-Newton (approximate second-order)
  Memory       O(1) per parameter                 O(m) history vectors
  Convergence  Robust, good for noisy functions   Superlinear for smooth functions
  Best for     Large-scale, noisy problems        Smooth, deterministic functions
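
The sketch below, assuming PyTorch is available, shows how the two optimizers are typically driven: Adam takes many inexpensive gradient steps, while torch.optim.LBFGS calls a closure that re-evaluates the loss as its line search requires. The quartic objective is only an illustrative stand-in for a smooth problem, not the wave-equation loss discussed here.

    import torch

    def loss_fn(x):
        # Smooth illustrative objective with its minimum at x = (1, ..., 1)
        return ((x - 1.0)**4).sum() + 0.5 * ((x - 1.0)**2).sum()

    # Adam: one gradient step per iteration
    x_adam = torch.zeros(10, requires_grad=True)
    adam = torch.optim.Adam([x_adam], lr=0.1)
    for _ in range(200):
        adam.zero_grad()
        loss_fn(x_adam).backward()
        adam.step()

    # L-BFGS: each step may evaluate the closure several times (line search)
    x_lbfgs = torch.zeros(10, requires_grad=True)
    lbfgs = torch.optim.LBFGS([x_lbfgs], history_size=10, max_iter=50,
                              line_search_fn="strong_wolfe")

    def closure():
        lbfgs.zero_grad()
        loss = loss_fn(x_lbfgs)
        loss.backward()
        return loss

    lbfgs.step(closure)
    print(loss_fn(x_adam).item(), loss_fn(x_lbfgs).item())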

Our Results: L-BFGS achieved significantly better accuracy than Adam on the smooth wave equation optimization, demonstrating the advantage of approximate second-order methods on smooth, well-conditioned problems.
