Newton's Method for Optimization

Newton's method finds minima by using second-order information:

\[ x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k) \]

Where:

  • \( x_k \) is the current estimate
  • \( \nabla^2 f(x_k) \) is the Hessian matrix (second derivatives)
  • \( \nabla f(x_k) \) is the gradient vector

Challenge: For a problem with \( n \) parameters, the Hessian has \( n^2 \) entries and solving the Newton system costs \( O(n^3) \) work per iteration, which quickly becomes prohibitive for large \( n \).
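
The sketch below, assuming NumPy is available, illustrates one way to code the Newton iteration; the quartic test problem and the function names are hypothetical, chosen only to make the loop runnable. Note that it solves the Newton system rather than forming the inverse Hessian explicitly.

    import numpy as np

    def newton_minimize(grad, hess, x0, tol=1e-8, max_iter=50):
        """Minimize a smooth function given callables for its gradient and Hessian."""
        x = x0.astype(float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:
                break
            # Solve  H p = g  instead of forming the inverse explicitly
            p = np.linalg.solve(hess(x), g)
            x = x - p
        return x

    # Hypothetical test problem: f(x) = sum(x_i^4) + 0.5 * ||x||^2, minimized at the origin
    grad = lambda x: 4 * x**3 + x
    hess = lambda x: np.diag(12 * x**2 + 1)
    print(newton_minimize(grad, hess, np.array([2.0, -1.5])))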

BFGS (Broyden-Fletcher-Goldfarb-Shanno)

BFGS approximates the inverse Hessian iteratively:

\[ H_{k+1} = (I - \rho_k s_k y_k^T) H_k (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T \]

Where:

  • \( s_k = x_{k+1} - x_k \) (step difference)
  • \( y_k = \nabla f(x_{k+1}) - \nabla f(x_k) \) (gradient difference)
  • \( \rho_k = \frac{1}{y_k^T s_k} \)

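A direct NumPy transcription of this update is sketched below; the variable names mirror the symbols above, and the surrounding loop (line search, convergence test, choice of the initial \( H_0 \)) is omitted.

    import numpy as np

    def bfgs_update(H, s, y):
        """Apply one BFGS update to the inverse-Hessian approximation H.

        s -- step difference x_{k+1} - x_k
        y -- gradient difference grad f(x_{k+1}) - grad f(x_k)
        """
        rho = 1.0 / (y @ s)            # assumes the curvature condition y^T s > 0
        I = np.eye(len(s))
        V = I - rho * np.outer(s, y)   # (I - rho_k s_k y_k^T)
        return V @ H @ V.T + rho * np.outer(s, s)
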
L-BFGS (Limited-memory BFGS) never forms the dense matrix \( H_k \); instead it stores only the \( m \) most recent \( (s_k, y_k) \) pairs and rebuilds the product \( H_k \nabla f(x_k) \) from them, making it suitable for large problems.
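
In practice L-BFGS is usually called through an existing library rather than written from scratch. A minimal sketch using SciPy's L-BFGS-B routine is shown below; the 2-D Rosenbrock function is just an illustrative smooth test problem.

    import numpy as np
    from scipy.optimize import minimize

    # Illustrative smooth test problem: the 2-D Rosenbrock function
    def f(x):
        return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

    def grad_f(x):
        return np.array([
            -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
            200 * (x[1] - x[0]**2),
        ])

    result = minimize(f, x0=np.array([-1.2, 1.0]), jac=grad_f,
                      method="L-BFGS-B",
                      options={"maxcor": 10})  # maxcor = number of stored (s, y) pairs
    print(result.x)                            # approaches the minimizer (1, 1)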

Adam vs L-BFGS Comparison

  Aspect       Adam                               L-BFGS
  -----------  --------------------------------   ---------------------------------------
  Type         First-order (gradients only)       Quasi-Newton (approximate second-order)
  Memory       O(1) per parameter                 O(m) history vectors
  Convergence  Robust, good for noisy functions   Superlinear for smooth functions
  Best for     Large-scale, noisy problems        Smooth, deterministic functions
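
The sketch below, assuming PyTorch is available, shows how the two optimizers are typically driven: Adam takes many inexpensive gradient steps, while torch.optim.LBFGS calls a closure that re-evaluates the loss as its line search requires. The quartic objective is only an illustrative stand-in for a smooth problem, not the wave-equation loss discussed here.

    import torch

    def loss_fn(x):
        # Smooth illustrative objective with its minimum at x = (1, ..., 1)
        return ((x - 1.0)**4).sum() + 0.5 * ((x - 1.0)**2).sum()

    # Adam: one gradient step per iteration
    x_adam = torch.zeros(10, requires_grad=True)
    adam = torch.optim.Adam([x_adam], lr=0.1)
    for _ in range(200):
        adam.zero_grad()
        loss_fn(x_adam).backward()
        adam.step()

    # L-BFGS: each step may evaluate the closure several times (line search)
    x_lbfgs = torch.zeros(10, requires_grad=True)
    lbfgs = torch.optim.LBFGS([x_lbfgs], history_size=10, max_iter=50,
                              line_search_fn="strong_wolfe")

    def closure():
        lbfgs.zero_grad()
        loss = loss_fn(x_lbfgs)
        loss.backward()
        return loss

    lbfgs.step(closure)
    print(loss_fn(x_adam).item(), loss_fn(x_lbfgs).item())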

Our Results: L-BFGS achieved significantly better accuracy than Adam on the smooth wave equation optimization, demonstrating the advantage of approximate second-order methods on smooth, well-conditioned problems.
