The Neural Network Approach: Function Approximation and Universal Approximation
Our goal is to train a neural network \(u_{NN}(x; \theta)\) to approximate the continuous solution \(u^*(x) = \sin(\pi x)\) over the interval \([0, 1]\). This is a function approximation problem.
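To make the setup concrete, the target function and the domain can be written down in a few lines. The sketch below uses Python with NumPy; the library choice and the names `u_star` and `x_plot` are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

# Exact solution the network should approximate on the interval [0, 1]
def u_star(x):
    return np.sin(np.pi * x)

# Dense grid used only for plotting or measuring the approximation error
x_plot = np.linspace(0.0, 1.0, 201)
u_plot = u_star(x_plot)
```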
A key theoretical result in neural networks is the Universal Approximation Theorem. In essence, it states:
Theorem (Cybenko, 1989; Hornik, 1991): A feedforward network with a single hidden layer, containing a finite number of neurons and using a non-constant, bounded, and monotonically increasing activation function (like Sigmoid or Tanh), can approximate any continuous function on a compact domain to arbitrary accuracy.
Mathematical statement: For any continuous \(f: [0,1] \to \mathbb{R}\) and any \(\epsilon > 0\), there exist an integer \(N\) and parameters \(v_i, w_i, b_i\) such that the single-hidden-layer network \(F(x) = \sum_{i=1}^{N} v_i \,\sigma(w_i x + b_i)\), where \(\sigma\) is the activation function, satisfies \(|F(x) - f(x)| < \epsilon\) for all \(x \in [0,1]\).
While the original theorem places specific requirements on the activation function, the result has since been extended to other common activations such as ReLU.
The significance of this theorem is profound: even a relatively simple network architecture (a single hidden layer) has the theoretical capacity to represent complex, non-linear functions like \(\sin(\pi x)\), provided it has enough neurons and a suitable non-linearity. Note that the theorem only guarantees that such an approximation exists; it says nothing about how to find the parameters \(\theta\), which is the job of training. We will experimentally demonstrate this capacity.
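A minimal sketch of such a single-hidden-layer network is shown below, written with PyTorch. The framework choice and the names `SingleHiddenLayerNet` and `n_hidden` are assumptions made for illustration; they are not taken from the text.

```python
import torch
import torch.nn as nn

class SingleHiddenLayerNet(nn.Module):
    """u_NN(x; theta) = sum_i v_i * tanh(w_i * x + b_i) + c -- exactly the form
    appearing in the Universal Approximation Theorem, with finitely many neurons."""
    def __init__(self, n_hidden: int = 32):
        super().__init__()
        self.hidden = nn.Linear(1, n_hidden)   # weights w_i and biases b_i
        self.output = nn.Linear(n_hidden, 1)   # weights v_i and bias c
        self.activation = nn.Tanh()            # bounded, monotone non-linearity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.output(self.activation(self.hidden(x)))
```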
Traditional Numerical Method vs Neural Network: Discrete vs Continuous
A traditional numerical method (such as a finite difference scheme) produces approximate values of the solution only at a discrete set of grid points. In contrast, the Neural Network approach aims to learn a continuous function \(u_{NN}(x; \theta)\) that approximates the true solution \(u^*(x)\) over the entire domain \([0, 1]\).
- This function is parameterized by the network's weights and biases \(\theta\).
- We train the network by showing it examples of the solution at sparse points \((x_i, u_i)\) and adjusting \(\theta\) so that the network's output \(u_{NN}(x_i; \theta)\) matches \(u_i\) as closely as possible, as sketched in the code below.
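The two bullet points above correspond to a standard supervised regression loop. The sketch below assumes PyTorch, a handful of training points sampled from the exact solution, and mean squared error as the training loss; the optimizer, learning rate, hidden width, and number of epochs are illustrative choices rather than values from the text.

```python
import torch
import torch.nn as nn

# Sparse training data (x_i, u_i) sampled from the exact solution u*(x) = sin(pi x)
x_train = torch.linspace(0.0, 1.0, 10).reshape(-1, 1)
u_train = torch.sin(torch.pi * x_train)

# Small network: one hidden Tanh layer, as discussed above
model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for epoch in range(2000):
    optimizer.zero_grad()
    u_pred = model(x_train)           # u_NN(x_i; theta)
    loss = loss_fn(u_pred, u_train)   # mismatch between network output and u_i
    loss.backward()                   # gradients of the loss w.r.t. theta
    optimizer.step()                  # adjust theta to reduce the mismatch

# The trained model gives a continuous approximation anywhere in [0, 1]
x_test = torch.linspace(0.0, 1.0, 101).reshape(-1, 1)
with torch.no_grad():
    u_test = model(x_test)
```

Because the trained network is a smooth function of \(x\), it can be evaluated at any point in \([0, 1]\), not just at the sparse training points.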