The Neural Network Approach: Function Approximation and Universal Approximation
Our goal is to train a neural network \(u_{NN}(x; \theta)\) to approximate the continuous solution \(u^*(x) = \sin(\pi x)\) over the interval \([0, 1]\). This is a function approximation problem.
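To make the setup concrete, the target function and the domain can be written down in a few lines. The sketch below uses Python with NumPy; the library choice and the names `u_star` and `x_plot` are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

# Exact solution the network should approximate on the interval [0, 1]
def u_star(x):
    return np.sin(np.pi * x)

# Dense grid used only for plotting or measuring the approximation error
x_plot = np.linspace(0.0, 1.0, 201)
u_plot = u_star(x_plot)
```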
A key theoretical result in neural networks is the Universal Approximation Theorem. In essence, it states:
Theorem (Cybenko, 1989; Hornik, 1991): A feedforward network with a single hidden layer, containing a finite number of neurons and using a non-constant, bounded, and monotonically increasing activation function (like Sigmoid or Tanh), can approximate any continuous function on a compact domain to arbitrary accuracy.
Mathematical statement: For any continuous \(f: [0,1] \to \mathbb{R}\) and any \(\epsilon > 0\), there exist an integer \(N\) and parameters \(v_i, w_i, b_i\) such that the single-hidden-layer network \(F(x) = \sum_{i=1}^{N} v_i \,\sigma(w_i x + b_i)\), where \(\sigma\) is the activation function, satisfies \(|F(x) - f(x)| < \epsilon\) for all \(x \in [0,1]\).
While the original theorem places specific requirements on the activation function, the result has since been extended to other common activations such as ReLU.
The significance of this theorem is profound: even a relatively simple network architecture (a single hidden layer) has the theoretical capacity to represent complex, non-linear functions like \(\sin(\pi x)\), provided it has enough neurons and a suitable non-linearity. Note that the theorem only guarantees that such an approximation exists; it says nothing about how to find the parameters \(\theta\), which is the job of training. We will experimentally demonstrate this capacity.
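A minimal sketch of such a single-hidden-layer network is shown below, written with PyTorch. The framework choice and the names `SingleHiddenLayerNet` and `n_hidden` are assumptions made for illustration; they are not taken from the text.

```python
import torch
import torch.nn as nn

class SingleHiddenLayerNet(nn.Module):
    """u_NN(x; theta) = sum_i v_i * tanh(w_i * x + b_i) + c -- exactly the form
    appearing in the Universal Approximation Theorem, with finitely many neurons."""
    def __init__(self, n_hidden: int = 32):
        super().__init__()
        self.hidden = nn.Linear(1, n_hidden)   # weights w_i and biases b_i
        self.output = nn.Linear(n_hidden, 1)   # weights v_i and bias c
        self.activation = nn.Tanh()            # bounded, monotone non-linearity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.output(self.activation(self.hidden(x)))
```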
Traditional Numerical Method vs Neural Network: Discrete vs Continuous
A traditional numerical method (such as a finite difference scheme) produces approximate values of the solution only at a discrete set of grid points. In contrast, the Neural Network approach aims to learn a continuous function \(u_{NN}(x; \theta)\) that approximates the true solution \(u^*(x)\) over the entire domain \([0, 1]\).
- This function is parameterized by the network's weights and biases \(\theta\).
- We train the network by showing it examples of the solution at sparse points \((x_i, u_i)\) and adjusting \(\theta\) so that the network's output \(u_{NN}(x_i; \theta)\) matches \(u_i\) as closely as possible, as sketched in the code below.
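The two bullet points above correspond to a standard supervised regression loop. The sketch below assumes PyTorch, a handful of training points sampled from the exact solution, and mean squared error as the training loss; the optimizer, learning rate, hidden width, and number of epochs are illustrative choices rather than values from the text.

```python
import torch
import torch.nn as nn

# Sparse training data (x_i, u_i) sampled from the exact solution u*(x) = sin(pi x)
x_train = torch.linspace(0.0, 1.0, 10).reshape(-1, 1)
u_train = torch.sin(torch.pi * x_train)

# Small network: one hidden Tanh layer, as discussed above
model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for epoch in range(2000):
    optimizer.zero_grad()
    u_pred = model(x_train)           # u_NN(x_i; theta)
    loss = loss_fn(u_pred, u_train)   # mismatch between network output and u_i
    loss.backward()                   # gradients of the loss w.r.t. theta
    optimizer.step()                  # adjust theta to reduce the mismatch

# The trained model gives a continuous approximation anywhere in [0, 1]
x_test = torch.linspace(0.0, 1.0, 101).reshape(-1, 1)
with torch.no_grad():
    u_test = model(x_test)
```

Because the trained network is a smooth function of \(x\), it can be evaluated at any point in \([0, 1]\), not just at the sparse training points.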