Because deep learning (DL) is a type of machine learning (ML), much of what was described in Concepts in Machine Learning serves as a prologue to the more specific overview of concepts in deep learning provided here. In what follows, we will elaborate a bit more on specific aspects of deep learning algorithms and pipelines.

Neural networks

Neural networks are flexible computational models that can process inputs and produce outputs, with parameters that can be fit or estimated based on data. They represent a type of computational graph inspired by the action of neurons in the brain. The structure of these models reflects a directed network of artificial "neurons" or nodes: data are presented as inputs to the network, those inputs are fed through the network architecture (which includes one or more hidden layers), and the network ultimately produces outputs. This general architecture is represented schematically in the left panel of the figure below. It is conventional to represent neural networks as flowing from left to right, with the input nodes on the far left and the output nodes on the far right. Within the illustration of an entire neural network, we highlight a subset of nodes and edges in red.

Just as the network as a whole includes a set of inputs and outputs, so does each individual node in the network, as illustrated in the right panel of the figure below. Each edge that connects two nodes carries with it an associated weight, indicating the strength of influence between the two nodes. In the network depicted below, for each node $j$ in layer $k+1$, the incoming inputs $x_i^k$ from layer $k$ are multiplied by the corresponding weights $W_{i,j}^k$ along each edge, and those products are summed. Finally, a bias or offset $b_j$ is added to the sum to define an aggregated, weighted input for that node. Computationally, this is accomplished as a matrix-vector multiplication involving the weight matrix $\bar{W}^k$ and the node vector $x^k$, followed by the addition of the bias vector $b^k$. The value of that node, $x_j^{k+1}$, is computed by applying an activation function $f$ to the aggregated input:

$$x_j^{k+1} = f\left(\sum_i W_{i,j}^k \, x_i^k + b_j\right)$$

That output $x_j^{k+1}$ then serves as an input to downstream nodes in the network. The right panel of the figure below illustrates this action for one node in the network, but all other nodes in the network are updated simultaneously in an analogous manner, each with its own set of inputs and edge weights. Inputs are mapped through all the internal (hidden) layers of the network until an output is produced at the end.
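To make this concrete, here is a minimal NumPy sketch of the layer update just described (NumPy and the tanh activation are our own choices for illustration, not something prescribed in the text): the values of all nodes in layer $k+1$ follow from a matrix-vector product with the weights, addition of the bias vector, and an elementwise activation.

```python
import numpy as np

def layer_forward(x_k, W_k, b_k, f=np.tanh):
    """Compute the node values in layer k+1 from the node values in layer k.

    x_k : node values in layer k, shape (n_k,)
    W_k : weight matrix; W_k[i, j] is the weight on the edge from node i
          in layer k to node j in layer k+1, shape (n_k, n_kp1)
    b_k : bias vector for layer k+1, shape (n_kp1,)
    f   : activation function, applied elementwise
    """
    z = W_k.T @ x_k + b_k   # aggregated, weighted input for each node j
    return f(z)             # activation yields the node values x^{k+1}

# Example: one layer mapping 3 nodes to 2 nodes
rng = np.random.default_rng(0)
x_k = rng.normal(size=3)
W_k = rng.normal(size=(3, 2))
b_k = np.zeros(2)
x_kp1 = layer_forward(x_k, W_k, b_k)
print(x_kp1.shape)          # (2,)
```

Chaining calls to a function like this, one layer after another, maps the inputs through all the hidden layers to the outputs.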

[Figure: Schematic illustration of a neural network (left) and a detailed zoom-in depicting the update for an individual node in the network (right)]

The action of the network is parameterized by the edge weights $\bar{W}$ and the biases $b$. Depending on the numerical values of those parameters, the outputs computed by the network, for a given set of inputs, can vary. The "learning" that is at the core of deep learning is the determination of suitable values for those numerical parameters that do a good job of producing a mapping (function) between the inputs and the outputs. One property that makes neural networks very powerful in this task is that they are universal function approximators: with a sufficient number of nodes, a neural network can approximate any sufficiently smooth function that maps inputs to outputs.
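As a small, hand-constructed illustration of how the parameter values determine the function a network computes (this example is ours, not from the text), consider a one-hidden-layer ReLU network with one input and one output. With the particular weights chosen below, the network computes exactly the absolute-value function; different weights would produce a different mapping.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# One hidden layer with two nodes: h = relu(W1 x + b1), output y = W2 h + b2
W1 = np.array([[1.0], [-1.0]])   # weights from the input to the two hidden nodes
b1 = np.zeros(2)
W2 = np.array([[1.0, 1.0]])      # weights from the hidden nodes to the output
b2 = np.zeros(1)

def net(x):
    h = relu(W1 @ np.atleast_1d(x) + b1)
    return (W2 @ h + b2)[0]

print(net(-3.0), net(0.5))       # 3.0 0.5 -- this parameter choice computes |x|
```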

A model is trained (that is, its parameters are estimated from data) by making iterative modifications to the parameter values in order to reduce the loss associated with the current set of parameter values. This is typical of many optimization algorithms in scientific computing, where one wishes to move "downhill" along the loss surface. With neural networks, deciding how specifically to change the values of the parameters in order to move most quickly downhill is aided by a process known as backpropagation. Leveraging the magic of calculus and a computational technique known as automatic differentiation, backpropagation enables a neural network to compute not just the value of the function encoded by the network, but also the gradient of that function with respect to the parameter values! This automatic differentiability of neural network models is one of the key features that makes them an excellent platform for machine learning.
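The sketch below illustrates these ideas with a minimal training loop. It uses PyTorch, one common deep learning framework with automatic differentiation built in; the choice of framework, network size, and synthetic data are assumptions made here for illustration, not details specified in this tutorial. The backward pass computes gradients of the loss with respect to every weight and bias, and a gradient descent step then moves the parameters downhill along the loss surface.

```python
import torch

# A tiny fully connected network: 4 inputs -> 8 hidden nodes -> 1 output
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.Tanh(),
    torch.nn.Linear(8, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

# Synthetic data standing in for a real training set
X = torch.randn(64, 4)
y = torch.randn(64, 1)

for step in range(100):
    optimizer.zero_grad()        # clear gradients from the previous iteration
    loss = loss_fn(model(X), y)  # forward pass: evaluate the loss
    loss.backward()              # backpropagation: gradients of the loss with
                                 # respect to every weight and bias
    optimizer.step()             # gradient descent step: move "downhill"
```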

Designing an appropriate network architecture for a given problem is itself an art and a science, the details of which are beyond the scope of this tutorial. For certain classes of problems, however, there are architectural elements that are regularly used, such as convolutional layers for image processing applications and transformer architectures for large language models. We discuss this in a bit more detail on the following page.

 