Examples, Tutorials & Datasets
Over the years, the community has developed several well-known datasets and associated deep learning problems. Beginners often use them to learn the basics of deep learning methods, and algorithm developers use them to test new methods on well-studied benchmarks. These datasets also appear frequently in online tutorials demonstrating deep learning methods and software. Perhaps the best known is MNIST, a dataset of images of handwritten digits (the decimal digits 0 through 9) with associated labels, which serves as something like the "Hello, world" problem of machine learning. The FashionMNIST dataset is inspired by MNIST but contains labeled images of different types of clothing to be classified instead of digits. Both the TensorFlow/Keras and PyTorch websites provide online tutorials that use MNIST or FashionMNIST to illustrate their software, such as:
- TensorFlow 2 quickstart for beginners using MNIST
- TensorFlow/Keras example of Convolutional Network (convnet) for MNIST
- TensorFlow/Keras tutorial using FashionMNIST
- PyTorch quickstart tutorial using FashionMNIST
Beyond MNIST and FashionMNIST, both TensorFlow and PyTorch include other widely used datasets that might be of interest to you in your deep learning research, along with convenient APIs for loading the data into your programs. TensorFlow bundles all of its prepackaged data in a single package (TensorFlow Datasets), whereas PyTorch separates them into multiple packages based on the type of data involved (for example, torchvision for images, torchaudio for audio, and torchtext for text).
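As a rough illustration of these loading APIs, the sketch below wraps two calls that the libraries themselves provide: `keras.datasets.mnist.load_data()` and `torchvision.datasets.FashionMNIST`. The wrapper function names, the `root="data"` download directory, and the `class_counts` helper are assumptions made for this example and are not part of either library. The framework imports are deferred into the functions so that only the package you actually call needs to be installed; the data are downloaded on first use.

```python
from collections import Counter


def load_mnist_keras():
    """Return MNIST as NumPy arrays using the Keras dataset loader.

    The training images have shape (60000, 28, 28) and dtype uint8;
    the labels are integers 0 through 9.
    """
    from tensorflow import keras  # deferred: requires TensorFlow

    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    return (x_train, y_train), (x_test, y_test)


def load_fashion_mnist_torch(root="data"):
    """Return FashionMNIST train/test Dataset objects using torchvision.

    `root` is a hypothetical local directory where the files are cached.
    """
    from torchvision import datasets  # deferred: requires torchvision
    from torchvision.transforms import ToTensor

    train = datasets.FashionMNIST(
        root=root, train=True, download=True, transform=ToTensor()
    )
    test = datasets.FashionMNIST(
        root=root, train=False, download=True, transform=ToTensor()
    )
    return train, test


def class_counts(labels):
    """Count how many examples carry each integer class label."""
    return dict(Counter(int(label) for label in labels))
```

Assuming TensorFlow is installed and the download succeeds, calling `(x_train, y_train), _ = load_mnist_keras()` and then `class_counts(y_train)` offers a quick sanity check on the label distribution before any training begins.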
Perhaps you already have a working deep learning code (using TensorFlow/Keras, PyTorch, or some other package) and you're mainly interested in figuring out how to run your code using TACC resources. Of course, if you are working with your own data that is not already prepackaged, you will need to figure out how to import the data in the correct format for further processing. But examining some of the prepackaged data might be helpful even in that process.
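One way to use the prepackaged data as a guide is to arrange your own data in the same layout. The sketch below assumes your images are same-sized grayscale grids with pixel values from 0 to 255 and stacks them into an `(N, H, W)` `uint8` array with a matching label vector, mirroring the layout of the MNIST arrays. The function names `to_mnist_layout` and `scale_to_unit_interval` are hypothetical, and real data (color images, varying sizes, non-integer labels) would need additional handling.

```python
import numpy as np


def to_mnist_layout(images, labels):
    """Stack same-sized grayscale images into an (N, H, W) uint8 array.

    `images` is a sequence of 2-D grids (lists or arrays) with values 0-255;
    `labels` is a sequence of integer class labels of the same length.
    """
    x = np.stack([np.asarray(img, dtype=np.uint8) for img in images])
    y = np.asarray(labels, dtype=np.int64)
    if x.ndim != 3:
        raise ValueError("each image must be a 2-D grayscale grid")
    if len(x) != len(y):
        raise ValueError("number of images and labels must match")
    return x, y


def scale_to_unit_interval(x):
    """Convert uint8 pixel values to float32 values in [0, 1]."""
    return x.astype(np.float32) / 255.0
```

Scaling pixel values to the unit interval is a common preprocessing step before training, but whether it is appropriate depends on the model you are using.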