Building a CNN Classifier with PyTorch: Part 2

Downloading Dataset

First, we need to download our dataset:

Next, let’s define the paths to the training and validation sets based on the directory structure of the downloaded data.
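A minimal sketch of this step, assuming the archive unpacks into a `data/` folder with `train/` and `val/` subfolders, each holding one subfolder per class (the layout that `torchvision.datasets.ImageFolder` expects). The folder names and the helper `list_classes` are hypothetical; match them to the actual extracted structure.

```python
from pathlib import Path

# Hypothetical root of the extracted dataset.
DATA_ROOT = Path("data")

# Assumed layout (ImageFolder convention, one subfolder per class):
#   data/train/<class_name>/<image files>
#   data/val/<class_name>/<image files>
TRAIN_DIR = DATA_ROOT / "train"
VAL_DIR = DATA_ROOT / "val"


def list_classes(split_dir: Path) -> list:
    """Return the sorted names of the class subfolders in a split directory."""
    return sorted(p.name for p in split_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    for split in (TRAIN_DIR, VAL_DIR):
        if split.is_dir():
            print(split, list_classes(split))
        else:
            print(f"{split} not found; check where the archive was extracted")
```

Checking that both splits list the same classes is a quick sanity test that the paths point at the right level of the directory tree.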

Transforms

In the previous notebook, we used torchvision.transforms.Compose to chain a series of transformations applied to our data. There, Compose applied two transforms:

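torchvision.transforms.Resize, which resizes an image to the specified size (an integer resizes the shorter edge and keeps the aspect ratio; an (h, w) pair resizes to exactly that size).
torchvision.transforms.ToTensor, which converts a PIL image or NumPy array of shape (H, W, C) into a float tensor of shape (C, H, W), scaling uint8 pixel values from [0, 255] to [0.0, 1.0].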

Both of these transforms perform data preprocessing: they put the data into the format the model expects for training, for example by giving every image the same dimensions and converting it to a tensor.

Beyond preprocessing, torchvision’s transforms are also used for data augmentation, in which new training examples are generated by applying random transforms to the existing data. This increases the size and diversity of the training set, which helps the model generalize to images it has not seen. torchvision.transforms provides many augmentation techniques, such as translating, rotating, flipping, and cropping images.

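In the new data loading pipeline in this notebook, we will use a technique called AutoAugment. This transform augments each image with a randomly chosen sub-policy: a pair of operations (such as shearing, rotating, or adjusting contrast), each applied with a set probability and magnitude. The policies themselves come from the original AutoAugment work, which searched for the augmentation strategy that maximized a model’s validation accuracy on a reference dataset. torchvision ships the policies learned on ImageNet, CIFAR-10, and SVHN (selected with torchvision.transforms.AutoAugmentPolicy), so no policy search runs during your own training.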

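Note that torchvision.transforms.AutoAugment() is applied only to the training data in the function below. The validation data is left unaugmented so that validation metrics measure performance on unmodified images.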


© Chishiki-AI | Cornell University | Center for Advanced Computing | Copyright Statement | Access Statement
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)