
Wrap Model (3, 6B, 7A, 7B)

Wrap Model with DDP and put everything together in main function (3, 6B, 7A, 7B)

In the main function, we wrap our model with DDP, use PyTorch’s environment variables to specify our device, and create what is needed to save checkpoints and resume training from them.
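The steps in that paragraph could look roughly like the sketch below. It is not the workshop's actual code: the `Linear` layer stands in for the real CNN, and the snapshot file name, the dictionary keys (`model_state`, `optimizer_state`, `epochs_run`), and the helper `torchrun_env` are hypothetical names chosen for illustration. It assumes the script is started by torchrun on a node with NVIDIA GPUs, so that `LOCAL_RANK`, `RANK`, and `WORLD_SIZE` are set for each process.

```python
import os


def torchrun_env():
    """Read the variables torchrun sets for each worker process."""
    local_rank = int(os.environ.get("LOCAL_RANK", 0))  # GPU index on this node
    rank = int(os.environ.get("RANK", 0))              # global process index
    world_size = int(os.environ.get("WORLD_SIZE", 1))  # total number of processes
    return local_rank, rank, world_size


def main(snapshot_path="snapshot.pt", total_epochs=5):
    # Imported here so torchrun_env() stays usable without PyTorch installed.
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    local_rank, rank, _ = torchrun_env()
    dist.init_process_group(backend="nccl")  # torchrun supplies rendezvous info
    torch.cuda.set_device(local_rank)
    device = torch.device("cuda", local_rank)

    model = torch.nn.Linear(10, 2).to(device)  # placeholder for the real CNN
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Resume from a snapshot if one exists (hypothetical file layout).
    start_epoch = 0
    if os.path.exists(snapshot_path):
        snapshot = torch.load(snapshot_path, map_location=device)
        model.load_state_dict(snapshot["model_state"])
        optimizer.load_state_dict(snapshot["optimizer_state"])
        start_epoch = snapshot["epochs_run"]

    # Wrap the model so gradients are averaged across processes.
    model = DDP(model, device_ids=[local_rank])

    for epoch in range(start_epoch, total_epochs):
        # ... one epoch of training would go here ...
        if rank == 0:  # only one process writes the snapshot
            torch.save(
                {
                    "model_state": model.module.state_dict(),
                    "optimizer_state": optimizer.state_dict(),
                    "epochs_run": epoch + 1,
                },
                snapshot_path,
            )

    dist.destroy_process_group()
```

In a real script, `main()` would be called under an `if __name__ == "__main__":` guard. Saving `model.module.state_dict()` rather than the wrapped model's state keeps the checkpoint loadable into an unwrapped model later.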

Additionally, we report on the best model we find throughout training at the end of the script.
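One simple way to track the best model is to keep a small record that is updated whenever an epoch improves on it. The sketch below assumes validation accuracy is the metric; the helper `update_best` and the accuracy values are made-up placeholders for illustration, not results from the workshop.

```python
def update_best(best, epoch, accuracy):
    """Return the better of the current best record and this epoch's result."""
    if best is None or accuracy > best["accuracy"]:
        return {"epoch": epoch, "accuracy": accuracy}
    return best


# Placeholder per-epoch validation accuracies (illustrative only).
validation_accuracies = [0.71, 0.78, 0.76, 0.81, 0.80]

best = None
for epoch, accuracy in enumerate(validation_accuracies):
    best = update_best(best, epoch, accuracy)

print(f"Best model: epoch {best['epoch']}, validation accuracy {best['accuracy']:.2f}")
```

In a DDP run, a report like this would typically be printed by a single process (for example, the one with rank 0) to avoid one copy of the message per GPU.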

Finally, let’s run our DesignSafe classifier on a single node with 4 GPUs. We’ll start by copying the data that we need.

Then, launch the job with torchrun.
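As a rough illustration, a single-node launch on 4 GPUs might use a command like the one assembled below. The script name `train_ddp.py` and the helper `build_torchrun_command` are hypothetical; `--standalone` and `--nproc_per_node` are standard torchrun options for single-node jobs. The snippet only builds and prints the command, so it can be inspected before it is run.

```python
import shlex


def build_torchrun_command(script, gpus_per_node):
    """Assemble a single-node torchrun command as an argument list."""
    return [
        "torchrun",
        "--standalone",                       # single node, no external rendezvous
        f"--nproc_per_node={gpus_per_node}",  # one process per GPU
        script,
    ]


command = build_torchrun_command("train_ddp.py", 4)  # hypothetical script name
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(command, check=True)` from Python.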


© Chishiki-AI | Cornell University | Center for Advanced Computing | Copyright Statement | Access Statement
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)