To run your deep learning programs on TACC machines, you will have to work with the job scheduling machinery that coordinates access to computational resources. Depending on how your code is structured and what sorts of computations you want to do, there are a few different ways to run it on TACC resources. If you have written Python code to construct a deep learning pipeline and want to use it to analyze data, you will want to use the Slurm submission system to run on one of the available queues on Frontera, either by submitting batch jobs for longer production runs or by requesting an interactive node for development and testing. If, on the other hand, your code is organized in a Jupyter notebook that you would like to run interactively to examine outputs and look at plots of your data, you will probably want to run your notebook using the TACC Analysis Portal (formerly the TACC Visualization Portal).
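For batch submission, the general pattern is to write a short Slurm job script and submit it with sbatch. As a rough sketch only (the job name, queue, time limit, allocation name, module, and script name below are placeholders that you would replace with values appropriate for your own project and environment), such a script might look something like:

    #!/bin/bash
    #SBATCH -J dl_train              # job name (placeholder)
    #SBATCH -p rtx                   # GPU queue on Frontera
    #SBATCH -N 1                     # number of nodes requested
    #SBATCH -n 1                     # total number of tasks
    #SBATCH -t 02:00:00              # wall clock time limit (hh:mm:ss)
    #SBATCH -A myproject             # allocation/project to charge (placeholder)

    module load python3              # load a Python module; module names vary by system
    python3 my_training_script.py    # your own training script (placeholder)

You would then submit the script with a command such as sbatch myjob.slurm and monitor its progress with squeue.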

Here are some links to general information and resources for running jobs on TACC machines, including using the Lmod module system, submitting jobs through Slurm, and accessing the TACC Analysis Portal:

Notes on TACC partitions, service units, and GPUs

As described in the material linked above, running on TACC machines requires an allocation providing some number of Service Units (SUs). One unadjusted SU represents the use of a single compute node for one hour (a node-hour), although running on specialized queues can incur additional charges. This is reflected in the billing rate:

\[\text{SUs billed} = {(\text{number of nodes}) \times (\text{job duration in wall clock hours}) \times (\text{charge rate per node-hour})}\]

For the rtx and rtx-dev queues on Frontera, the charge rate is 3 SUs per node-hour. The GPU nodes on Frontera contain 4 GPUs per CPU node, and TACC does not implement node-sharing on any compute resource. Therefore, for each GPU-enabled compute node that you request, you will be billed at this higher rate even if you are not using all of the attached GPUs.
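As a concrete illustration (the node count and duration are arbitrary example values), a job that runs on 2 rtx nodes for 3 wall clock hours would be billed

\[2 \text{ nodes} \times 3 \text{ hours} \times 3 \text{ SUs per node-hour} = 18 \text{ SUs},\]

regardless of how many of the 8 available GPUs the job actually uses.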

In order to make efficient use of the GPU resources that you are being billed for, you will ideally want to use all 4 GPUs on each CPU node that you are allocated. There are a few different ways to do this. One approach is to use multi-GPU distributed training, as discussed elsewhere in this tutorial. Another is to run multiple single-GPU jobs on your CPU node, taking care to assign each job to a different attached GPU. This can be achieved with the environment variable CUDA_VISIBLE_DEVICES, which controls which GPU(s) a process can see and use. On Frontera, the 4 GPUs on a node are numbered 0, 1, 2, and 3. So to run 4 different single-GPU jobs, you could execute something like the sketch below.
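In this sketch, train1.py through train4.py are placeholder names for your own single-GPU programs; the trailing ampersands launch each job in the background so that all four run concurrently, and wait keeps the script from exiting until they have all finished.

    CUDA_VISIBLE_DEVICES=0 python3 train1.py &   # first job, restricted to GPU 0
    CUDA_VISIBLE_DEVICES=1 python3 train2.py &   # second job, restricted to GPU 1
    CUDA_VISIBLE_DEVICES=2 python3 train3.py &   # third job, restricted to GPU 2
    CUDA_VISIBLE_DEVICES=3 python3 train4.py &   # fourth job, restricted to GPU 3
    wait                                          # block until all background jobs complete

Each process then sees only the single GPU it was assigned (reported as device 0 inside that process), so no changes to the training code itself are required.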

 