
Aside from scheduling and queuing jobs from multiple users in order to fairly divide work among the compute nodes of a cluster, Slurm provides additional measures to ensure that jobs and users do not interfere with one another. The most useful of these to understand are the features related to SSH access, job cleanup, and networking.

SSH Access

Slurm provides an optional Pluggable Authentication Module (PAM) that allows logins to compute nodes under certain circumstances. With it, policies can be implemented that allow users to ssh into nodes allocated to their own jobs while denying access to nodes allocated to other users. In other words, if you attempt to ssh into a compute node that is assigned to someone else's job, you will be denied; if you attempt to ssh into a compute node that is running one of your own jobs, you will succeed.
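The access rule described above can be modeled in a few lines of Python. This is only a toy model of the policy, not the PAM module itself; the function name, the user names, and the node names are hypothetical, and the `allocations` mapping stands in for whatever Slurm knows about which users have jobs on which nodes.

```python
def may_login(user, node, allocations):
    """Return True if `user` has at least one job allocated on `node`.

    `allocations` maps a node name to the set of users who currently
    have jobs running there (a stand-in for Slurm's own bookkeeping).
    """
    return user in allocations.get(node, set())


# Hypothetical cluster state: two nodes, each running one user's job.
allocations = {
    "c401-001": {"alice"},
    "c401-002": {"bob"},
}

print(may_login("alice", "c401-001", allocations))  # own job on this node
print(may_login("alice", "c401-002", allocations))  # node belongs to bob's job
```

In this model a node with no jobs at all admits no one, which matches the idea that access follows from holding an allocation.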

SSH can be a useful tool for interacting with running jobs. When invoked from within a batch script, ssh can be used to execute commands on particular nodes within the allocation. Tools such as the TACC launcher are based on this principle. Likewise, it is possible to manually ssh into an allocated node and work there in an interactive shell. This can be useful for inspecting the state of a node that is running a job (e.g., with the top utility). CPU usage, I/O waiting, and other characteristics can be quickly and informally observed this way.
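The batch-script pattern described above can be sketched in Python as follows. This is a minimal illustration and not how the TACC launcher is implemented. It assumes the script runs inside a Slurm job, so that SLURM_JOB_NODELIST is set, and that scontrol and ssh are on the PATH. The function names and the uptime command are placeholders chosen for this example.

```python
import os
import subprocess


def expand_nodelist(nodelist):
    """Expand a compact Slurm node list (e.g. 'c401-[001-003]') into hostnames."""
    # 'scontrol show hostnames' is Slurm's own expander for this syntax.
    result = subprocess.run(
        ["scontrol", "show", "hostnames", nodelist],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def build_ssh_command(host, remote_command):
    """Return the argv list that runs `remote_command` on `host` via ssh."""
    # BatchMode=yes makes ssh fail rather than prompt for a password,
    # which suits non-interactive use inside a batch script.
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]


if __name__ == "__main__":
    # SLURM_JOB_NODELIST is set by Slurm inside a job; outside a job, do nothing.
    nodelist = os.environ.get("SLURM_JOB_NODELIST")
    if nodelist:
        for host in expand_nodelist(nodelist):
            # 'uptime' is only a placeholder for a real per-node command.
            completed = subprocess.run(
                build_ssh_command(host, "uptime"),
                capture_output=True, text=True,
            )
            print(host, completed.stdout.strip())
```

Building the argument list separately from running it keeps the command easy to inspect, and passing a list to subprocess.run avoids an extra layer of local shell quoting.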

Job Cleanup

After a job completes, Slurm may be configured to clean up each node of the allocation by running an epilog script as root. Many clusters use this ability to clear the node's /tmp storage and to terminate leftover processes once a job ends. Any processes still owned by the job's user are terminated, even those not directly initiated by Slurm, including the independent ssh sessions described above.
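The file-cleanup half of such an epilog might look roughly like the Python sketch below. It is an assumption-laden illustration, not any cluster's actual epilog: the function name is hypothetical, the script assumes that Slurm exports SLURM_JOB_UID to the epilog environment, and it only removes top-level entries in one directory. Terminating the user's processes is not shown.

```python
import os
import shutil


def remove_owned_entries(directory, uid):
    """Delete top-level entries of `directory` owned by `uid`.

    Returns the sorted names of the entries that were removed.
    """
    # Snapshot the listing first so we do not delete while iterating.
    with os.scandir(directory) as it:
        entries = list(it)

    removed = []
    for entry in entries:
        # follow_symlinks=False: judge a symlink by its own owner, not its target's.
        if entry.stat(follow_symlinks=False).st_uid != uid:
            continue
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path, ignore_errors=True)
        else:
            os.remove(entry.path)
        removed.append(entry.name)
    return sorted(removed)


if __name__ == "__main__":
    # Assumption: the epilog environment provides the job owner's numeric UID.
    job_uid = os.environ.get("SLURM_JOB_UID")
    if job_uid is not None:
        remove_owned_entries("/tmp", int(job_uid))
```

A production epilog would need more care than this, for example checking whether the same user still has another job running on the node before deleting anything.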

Networking

While SSH access is restricted via PAM, Slurm has no influence over arbitrary communication ports that might be opened by applications that are running as part of a job. The remote desktop application VNC is a good example of a convenient tool that can open a port for incoming connections. Rather than have security depend entirely on the ability of applications like VNC to manage connections, Stampede2 and Frontera impose networking rules that disallow any direct network connections between the compute nodes and the outside world. If it is necessary to connect to a compute node from outside Stampede2 or Frontera, SSH tunneling must be used: first to a login node, then to the desired compute node.
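The two-hop tunnel described above can be expressed as a single OpenSSH command, built here in Python so the pieces are easy to see. This is a sketch under stated assumptions: the user name, host names, and function name are hypothetical, OpenSSH 7.3 or later is assumed for the -J (jump host) option, and port 5901 is used only because VNC conventionally serves display :1 there. The sketch also assumes you currently have a job on the compute node, since otherwise the ssh login to that node would be refused.

```python
def build_tunnel_command(user, login_host, compute_host, remote_port, local_port):
    """Return the ssh argv that forwards a local port to a port on a compute node.

    The connection goes first to the login node (-J) and from there to the
    compute node; the forwarded port is then reachable on the local machine.
    """
    # 'localhost' in the forward spec is resolved on the compute node,
    # i.e. the far end of the ssh connection.
    forward = f"{local_port}:localhost:{remote_port}"
    return [
        "ssh",
        "-N",                            # no remote command, forwarding only
        "-J", f"{user}@{login_host}",    # hop through the login node first
        "-L", forward,                   # local port -> compute node port
        f"{user}@{compute_host}",
    ]


# Hypothetical names for illustration only.
command = build_tunnel_command("alice", "login.example.org", "c401-001", 5901, 5901)
print(" ".join(command))
```

With such a tunnel running, a VNC client on the local machine would connect to localhost:5901 rather than to the compute node directly.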

 
© Cornell University, Center for Advanced Computing