Vista Quickstart: Run Parallel Tasks Using PyLauncher

PyLauncher (Python + Launcher) is a job launcher and utility tool for executing large volumes of small, independent jobs in parallel. It runs as many jobs as possible on all available cores to achieve maximum throughput. PyLauncher requires that each small job be expressed as a single command line and be independent of other small jobs. Despite the name, PyLauncher does not require prior knowledge of Python to use.

In this quickstart, we will assume each small job is equivalent to a single command line, such as ./myprogram value1 (note that this line can be as long as needed). To use PyLauncher, we must create a command file, myjobs.sh in this quickstart, that defines one small job per line.
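As a sketch of what such a file looks like, the short Python script below writes myjobs.sh with one invocation of myprogram per line. The argument values 1 through 8 and the helper name write_job_file are placeholders chosen for illustration. Any other way of producing one command per line works equally well.

```python
from pathlib import Path


def write_job_file(path, values, program="./myprogram"):
    """Write one command line per value and return the lines written."""
    lines = [f"{program} {value}" for value in values]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


# Placeholder inputs: eight independent small jobs, one per line.
commands = write_job_file("myjobs.sh", range(1, 9))
print(f"wrote {len(commands)} commands to myjobs.sh")
```

Each line of the resulting file is a complete command that does not depend on any other line, which is the requirement PyLauncher places on small jobs.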

We will walk through a simple PyLauncher use case. TACC provides more detailed PyLauncher documentation at https://docs.tacc.utexas.edu/software/pylauncher/ for advanced uses.

  1. PyLauncher requires Python 3.9 or later. The default installation of Python at /usr/bin/python is sufficient. We can check the version with python --version. If, for any reason, Python is below the required version, search for a compatible version of Python with module spider python and load the right module.
  2. PyLauncher is a Python package that is made available through the pylauncher/5.3.1 module. In your Linux shell, load this module with: module load pylauncher/5.3.1
  3. The Python package paramiko is required for PyLauncher. Install it with the Python Package Manager (pip): pip install paramiko
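The three steps above can be checked from Python before going further. This is a minimal sketch using only the standard library. The helper name check_prerequisites is hypothetical, and it only tests whether the packages can be found, not whether they work on a compute node.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 9)  # minimum version stated in step 1


def check_prerequisites(modules=("pylauncher", "paramiko")):
    """Return a dict mapping each requirement to True (met) or False."""
    report = {"python": sys.version_info[:2] >= REQUIRED_PYTHON}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for item, ok in check_prerequisites().items():
    print(f"{item:12s} {'ok' if ok else 'MISSING'}")
```

If pylauncher is reported as missing, the module from step 2 is probably not loaded in the current shell. If paramiko is missing, repeat step 3.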

PyLauncher provides several launcher methods for different use cases. To create a PyLauncher script that is configured correctly for your application, open a new Python file in an editor, and choose your launcher method from the ones below.

  • The most basic launcher is the ClassicLauncher, which takes the file of commands (myjobs.sh) and a workdir (myjobs_out).

    This launcher works through the list of commands (in our example, invocations of myprogram) in myjobs.sh in order, running as many at a time as the available cores allow (1 per core by default) and storing the output of each command in the myjobs_out directory. Note that PyLauncher cannot reuse directories, so be sure to set workdir to a new value for each run.

  • If myprogram is multithreaded, we can specify how many cores to allocate per command in ClassicLauncher.

  • If myprogram is MPI parallel, we can use IbrunLauncher instead.

  • If myprogram uses GPU resources, we can use GPULauncher, which requires the gpuspernode parameter. For Vista, this value is 1.
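The four choices above can be sketched in a single script. The launcher names, workdir, and gpuspernode come from the text. The cores keyword and its value of 4 are assumptions based on our reading of the TACC documentation, and the VARIANTS table and launch helper are hypothetical names introduced only for this example. In practice a script usually contains just the one launcher call it needs.

```python
import importlib.util
import os

# Launcher class and extra keyword arguments for each use case in the list
# above. The `cores` keyword and the value 4 are assumptions for illustration;
# `gpuspernode=1` is the Vista value given in the text.
VARIANTS = {
    "classic": ("ClassicLauncher", {}),
    "threaded": ("ClassicLauncher", {"cores": 4}),
    "mpi": ("IbrunLauncher", {"cores": 4}),
    "gpu": ("GPULauncher", {"gpuspernode": 1}),
}


def launch(variant, command_file="myjobs.sh", workdir="myjobs_out"):
    """Run every line of command_file with the launcher chosen by variant."""
    class_name, extra = VARIANTS[variant]
    # PyLauncher cannot reuse directories, so stop early if workdir exists.
    if os.path.exists(workdir):
        raise FileExistsError(f"{workdir} already exists; choose a new workdir")
    import pylauncher  # provided by `module load pylauncher/5.3.1`

    launcher = getattr(pylauncher, class_name)
    launcher(command_file, workdir=workdir, **extra)


if __name__ == "__main__":
    if importlib.util.find_spec("pylauncher") is None:
        print("pylauncher is not importable; load the pylauncher module first")
    else:
        launch("classic")  # change to "threaded", "mpi", or "gpu" as needed
```

Checking for an existing workdir before the launcher starts turns the directory-reuse restriction into an immediate, readable error.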

Once you have selected a launcher, save the file as pylauncher_test.py and exit.

To run PyLauncher, submit a Slurm job script whose only job step is to run your Python script, pylauncher_test.py.
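A job script of this kind might look like the one generated below. To keep all examples in one language, the sketch builds the batch file from Python. The queue name gg, the node and task counts, the 30-minute limit, and the file name pylauncher_test.slurm are all placeholder assumptions, so check the Vista user guide for the values appropriate to your allocation. The helper name render_job_script is hypothetical.

```python
from pathlib import Path


def render_job_script(job_name="pylauncher_test", queue="gg", nodes=1,
                      tasks=1, minutes=30, script="pylauncher_test.py"):
    """Return the text of a Slurm batch script that runs the launcher script."""
    hours, mins = divmod(minutes, 60)
    lines = [
        "#!/bin/bash",
        f"#SBATCH -J {job_name}",        # job name
        f"#SBATCH -o {job_name}.o%j",    # output file; %j expands to the job ID
        f"#SBATCH -p {queue}",           # queue (partition); placeholder value
        f"#SBATCH -N {nodes}",           # number of nodes
        f"#SBATCH -n {tasks}",           # total number of tasks
        f"#SBATCH -t {hours:02d}:{mins:02d}:00",  # wall-clock limit
        "",
        "module load pylauncher/5.3.1",
        f"python {script}",
    ]
    return "\n".join(lines) + "\n"


Path("pylauncher_test.slurm").write_text(render_job_script())
print(render_job_script())
```

The generated file can then be submitted with sbatch pylauncher_test.slurm.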

PyLauncher will then run the small jobs you listed in myjobs.sh. Note that PyLauncher ignores the flag --ntasks-per-node.

Once the job is finished, a statistics file is produced in the current working directory (not in workdir), which is useful for checking the efficiency of PyLauncher.

© Cornell University, Center for Advanced Computing
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)