Vista Quickstart: Run Parallel Tasks Using PyLauncher
PyLauncher (Python + Launcher) is a job launcher and utility tool for executing large volumes of small, independent jobs in parallel. It runs as many jobs as possible on all available cores to achieve maximum throughput. PyLauncher requires that each small job be expressed as a single command line and be independent of other small jobs. Despite the name, PyLauncher does not require prior knowledge of Python to use.
In this quickstart, we will assume each small job is equivalent to a single line in a job script, such as ./myprogram value1 (note that this line can be as long as needed). To use PyLauncher, we must create a file (myjobs.sh in this quickstart) that defines one small job per line.
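As a minimal sketch of such a file, the snippet below writes one ./myprogram invocation per line to myjobs.sh. The helper name write_job_file and the argument values value1 through value4 are hypothetical placeholders; a short list of commands can just as well be typed by hand.

```python
from pathlib import Path


def write_job_file(path, values, program="./myprogram"):
    """Write one command line per value to `path` and return the lines written."""
    lines = [f"{program} {value}" for value in values]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    # Produces a four-line file:
    #   ./myprogram value1
    #   ./myprogram value2
    #   ./myprogram value3
    #   ./myprogram value4
    write_job_file("myjobs.sh", [f"value{i}" for i in range(1, 5)])
```

Each line must be a complete command that does not depend on any other line in the file.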
We will walk through a simple PyLauncher use case. TACC provides more detailed PyLauncher documentation at https://docs.tacc.utexas.edu/software/pylauncher/ for advanced uses.
- PyLauncher requires Python 3.9 or later. The default installation of Python at /usr/bin/python is sufficient. We can check the version with python --version. If, for any reason, Python is below the required version, search for a compatible version of Python with module spider python and load the right module.
- PyLauncher is a Python package that is made available through the pylauncher/5.3.1 module. In your Linux shell, load this module with: module load pylauncher/5.3.1
- The Python package paramiko is required for PyLauncher. Install it with the Python package manager (pip): pip install paramiko
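The three prerequisites above can also be checked from Python itself. The following is a small sketch, not part of PyLauncher: the function name check_prerequisites is hypothetical, and it assumes the module makes the package importable under the name pylauncher.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in the list above


def check_prerequisites():
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {found} is older than 3.9")
    # Assumption: the import names are `pylauncher` and `paramiko`.
    for package in ("pylauncher", "paramiko"):
        if importlib.util.find_spec(package) is None:
            problems.append(f"Python package '{package}' is not importable")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("MISSING:", issue)
    if not issues:
        print("All prerequisites found.")
```

Run it in the same shell session in which the module was loaded, since module load only affects the current environment.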
PyLauncher provides several launcher methods for different use cases. To create a PyLauncher script that is configured correctly for your application, open a new Python file in an editor, and choose your launcher method from the ones below.
- The most basic launcher is the ClassicLauncher. This launcher takes the commands in myjobs.sh (in our example, invocations of myprogram) in order and runs as many of them at once as the available cores allow (1 per core by default), storing the output of each command in the myjobs_out directory. Note that PyLauncher cannot reuse directories, so be sure to set workdir to a new value for each run.
- If myprogram is multithreaded, we can specify how many cores to allocate per command in ClassicLauncher.
- If myprogram is MPI parallel, we can use IbrunLauncher instead.
- If myprogram uses GPU resources, we can use GPULauncher, which requires the gpuspernode parameter. For Vista, this value is 1.
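The four options above can be sketched in one file as follows. This is an illustration under stated assumptions rather than the official example: the cores keyword and the values cores=4 and cores=2 are based on my reading of the PyLauncher documentation and are illustrative, the VARIANT switch and the run helper are hypothetical, and the Slurm check at the bottom is an addition of this sketch. In practice the file only needs the import and the one launcher call that matches your application.

```python
"""pylauncher_test.py: sketch of the launcher variants described above."""
import os

COMMAND_FILE = "myjobs.sh"  # one small job per line
WORKDIR = "myjobs_out"      # must be a new directory for every run
VARIANT = "classic"         # one of VARIANTS below
VARIANTS = ("classic", "threaded", "mpi", "gpu")


def run(variant):
    """Start the launcher that matches `variant`."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    import pylauncher  # provided by the pylauncher/5.3.1 module

    if variant == "classic":
        # One core per command line (the default).
        pylauncher.ClassicLauncher(COMMAND_FILE, workdir=WORKDIR)
    elif variant == "threaded":
        # Multithreaded program: 4 cores per command (illustrative value).
        pylauncher.ClassicLauncher(COMMAND_FILE, workdir=WORKDIR, cores=4)
    elif variant == "mpi":
        # MPI program: 2 cores per command (illustrative value).
        pylauncher.IbrunLauncher(COMMAND_FILE, workdir=WORKDIR, cores=2)
    else:
        # GPU program: gpuspernode is 1 on Vista.
        pylauncher.GPULauncher(COMMAND_FILE, workdir=WORKDIR, gpuspernode=1)


if __name__ == "__main__":
    # Assumption of this sketch: only launch inside a Slurm job.
    if "SLURM_JOB_ID" in os.environ:
        run(VARIANT)
    else:
        print("Not inside a Slurm job; submit this script with sbatch instead.")
```

Consult the TACC PyLauncher documentation linked above for the authoritative list of launcher arguments before relying on any keyword shown here.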
Once you have selected a launcher, save the file as pylauncher_test.py and exit.
To run PyLauncher, submit a Slurm job script that does nothing more than run your Python script.
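One way to produce such a job script is sketched below in Python, to stay in the same language as the launcher file. Everything site-specific is an assumption: QUEUE_NAME and ALLOCATION_NAME are placeholders to replace with your own queue and allocation, the file name pylauncher_job.slurm and the helper render_job_script are hypothetical, and the node count and time limit are illustrative.

```python
from pathlib import Path


def render_job_script(queue, allocation, nodes=1, minutes=30):
    """Return the text of a Slurm job script that runs pylauncher_test.py."""
    hours, mins = divmod(minutes, 60)
    return "\n".join([
        "#!/bin/bash",
        "#SBATCH -J pylauncher_test",          # job name
        "#SBATCH -o pylauncher_test.o%j",      # output file (%j is the job ID)
        f"#SBATCH -p {queue}",                 # queue (partition)
        f"#SBATCH -N {nodes}",                 # number of nodes
        f"#SBATCH -n {nodes}",                 # one Slurm task per node
        f"#SBATCH -t {hours:02d}:{mins:02d}:00",  # wall-clock limit
        f"#SBATCH -A {allocation}",            # allocation to charge
        "",
        "module load pylauncher/5.3.1",
        "python pylauncher_test.py",
        "",
    ])


if __name__ == "__main__":
    # Placeholders: replace with your own queue and allocation names.
    script = render_job_script("QUEUE_NAME", "ALLOCATION_NAME")
    Path("pylauncher_job.slurm").write_text(script)
```

The generated file would then be submitted with sbatch pylauncher_job.slurm. Writing the same lines directly into a text file works equally well.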
PyLauncher will then run the small jobs you listed in myjobs.sh. Note that PyLauncher ignores the flag --ntasks-per-node.
Once the job is finished, a statistics file is produced in the current working directory (not in workdir), which is useful for checking the efficiency of PyLauncher.