Multiprocessing
The default CPython interpreter effectively runs Python code on one thread at a time. It uses a mechanism called the Global Interpreter Lock (GIL), which lets only one thread execute Python bytecode at any moment. The GIL keeps the interpreter fast for single-threaded code and makes it easier to integrate with C libraries that are not necessarily thread-safe.
Python comes with a threading module that lets you run different functions in separate threads, but in CPython those threads take turns holding the GIL, so CPU-bound Python code does not run in parallel. Moreover, no threading library in any language can coordinate work across compute nodes; threads share one address space and are therefore confined to a single node.
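As a small illustration of the threading interface, the sketch below starts one thread per task and waits for all of them. The function names count_down and run_in_threads are made up for this example. On a GIL build of CPython the threads would be expected to take turns rather than run simultaneously, so this shows the API, not a speedup.

```python
import threading


def count_down(n, results, index):
    """CPU-bound toy task: count n down to zero and record the number of steps."""
    steps = 0
    while n > 0:
        n -= 1
        steps += 1
    results[index] = steps


def run_in_threads(sizes):
    """Start one thread per size, wait for all of them, and return the step counts."""
    results = [None] * len(sizes)
    threads = [
        threading.Thread(target=count_down, args=(n, results, i))
        for i, n in enumerate(sizes)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until this thread has finished
    return results


if __name__ == "__main__":
    print(run_in_threads([100_000, 200_000]))
```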
One way around the GIL is the multiprocessing module, which provides a thread-like interface to multiple Python processes, each with its own interpreter. Process creation is more expensive than thread creation, and passing data between processes (serialized and sent over pipes, or placed in explicitly shared memory) is slower than threads reading a shared address space. Even so, the multiprocessing API is an efficient way to harness multiple processor cores for many algorithms.
The multiprocessing module is part of the Python Standard Library. It offers a number of different interfaces. One is a manager-worker paradigm built around a Pool and its map() method. First you define a Pool of workers, then you call map() to apply a function to every element of an input sequence, with the work divided among the workers in the Pool.
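A minimal sketch of the Pool pattern follows. It assumes a toy function f that squares its argument and a Pool named p with four workers. The function body and the worker count are chosen here purely for illustration.

```python
from multiprocessing import Pool


def f(x):
    """Toy task: square one input value."""
    return x * x


if __name__ == "__main__":
    # Create the Pool only after f is defined, and only in the main process.
    with Pool(processes=4) as p:
        # map() splits the inputs among the workers and returns results in input order.
        print(p.map(f, range(10)))
```

The `if __name__ == "__main__":` guard keeps worker processes from re-running the Pool setup when they import the main module, which matters under the spawn start method.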
Code of this kind fails if the instantiation of the Pool p is moved before the definition of the function f. The likely reason is that, under the fork start method, each worker is a copy of the parent made when the Pool is created, so a function defined afterward does not exist in the workers.
The Process class supports more free-form communication among processes, through several mechanisms:
- A Pipe is a file-like connection between two processes, with send() and recv().
- A Queue is a thread- and process-safe queue shareable among processes for very simple put()/get() communication.
- Individual values and arrays can be declared as living in shared memory among processes (Value and Array).
- There are semaphores and condition variables.
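The first two mechanisms in the list can be sketched as follows. The worker functions pipe_worker and queue_worker and the numbers passed around are invented for this example. It is one possible arrangement, not the only way to wire processes together.

```python
from multiprocessing import Pipe, Process, Queue


def pipe_worker(conn):
    """Receive one number over the pipe, send back its double, then close this end."""
    value = conn.recv()
    conn.send(value * 2)
    conn.close()


def queue_worker(queue, items):
    """Put the square of each item on the shared queue."""
    for item in items:
        queue.put(item * item)


if __name__ == "__main__":
    # Pipe: a two-ended connection; each end has send() and recv().
    parent_conn, child_conn = Pipe()
    p1 = Process(target=pipe_worker, args=(child_conn,))
    p1.start()
    parent_conn.send(21)
    print(parent_conn.recv())
    p1.join()

    # Queue: the child puts results, the parent gets them.
    q = Queue()
    p2 = Process(target=queue_worker, args=(q, [1, 2, 3]))
    p2.start()
    results = [q.get() for _ in range(3)]  # drain the queue before join()
    p2.join()
    print(results)
```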
It is a fairly complete environment. The interface roughly mirrors Pthreads if you want to use it that way, and the library contains enough to run much of what you would otherwise write as multithreaded code.
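To illustrate the Pthreads-like style, here is a sketch in which several processes are started and joined by hand while they update a counter held in shared memory under a lock. The function add_many, the worker count, and the iteration count are assumptions made for this example.

```python
from multiprocessing import Lock, Process, Value


def add_many(counter, lock, n):
    """Increment the shared counter n times, holding the lock for each update."""
    for _ in range(n):
        with lock:  # += on a shared Value is not atomic, so guard it
            counter.value += 1


if __name__ == "__main__":
    counter = Value("i", 0)  # a C int living in shared memory
    lock = Lock()
    workers = [
        Process(target=add_many, args=(counter, lock, 1000)) for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)  # 4 workers x 1000 increments
```

Without the lock, increments from different processes could interleave and some updates would be lost.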
An alternative that may be simpler to get started with, but is more limited in what it can do, is the concurrent.futures module. It too provides a high-level interface for asynchronous execution, either through a pool of processes or a pool of threads.
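A brief sketch of the concurrent.futures style is shown below. The helper names cube and map_with are hypothetical. The same helper is run once with a pool of processes and once with a pool of threads to show that the two executors share an interface.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cube(x):
    """Toy task: cube one input value."""
    return x ** 3


def map_with(executor_class, values, workers=2):
    """Run cube over values on the given executor type and return results in order."""
    with executor_class(max_workers=workers) as executor:
        return list(executor.map(cube, values))


if __name__ == "__main__":
    print(map_with(ProcessPoolExecutor, range(5)))  # pool of processes
    print(map_with(ThreadPoolExecutor, range(5)))   # pool of threads
```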