What Is Multiprocessing?
Multiprocessing is a form of parallelism that allows a program to execute multiple tasks simultaneously by using multiple CPU cores. Unlike multithreading (where all threads share one memory space), multiprocessing uses separate processes, each with its own memory space and its own Python interpreter. This avoids Python’s Global Interpreter Lock (GIL), making it well-suited for CPU-bound tasks.
Why Multiprocessing in Python?
Python’s GIL allows only one thread at a time to execute Python bytecode in CPython, so multithreading offers no speedup for CPU-bound operations. Multiprocessing sidesteps this by creating independent processes, each with its own interpreter and its own GIL, thus taking advantage of multiple CPU cores.
Python provides this functionality via the `multiprocessing` module in the standard library.
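As a rough illustration of the difference this makes (the `cpu_bound` function and the workload size below are invented for this sketch), the same CPU-bound work is run first sequentially and then in four worker processes; on a multi-core machine, the second run should finish noticeably faster:

```python
import time
from multiprocessing import Process

def cpu_bound(n):
    # Busy work that keeps one core fully occupied
    sum(i * i for i in range(n))

if __name__ == "__main__":
    start = time.perf_counter()
    for _ in range(4):
        cpu_bound(10_000_000)          # sequential: one core at a time
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    procs = [Process(target=cpu_bound, args=(10_000_000,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()                       # parallel: up to four cores at once
    print(f"multiprocessing: {time.perf_counter() - start:.2f}s")
```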
How Is It Done in Python?
1. Basic Usage: `multiprocessing.Process`
```python
from multiprocessing import Process

def task():
    print("Running in a separate process")

if __name__ == "__main__":   # guard required on platforms that spawn processes
    p = Process(target=task)
    p.start()                # launch the child process
    p.join()                 # wait for it to finish
```
Each `Process` runs in its own memory space and executes independently.
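To make the separate-memory-space point concrete, here is a minimal sketch (the module-level `counter` is invented for illustration): the child’s modification never reaches the parent.

```python
from multiprocessing import Process

counter = 0  # lives in this process's memory

def increment():
    global counter
    counter += 1                      # changes only the child's copy
    print("child sees:", counter)     # 1

if __name__ == "__main__":
    p = Process(target=increment)
    p.start()
    p.join()
    print("parent sees:", counter)    # still 0: memory is not shared
```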
2. Using `multiprocessing.Pool` (simplified parallel mapping)
```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Distributes the calls across the four worker processes
        results = pool.map(square, [1, 2, 3, 4, 5])
    print(results)  # Output: [1, 4, 9, 16, 25]
```
`Pool` is useful when you want to apply a function to a collection of data in parallel.
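If the target function takes several arguments, `Pool.starmap` unpacks argument tuples for you; a small sketch (the `power` function is made up here):

```python
from multiprocessing import Pool

def power(base, exponent):
    return base ** exponent

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # starmap unpacks each tuple into the function's arguments
        results = pool.starmap(power, [(2, 3), (3, 2), (4, 2)])
    print(results)  # [8, 9, 16]
```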
3. Using `multiprocessing.Queue` and `multiprocessing.Pipe` for Interprocess Communication (IPC)
```python
from multiprocessing import Process, Queue

def f(q):
    q.put('Hello from process')   # send a message back to the parent

if __name__ == "__main__":
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())  # prints 'Hello from process'
    p.join()
```
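The heading above also mentions `Pipe`, which connects exactly two endpoints and avoids the queue’s extra machinery; a sketch along the same lines:

```python
from multiprocessing import Process, Pipe

def f(conn):
    conn.send("Hello through the pipe")
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()   # two connected endpoints
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())          # prints 'Hello through the pipe'
    p.join()
```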
4. Shared Memory via `Value` or `Array`
```python
from multiprocessing import Process, Value

def f(n):
    with n.get_lock():   # Value's built-in lock; += is not atomic
        n.value += 1

if __name__ == "__main__":
    num = Value('i', 0)  # 'i' = signed int, initialized to 0
    p = Process(target=f, args=(num,))
    p.start()
    p.join()
    print(num.value)  # 1
```
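`Array` works the same way for sequences of a fixed C type; a minimal sketch (the typecode `'d'` means double-precision float):

```python
from multiprocessing import Process, Array

def double_all(arr):
    for i in range(len(arr)):
        arr[i] *= 2   # visible to the parent: this memory is shared

if __name__ == "__main__":
    data = Array('d', [1.0, 2.0, 3.0])
    p = Process(target=double_all, args=(data,))
    p.start()
    p.join()
    print(list(data))  # [2.0, 4.0, 6.0]
```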
Use Cases of Multiprocessing
1. CPU-Bound Tasks
Tasks that require intensive computation (see the sketch after this list):
- Image or video processing
- Mathematical simulations
- Data transformations or encodings
- Cryptographic computations
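As one example of the cryptographic case (the payloads below are synthetic), hashing several large buffers parallelizes cleanly because each digest is independent:

```python
import hashlib
from multiprocessing import Pool

def sha256_of(data: bytes) -> str:
    # CPU-bound: hashing a large buffer keeps one core busy
    return hashlib.sha256(data).hexdigest()

if __name__ == "__main__":
    payloads = [bytes([i]) * 10_000_000 for i in range(8)]  # synthetic inputs
    with Pool() as pool:
        digests = pool.map(sha256_of, payloads)
    print(digests[0][:16], "...")
```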
2. Batch Processing of Independent Jobs
- Applying a model to independent chunks of data
- OCR on many files (e.g., with EasyOCR); a generic sketch follows this list
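A sketch of the batch pattern (the file names and the `process_file` body are placeholders; real per-file work might be OCR or model inference):

```python
from multiprocessing import Pool

def process_file(path):
    # Placeholder for real per-file work (OCR, inference, parsing, ...)
    with open(path, "rb") as f:
        return path, len(f.read())

if __name__ == "__main__":
    files = ["doc1.png", "doc2.png", "doc3.png"]  # hypothetical inputs
    with Pool() as pool:
        # imap_unordered yields results as soon as each worker finishes
        for path, size in pool.imap_unordered(process_file, files):
            print(path, size)
```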
3. Pipeline Stages in Data Processing
Different processes handling different stages:
- Process 1: Reading data
- Process 2: Transformation
- Process 3: Writing to disk or DB
This can be coordinated using `Queue`s or `Pipe`s, as in the sketch below.
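A three-stage sketch (the stage bodies are stand-ins): each stage runs in its own process, and a sentinel value tells the next stage that the stream has ended.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # signals "no more work"

def reader(out_q):
    for line in ["alpha", "beta", "gamma"]:   # stand-in for reading real data
        out_q.put(line)
    out_q.put(SENTINEL)

def transformer(in_q, out_q):
    while (item := in_q.get()) is not SENTINEL:
        out_q.put(item.upper())               # stand-in transformation
    out_q.put(SENTINEL)

def writer(in_q):
    while (item := in_q.get()) is not SENTINEL:
        print("writing:", item)               # stand-in for disk/DB output

if __name__ == "__main__":
    q1, q2 = Queue(), Queue()
    stages = [
        Process(target=reader, args=(q1,)),
        Process(target=transformer, args=(q1, q2)),
        Process(target=writer, args=(q2,)),
    ]
    for p in stages:
        p.start()
    for p in stages:
        p.join()
```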
When Not to Use It
- When the task is I/O-bound → prefer `asyncio` or multithreading.
- When the overhead of starting processes outweighs the benefits (e.g., tiny jobs).
- When shared state matters: passing data between processes is more complex and slower than sharing memory between threads.
Related Tools and Libraries
- `joblib`: High-level interface; used internally by scikit-learn.
- `concurrent.futures.ProcessPoolExecutor`: Modern standard-library interface for multiprocessing (sketch below).
- `ray`, `dask`, `loky`: Advanced distributed and parallel processing.
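For instance, the earlier `Pool` example translates almost one-to-one to `ProcessPoolExecutor`:

```python
from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(square, [1, 2, 3, 4, 5]))
    print(results)  # [1, 4, 9, 16, 25]
```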