Parallelizing Functions
How to parallelize your functions
Fanning Out Workloads
You can scale out individual Python functions to many containers using the `.map()` method. You might use this for parallelizing computationally heavy tasks, such as batch inference or data processing jobs.
When we run this Python module, 10 containers will be spawned to run the workload:
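A minimal sketch of such a module, assuming Beam's `function` decorator is imported from the `beam` package; the `square` function and its `cpu=1` resource setting are illustrative:

```python
from beam import function

@function(cpu=1)  # assumed resource setting; adjust as needed
def square(i: int) -> int:
    return i ** 2

def main():
    # .map() fans the 10 inputs out, one container per input
    for result in square.map(range(10)):
        print(result)

if __name__ == "__main__":
    main()
```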
Passing Multiple Arguments
The `.map()` method can also parallelize functions that require multiple parameters. Simply pass a list of tuples, where each tuple corresponds to a set of arguments for your function.
Below is an example that counts how many prime numbers appear between a start and a stop index for each tuple in `ranges`:
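Here is one way that example might look; as above, the `@function` import and resource setting are assumptions, and `is_prime` is a hypothetical helper:

```python
from beam import function

def is_prime(n: int) -> bool:
    # Trial-division primality test
    if n < 2:
        return False
    for d in range(2, int(n**0.5) + 1):
        if n % d == 0:
            return False
    return True

@function(cpu=1)  # assumed resource setting
def count_primes_in_range(start: int, stop: int) -> int:
    # Count the primes in [start, stop)
    return sum(1 for n in range(start, stop) if is_prime(n))

def main():
    # Each (start, stop) tuple becomes one remote call
    ranges = [(0, 25_000), (25_000, 50_000), (50_000, 75_000), (75_000, 100_000)]
    for count in count_primes_in_range.map(ranges):
        print(count)

if __name__ == "__main__":
    main()
```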
In this example:

- `ranges` is a list of `(start, stop)` tuples.
- Calling `count_primes_in_range.map(ranges)` spawns a remote execution for each tuple, passing `(start, stop)` to the function.
- Each remote call returns the number of prime numbers in that sub-range, which we print out.
With `.map()`, Beam takes care of distributing each item (or tuple of items) to separate containers for parallel processing. This approach makes it easy to scale out CPU-heavy or data-intensive tasks with minimal code.