Fanning Out Workloads
You can scale out individual Python functions to many containers using the.map()
method.
You might use this for parallelizing computational-heavy tasks, such as batch inference or data processing jobs.
Passing Multiple Arguments
The.map()
method can also parallelize functions that require multiple parameters. Simply pass a list of tuples, where each tuple corresponds to a set of arguments for your function.
Below is an example that counts how many prime numbers appear between a start and a stop index for each tuple in ranges:
ranges
is a list of tuples(start, stop)
.- Calling
count_primes_in_range.map(ranges)
spawns a remote execution for each tuple, passing(start, stop)
to the function. - Each remote call returns the number of prime numbers in that sub-range, which we print out.
.map()
, Beam takes care of distributing each item (or tuple of items) to separate containers for parallel processing. This approach makes it easy to scale out CPU-heavy or data-intensive tasks with minimal code.