Fanning Out Workloads
You can scale out individual Python functions to many containers using the .map() method.
You might use this to parallelize computationally heavy tasks, such as batch inference or data processing jobs.
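To make the fan-out pattern concrete, here is a minimal local sketch. It does not use Beam: it swaps in a thread pool from Python's standard library as a stand-in for remote containers, and the names square and fan_out are hypothetical helpers introduced only for illustration. On Beam, the same shape would be expressed by calling .map() on the function itself.

```python
from concurrent.futures import ThreadPoolExecutor


def square(i: int) -> int:
    """The unit of work applied to every input item."""
    return i ** 2


def fan_out(fn, items, max_workers: int = 4):
    """Local stand-in for a fan-out (assumption: threads instead of containers).

    Applies fn to each item concurrently and returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))


squared = fan_out(square, range(5))
print(squared)  # [0, 1, 4, 9, 16]
```

The calling code stays a plain loop over inputs, while the pool (or, on Beam, the platform) decides where each call runs.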
Passing Multiple Arguments
The .map() method can also parallelize functions that take multiple parameters. Pass a list of tuples, where each tuple holds one set of arguments for your function.
Consider an example that counts how many prime numbers appear between a start and a stop index for each tuple in ranges:
- ranges is a list of tuples (start, stop).
- Calling count_primes_in_range.map(ranges) spawns a remote execution for each tuple, passing (start, stop) to the function.
- Each remote call returns the number of prime numbers in that sub-range, which we print out.
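The steps above can be sketched roughly as follows. This is a local approximation rather than the Beam implementation: a standard-library thread pool stands in for remote containers, the Beam import and decorator are omitted because this section does not show them, and is_prime and call_with_tuple are hypothetical helpers. Treating each range as half-open, [start, stop), is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n: int) -> bool:
    """Trial-division primality check (hypothetical helper)."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def count_primes_in_range(start: int, stop: int) -> int:
    """Count primes in [start, stop); the half-open bound is an assumption."""
    return sum(1 for n in range(start, stop) if is_prime(n))


def call_with_tuple(args):
    """Unpack one (start, stop) tuple into positional arguments."""
    return count_primes_in_range(*args)


ranges = [(0, 10), (10, 20), (20, 30)]

# Local stand-in for count_primes_in_range.map(ranges): one call per tuple.
with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(call_with_tuple, ranges))

for (start, stop), count in zip(ranges, counts):
    print(f"primes in [{start}, {stop}): {count}")
```

The explicit unpacking in call_with_tuple mirrors what the text describes .map() doing with each tuple: every element of the tuple becomes one argument of the function.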
With .map(), Beam takes care of distributing each item (or tuple of items) to separate containers for parallel processing. This makes it easy to scale out CPU-heavy or data-intensive tasks with minimal code.