Computing at Scale
Metaflow makes it easy to run compute in the cloud. Instead of prescribing one paradigm for all compute needs, Metaflow allows you to mix and match various patterns of scalable compute, keeping simple things simple while making advanced use cases possible.
When your needs are modest, you can run Metaflow like any Python code, such as in a notebook or a local script. When you need more compute power, say to train a model on GPUs or to handle a large dataframe, you can get the job done by adding a line of code. Or, you can execute even thousands of such tasks in parallel or train a large model, such as an LLM, over many GPUs.
When your needs grow, you can even mix and match various compute environments to create advanced workflows that operate across local workstations, on-premise data centers, and various public clouds.
Below, we provide an overview of the patterns of compute that Metaflow supports with pointers for more details. Importantly, you can mix and match these patterns freely, even in a single flow.
To enable the cloud computing capabilities of Metaflow, `@batch` and `@kubernetes`, you need to deploy a Metaflow stack first. To test these concepts before deploying, try the Metaflow Sandbox.
Rapid development with local execution
When you run a flow without special decorators, e.g. running LinearFlow by typing `python linear.py run`, the flow runs locally on your computer like any Python script or notebook.
This allows you to develop and test code rapidly without having to rely on any infrastructure outside your workstation.
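For illustration, here is a minimal sketch of what such a locally-run flow could look like; the flow body is hypothetical and only mirrors the `LinearFlow` name mentioned above:

```python
# linear.py - a minimal sketch of a flow; run locally with: python linear.py run
from metaflow import FlowSpec, step

class LinearFlow(FlowSpec):

    @step
    def start(self):
        self.x = 21              # artifacts like self.x persist across steps
        self.next(self.double)

    @step
    def double(self):
        self.x *= 2              # plain Python, executed as a local process
        self.next(self.end)

    @step
    def end(self):
        print("result:", self.x)

if __name__ == "__main__":
    LinearFlow()
```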
Running your code as a flow can provide an immediate performance benefit: If your flow has branches or foreaches, Metaflow leverages multiple CPU cores to speed up compute by running parallel tasks as separate processes. In addition to Metaflow parallelizing tasks, you can speed up compute by using optimized Python libraries such as PyTorch to leverage GPUs or a library like XGBoost that can utilize multiple CPU cores.
Parallelizing Python over multiple CPU cores
If you need to execute a medium amount of compute, too much to handle in sequential Python code but not enough to warrant parallel tasks using `foreach`, Metaflow provides a helper function, `parallel_map`, which parallelizes the execution of a Python function over multiple CPU cores. For instance, you can use `parallel_map` to process a list of 10M items in batches of 1M items in parallel.
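As a sketch of that pattern, assuming a simple summation as the workload, batching could look like this (the flow name and batch sizes are illustrative):

```python
from metaflow import FlowSpec, step, parallel_map

class MapFlow(FlowSpec):   # hypothetical flow for illustration

    @step
    def start(self):
        items = list(range(10_000_000))
        # split the 10M items into batches of 1M each
        batches = [items[i:i + 1_000_000] for i in range(0, len(items), 1_000_000)]
        # process the batches in parallel, utilizing multiple CPU cores
        self.sums = parallel_map(sum, batches)
        self.next(self.end)

    @step
    def end(self):
        print("total:", sum(self.sums))

if __name__ == "__main__":
    MapFlow()
```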
Requesting compute with @resources
If your job requires more resources than what is available on your workstation, e.g. more memory or more GPUs, Metaflow makes it easy to run the task remotely on a cloud instance: simply annotate the step with the `@resources` decorator.
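For example, a step could request a larger instance like this; the flow name and the exact amounts are illustrative (`memory` is expressed in megabytes), and the requests take effect when the flow runs remotely, e.g. with `--with batch` or `--with kubernetes`:

```python
from metaflow import FlowSpec, step, resources

class BigSumFlow(FlowSpec):   # hypothetical flow for illustration

    @resources(memory=64000, cpu=16)   # request a 64GB, 16-core instance
    @step
    def start(self):
        # plenty of room for a large in-memory dataset here
        self.total = sum(range(100_000_000))
        self.next(self.end)

    @step
    def end(self):
        print("total:", self.total)

if __name__ == "__main__":
    BigSumFlow()
```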
In this case, Metaflow executes the task remotely in the cloud using one of the supported compute backends, AWS Batch or Kubernetes.
This is often the easiest way to scale up compute to handle larger datasets or models. It is like getting a bigger computer with a line of code. While larger cloud instances cost more, they are only needed for as long as a `@step` executes, so this approach can be cost-effective as well. This manner of scaling is called vertical scaling.
Requesting GPUs and other hardware accelerators
ML/AI workloads often require hardware acceleration such as GPUs. Learn more on a dedicated page about hardware-accelerated compute.
Executing steps in parallel
If you want to execute two or more `@step`s in parallel, make them branches.
When you run a flow with branches locally, each `@step` is run in a process of its own, taking advantage of multiple CPU cores in your workstation to speed up processing. When you execute the flow (or some of its steps) remotely, each `@step` is run in a separate container, allowing you to run even thousands of steps in parallel.
Branches come in handy in two scenarios:
You have separate operations that can be executed independently.
You want to allocate separate `@resources` (or other decorators) for different sets of data, e.g. to build a small model with CPUs and a large one with GPUs. Just create branches, each with their own set of decorators, as sketched below.
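Here is a minimal sketch of the second scenario; the flow name and training steps are hypothetical placeholders:

```python
from metaflow import FlowSpec, step, resources

class TwoModelsFlow(FlowSpec):   # hypothetical flow for illustration

    @step
    def start(self):
        self.next(self.train_small, self.train_large)  # fan out to two branches

    @resources(cpu=8)
    @step
    def train_small(self):
        self.model = "small model trained on CPUs"   # placeholder for real training
        self.next(self.join)

    @resources(gpu=1)
    @step
    def train_large(self):
        self.model = "large model trained on a GPU"  # placeholder for real training
        self.next(self.join)

    @step
    def join(self, inputs):
        self.models = [inp.model for inp in inputs]  # merge results from both branches
        self.next(self.end)

    @step
    def end(self):
        print(self.models)

if __name__ == "__main__":
    TwoModelsFlow()
```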
Running many tasks in parallel with foreach
A very common scenario in ML, AI, and data processing is to run the same operation, e.g. data transformation or model training, for each shard of data or a set of parameters.
Metaflow's `foreach` is similar to Python's built-in `map` function, which allows you to apply a function, or in the case of Metaflow a `@step`, to all elements in a list. The difference to `branch` is that `foreach` applies the same operation to all elements, utilizing data parallelism, whereas `branch` applies a different operation to each, utilizing task parallelism.
The superpower of Metaflow is that you can run these tasks in parallel, processing even thousands of items concurrently in the cloud. Hence you can use foreaches to process large datasets, train many models, or run hyperparameter searches in parallel, that is, execute any embarrassingly parallel operations that can benefit from horizontal scaling.
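For concreteness, here is a minimal sketch of a foreach fan-out, one task per data shard; the flow name and shard names are illustrative:

```python
from metaflow import FlowSpec, step

class ShardFlow(FlowSpec):   # hypothetical flow for illustration

    @step
    def start(self):
        self.shards = ["shard-0", "shard-1", "shard-2"]
        self.next(self.process, foreach="shards")  # one task per shard

    @step
    def process(self):
        # self.input holds the element assigned to this task
        self.result = f"processed {self.input}"
        self.next(self.join)

    @step
    def join(self, inputs):
        self.results = [inp.result for inp in inputs]  # collect results from all tasks
        self.next(self.end)

    @step
    def end(self):
        print(self.results)

if __name__ == "__main__":
    ShardFlow()
```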
Options for controlling parallelism
Note that regardless of the size of the list you `foreach` over, you can control the number of tasks actually run in parallel with the `--max-workers` flag. Also, you will want to increase `--max-num-splits` when your list is long.
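For instance, a run could cap concurrency at 16 tasks while allowing a long list (the flow name and values here are illustrative):

```
python myflow.py run --max-workers 16 --max-num-splits 10000
```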
Distributed computing with ephemeral compute clusters
The most advanced pattern of compute that Metaflow supports is distributed computing. In this case, Metaflow sets up a cluster of instances on the fly which can communicate with each other, e.g. to train a Large Language Model (LLM) over many GPU instances.
While there are many other ways to set up such clusters, a major benefit of Metaflow is that you can embed an ephemeral cluster as a part of a larger workflow, instead of having to maintain the cluster separately. Learn more on a dedicated page about distributed computing.
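As a hedged sketch of the core pattern, an ephemeral cluster is requested by passing `num_parallel` to a transition and decorating the step with `@parallel`, which lets each task discover its position in the cluster; the flow name, step bodies, and node count below are illustrative:

```python
from metaflow import FlowSpec, step, parallel, current

class ClusterFlow(FlowSpec):   # hypothetical flow for illustration

    @step
    def start(self):
        # gang-schedule 4 interconnected tasks as an ephemeral cluster
        self.next(self.train, num_parallel=4)

    @parallel
    @step
    def train(self):
        # each task knows its rank in the cluster, e.g. to initialize a
        # distributed training framework such as torch.distributed
        print(f"node {current.parallel.node_index} of {current.parallel.num_nodes}")
        self.next(self.join)

    @step
    def join(self, inputs):
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    ClusterFlow()
```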