Skip to main content

Using GPUs and Other Accelerators

Metaflow enables access to hardware-accelerated computing, GPUs in particular, when using AWS Batch or Kubernetes. You can leverage

You can find many examples of how to use Metaflow to fine-tune LLMs and other generative AI models, as well as how to train computer vision and other deep learning models in these articles.

Using accelerators

Before you can start taking advantage of hardware-accelerated steps, you need to take care of two prerequisites:

  1. Add hardware-accelerated instances in your Metaflow stack. Take a look specific tips for AWS Batch and Kubernetes.

  2. Configure your flow to include necessary drivers and frameworks.

After this, using the accelerators in straightforward as explained below.


Don't hesitate to reach out to Metaflow Slack if you need help get started!


To use GPUs in your compute environment, use the @resources decorator to get quick access to one or more GPUs like in this example:

from metaflow import FlowSpec, step, resources

class GPUFlow(FlowSpec):

@resources(memory=32000, cpu=4, gpu=1)
def start(self):
from my_script import my_gpu_routine

def end(self):

if __name__ == '__main__':

As usual with @resources, the decorator is ignored for local runs. This allows you to develop the code locally, e.g. using GPU resources on your local workstation. To get access to the requested resources in the cloud, run the flow --with kubernetes or --with batch.

If you need more fine-grained control over what GPUs get used, use the decorators specific to compute environment: For instance, @kubernetes allows you to specify a gpu_vendor and @batch allows you to specify a queue targeting a compute environment containing specific GPUs. For more information, see guidance for AWS Batch and Kubernetes.

Using AWS Trainium and Inferentia

On AWS, you can use AWS-specific hardware accelerators, Trainium and Inferentia. For more details, see a blog post outlining them in the context of Metaflow.

When using AWS Batch, you can request the accelerators simply by defining the number of Trainium or Inferentia cores in @batch:

  • @batch(trainium=16)
  • @batch(inferentia=16)

Note that Metaflow supports distributed training over multiple Trainium instances. For detailed instructions, visit the metaflow-trainium repository.


Contact Metaflow Slack if you are interested in using Trainium of Inferentia with @kubernetes.

Using Google's Tensor Processing Units (TPUs)

Contact Metaflow Slack if you are interested in using TPUs with Metaflow in the Google cloud.

Monitoring GPU utilization

To monitor GPU devices and their utilization, add a custom card @gpu_profile in your GPU steps.