Deploying to AWS

Metaflow comes bundled with first-class support for various services on AWS. This guide walks through configuring Metaflow in your own AWS account.

To get Metaflow up and running in your AWS account, you need to configure the following services:

- Datastore: Amazon S3
- Compute: AWS Batch
- Metadata: Metaflow Metadata Service / RDS

You can mix and match these services; Metaflow does not require that all of them be stood up. Currently, the only limitation Metaflow places is that S3 must be configured as the datastore if you intend to use AWS Batch as your compute service. The following sections walk you through standing up these services and configuring Metaflow to use them, and describe a CloudFormation template that automates this process:

Datastore: S3

Metaflow stores all data artifacts, code packages, and library dependencies in S3. At a minimum, you need to create one S3 bucket to store these objects.


- METAFLOW_DATASTORE_SYSROOT_S3: S3 bucket and prefix used by Metaflow to store data artifacts and code packages.


- METAFLOW_DATATOOLS_S3ROOT: S3 bucket and prefix used by metaflow.S3 to store data objects. Defaults to $METAFLOW_DATASTORE_SYSROOT_S3/data.


- METAFLOW_CONDA_PACKAGE_S3ROOT: S3 bucket and prefix used by Metaflow to store conda packages. Defaults to $METAFLOW_DATASTORE_SYSROOT_S3/conda.
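These options end up in Metaflow's JSON configuration file, typically written by `metaflow configure aws` to ~/.metaflowconfig/config.json. As a sketch of the datastore section (the bucket name and prefix are placeholders, and the option names follow Metaflow's configuration keys):

```python
import json

# Hypothetical bucket and prefix -- substitute your own.
config = {
    "METAFLOW_DEFAULT_DATASTORE": "s3",
    "METAFLOW_DATASTORE_SYSROOT_S3": "s3://my-metaflow-bucket/metaflow",
}

# The other two options fall back to paths under the sysroot when unset:
root = config["METAFLOW_DATASTORE_SYSROOT_S3"]
config.setdefault("METAFLOW_DATATOOLS_S3ROOT", root + "/data")
config.setdefault("METAFLOW_CONDA_PACKAGE_S3ROOT", root + "/conda")

print(json.dumps(config, indent=4))
```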

Compute: Batch

To orchestrate compute on Batch, you need, at a minimum, a Compute Environment, a Job Queue, and an IAM role that allows the Batch container to access your S3 bucket (ListBucket, PutObject, GetObject, DeleteObject) as well as any other AWS services your user code might interface with (e.g., PassRole privileges for SageMaker services).
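The S3 permissions listed above translate into an IAM policy along these lines. This is a sketch, not a complete role definition: the bucket name is a placeholder, and you would attach the policy to the role assumed by your Batch jobs.

```python
import json

# Placeholder bucket name -- use the bucket you created for the datastore.
BUCKET = "my-metaflow-bucket"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # ListBucket applies to the bucket itself...
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}"],
        },
        {
            # ...while object-level actions apply to keys within it.
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/*"],
        },
    ],
}

print(json.dumps(policy, indent=2))
```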


- METAFLOW_BATCH_JOB_QUEUE: Batch job queue for Metaflow to place jobs on.


- METAFLOW_ECS_S3_ACCESS_IAM_ROLE: IAM role allowing Batch ECS tasks to access S3 and other AWS services.


- METAFLOW_BATCH_CONTAINER_IMAGE: Default Docker container image used to execute Batch tasks. Defaults to the Python image corresponding to the major.minor version of the Python interpreter running the flow.


- METAFLOW_BATCH_CONTAINER_REGISTRY: Default Docker container registry to pull container images from. Defaults to Docker Hub.
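For instance, a flow launched from a Python 3.10 interpreter would default to an image like python:3.10. A sketch of that derivation:

```python
import sys

# Mirrors the documented default: a "python" image tagged with the
# major.minor version of the interpreter running the flow.
default_image = "python:%d.%d" % (sys.version_info.major, sys.version_info.minor)
print(default_image)
```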

Metadata: Metaflow Service

The Metaflow service is a simple aiohttp service backed by an RDS instance. To set it up, follow these steps:

Create VPC

1. Launch the VPC Wizard.
2. Create a VPC with a Single Public Subnet. Note its ID for subsequent steps.
3. Go to Subnets and create a second subnet in a different Availability Zone.
4. Add the new subnet to the VPC's route table.
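The second subnet is needed because an RDS DB Subnet Group must span at least two Availability Zones. Carving two non-overlapping subnets out of a VPC CIDR can be sketched with the standard ipaddress module (the 10.0.0.0/16 block is an arbitrary example):

```python
import ipaddress

# Example VPC CIDR -- pick whatever range suits your network plan.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Two /24 subnets, one per Availability Zone.
subnet_a, subnet_b = list(vpc.subnets(new_prefix=24))[:2]
print(subnet_a, subnet_b)
```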

Create RDS

1. Create a DB Subnet Group spanning both subnets.
2. Create the RDS instance.

Create ECS Service Cluster

1. Set up a cluster using the Fargate launch type.

Set up ECS Task Definition

1. Use the Fargate template for the Task Definition.
2. Note the environment variables needed by the service container.
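As an illustration, the container definition in the task carries the RDS connection details as environment variables. The MF_METADATA_DB_* names below are an assumption based on the metaflow-service image (verify against the service documentation for your version), and the values are placeholders:

```python
# Assumed variable names for the metaflow-service container; verify them
# for your service version. All values here are placeholders.
db_settings = {
    "MF_METADATA_DB_HOST": "my-rds-instance.abc123.us-east-1.rds.amazonaws.com",
    "MF_METADATA_DB_PORT": "5432",
    "MF_METADATA_DB_USER": "master",
    "MF_METADATA_DB_PSWD": "example-password",
    "MF_METADATA_DB_NAME": "metaflow",
}

# ECS task definitions expect environment variables as a list of
# {"name": ..., "value": ...} pairs.
environment = [{"name": k, "value": v} for k, v in db_settings.items()]
print(environment)
```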

Create Metaflow Service

1. Navigate to ECS in the console, select Create from the Services tab, and choose the Task Definition created previously.
2. Select the VPC created previously and add both subnets. Optionally configure a load balancer if needed.
3. Enable autoscaling if needed.
4. Review and confirm.


- METAFLOW_SERVICE_URL: URL to the Metaflow service.
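Once the service is running, it is worth sanity-checking the URL before pointing Metaflow at it. The /ping route below is an assumption about the metadata service's health endpoint, and the address is a placeholder for your load balancer or task IP:

```python
import urllib.request

def ping_url(service_url):
    """Health-check URL for a metadata service endpoint (assumed /ping route)."""
    return service_url.rstrip("/") + "/ping"

# Placeholder address -- substitute your own, then fetch it, e.g.:
#   urllib.request.urlopen(ping_url("http://my-service.example.com:8080"))
print(ping_url("http://my-service.example.com:8080"))
```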

CloudFormation Template

We understand that setting up these services manually and getting all the permissions right can be complicated; that is why we bundle a CloudFormation template that automates setting them up on AWS. The template can be used both within CloudFormation and with AWS Service Catalog for provisioning resources.

Configuring Metaflow

Metaflow can be configured via the CLI:

metaflow configure aws

Additionally, any of these configuration parameters can be overridden by setting them as environment variables.
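The precedence can be sketched as follows: an environment variable, if set, wins over the value from the config file. A minimal illustration (the queue names are placeholders):

```python
import os

# Pretend this came from ~/.metaflowconfig/config.json.
file_config = {"METAFLOW_BATCH_JOB_QUEUE": "metaflow-queue"}

def resolve(key):
    """Environment variables override the config file."""
    return os.environ.get(key, file_config.get(key))

print(resolve("METAFLOW_BATCH_JOB_QUEUE"))  # value from the file
os.environ["METAFLOW_BATCH_JOB_QUEUE"] = "override-queue"
print(resolve("METAFLOW_BATCH_JOB_QUEUE"))  # value from the environment
```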