Metaflow comes bundled with first-class support for various services on AWS. This guide walks through how to configure Metaflow in your own AWS account.
To get Metaflow up and running in your AWS account, you need to configure the services covered in this guide: S3 as the datastore, AWS Batch as the compute service, and the Metaflow service.
You can mix and match these services; Metaflow does not require all of them to be set up. Currently, the only limitation is that S3 must be configured as the datastore if you intend to use AWS Batch as your compute service. The following sections walk through how to set up these services and configure Metaflow to use them, and describe a CloudFormation template that automates the process.
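The single constraint described above can be expressed as a quick pre-flight check. This is a minimal sketch and not part of Metaflow: `check_services` and the `datastore` and `compute` keys are hypothetical names chosen for illustration.

```python
def check_services(services):
    """Return a list of problems with a chosen service combination.

    `services` is a hypothetical dict such as
    {"datastore": "s3", "compute": "batch"}; the keys are illustrative
    and are not Metaflow configuration names.
    """
    problems = []
    # The only constraint named in this guide: Batch requires the S3 datastore.
    if services.get("compute") == "batch" and services.get("datastore") != "s3":
        problems.append("AWS Batch compute requires S3 as the datastore")
    return problems


print(check_services({"datastore": "local", "compute": "batch"}))
print(check_services({"datastore": "s3", "compute": "batch"}))
```

Any other combination, including using none of the services, passes the check.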
S3 bucket and prefix used by Metaflow to store data artifacts and code packages.
S3 bucket and prefix used by metaflow.S3 to store data objects. Defaults to
S3 bucket and prefix used by Metaflow to store conda packages. Defaults to
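Each of the settings above combines a bucket and a prefix in one value. The sketch below assumes that value is written as an `s3://bucket/prefix` URL and shows how it splits into its two parts; `split_s3_root` and the bucket name are hypothetical and exist only for illustration.

```python
from urllib.parse import urlparse


def split_s3_root(root):
    """Split 's3://bucket/prefix' into a (bucket, prefix) tuple."""
    parsed = urlparse(root)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://bucket/prefix value, got {root!r}")
    # Strip slashes so 's3://bucket/' and 's3://bucket' both give an empty prefix.
    return parsed.netloc, parsed.path.strip("/")


# 'my-metaflow-bucket' and 'metaflow' are placeholder names.
print(split_s3_root("s3://my-metaflow-bucket/metaflow"))
```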
To orchestrate compute on Batch, you need to create, at a minimum, a Compute Environment, a Job Queue, and an IAM role that allows the Batch container to access your S3 bucket (
DeleteObject) as well as any other AWS services your user code might interact with (e.g., allowing
PassRole privileges to SageMaker services).
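As a rough illustration of the IAM role described above, the following sketch builds two policy documents as plain Python dictionaries: a trust policy that lets ECS tasks assume the role, and a permission policy for objects in one bucket. The helper names and the bucket name are hypothetical. The list of S3 actions is cut off in the text above, so the actions passed here (other than `DeleteObject`) are assumptions, not the guide's list.

```python
import json


def task_role_trust_policy():
    """Trust policy that lets ECS tasks (which run Batch jobs) assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_access_policy(bucket, actions):
    """Permission policy granting the given S3 actions on objects in `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [f"s3:{name}" for name in actions],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# Placeholder bucket; the action list is an assumption apart from DeleteObject.
policy = s3_access_policy(
    "my-metaflow-bucket", ["GetObject", "PutObject", "DeleteObject"]
)
print(json.dumps(policy, indent=2))
```

Access to other AWS services used by your code would need additional statements in the same style.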
Batch job queue for Metaflow to place jobs on.
IAM role allowing Batch ECS tasks to access S3 and other AWS services.
Default Docker container image to execute Batch tasks on. Defaults to the Python image corresponding to the major.minor version of the Python interpreter running the flow.
Default Docker container registry to pull container images from. Defaults to Docker Hub.
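To make the default image rule concrete, here is a minimal sketch that derives an image name from the running interpreter. It assumes the default refers to the official `python` image tagged with `major.minor`; the exact name Metaflow produces is not stated above, and `default_python_image` is a hypothetical helper.

```python
import sys


def default_python_image(version_info=sys.version_info):
    """Build an image name such as 'python:3.8' from a version tuple."""
    major, minor = version_info[0], version_info[1]
    # Only major and minor are used; the patch level is dropped.
    return f"python:{major}.{minor}"


print(default_python_image())
```

Under this assumption, a flow launched from Python 3.8.10 would run its Batch tasks on `python:3.8` unless another image is configured.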
URL to the Metaflow service.
Setting up these services manually and getting all the permissions right can be complicated, so we bundle a CloudFormation template that automates the setup on AWS. The template can be used both directly in CloudFormation and through AWS Service Catalog to provision resources.
Metaflow can be configured via the CLI:
metaflow configure aws
Additionally, any of these configuration parameters can be overridden by setting them as environment variables.
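The precedence described above (an environment variable wins over a saved value) can be sketched as follows. This illustrates the idea and is not Metaflow's own lookup code: `resolve_setting` is a hypothetical helper, and `METAFLOW_BATCH_JOB_QUEUE` with its values is used only as an example parameter, since the parameter names are not listed in the text above.

```python
import os


def resolve_setting(name, saved_config, environ=None, default=None):
    """Return the environment value for `name` if set, else the saved one."""
    environ = os.environ if environ is None else environ
    if name in environ:
        return environ[name]
    return saved_config.get(name, default)


# Hypothetical values standing in for what `metaflow configure aws` saved.
saved = {"METAFLOW_BATCH_JOB_QUEUE": "default-queue"}

# No override present: the saved value is used.
print(resolve_setting("METAFLOW_BATCH_JOB_QUEUE", saved, environ={}))

# Override present: the environment value takes precedence.
print(resolve_setting(
    "METAFLOW_BATCH_JOB_QUEUE",
    saved,
    environ={"METAFLOW_BATCH_JOB_QUEUE": "gpu-queue"},
))
```

This makes it possible to keep one saved configuration and change a single parameter for one run by exporting a variable in the shell that launches the flow.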