Deploying to AWS

Metaflow comes bundled with first-class support for various services on AWS. This guide walks through how to configure Metaflow in your own AWS account.

To get Metaflow up and running in your AWS account, you need to configure the following services:

Service      AWS
Datastore    S3
Compute      Batch
Metadata     Metaflow Metadata Service / RDS

You can mix and match these services; Metaflow does not require that all of them be stood up. Currently, the only restriction is that S3 must be configured as the Datastore if you intend to use AWS Batch as your Compute service. The following sections walk you through how to stand up these services and configure Metaflow to use them, and describe a CloudFormation template that automates the process.

Datastore: S3

Metaflow stores all data artifacts, code packages and library dependencies in S3. At a minimum, you need to create one S3 bucket to store these objects.

METAFLOW_DATASTORE_SYSROOT_S3 [Required]

S3 bucket and prefix used by Metaflow to store data artifacts and code packages.

METAFLOW_DATATOOLS_S3ROOT [Optional]

S3 bucket and prefix used by metaflow.S3 to store data objects. Defaults to $METAFLOW_DATASTORE_SYSROOT_S3/data.

METAFLOW_CONDA_PACKAGE_S3ROOT [Optional]

S3 bucket and prefix used by Metaflow to store conda packages. Defaults to $METAFLOW_DATASTORE_SYSROOT_S3/conda.

METAFLOW_S3_ENDPOINT_URL [Optional]

Allows you to specify the endpoint_url parameter for Boto (see the Boto documentation).
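The relationship between these variables can be sketched in Python. This is a minimal illustration that assumes only the defaults described above; the helper resolve_s3_roots and the example bucket name are hypothetical and are not part of Metaflow.

```python
import os


def resolve_s3_roots(env=None):
    """Return the effective S3 locations for a set of Metaflow-style variables.

    Hypothetical helper for illustration only. It mirrors the defaults
    described above and is not Metaflow's own implementation.
    """
    env = os.environ if env is None else env
    sysroot = env.get("METAFLOW_DATASTORE_SYSROOT_S3")
    if not sysroot:
        raise ValueError("METAFLOW_DATASTORE_SYSROOT_S3 is required")
    sysroot = sysroot.rstrip("/")
    return {
        "datastore": sysroot,
        # The optional variables fall back to paths under the datastore root.
        "datatools": env.get("METAFLOW_DATATOOLS_S3ROOT", sysroot + "/data"),
        "conda": env.get("METAFLOW_CONDA_PACKAGE_S3ROOT", sysroot + "/conda"),
    }


if __name__ == "__main__":
    # The bucket and prefix below are placeholders.
    example = {"METAFLOW_DATASTORE_SYSROOT_S3": "s3://my-metaflow-bucket/metaflow"}
    for name, location in resolve_s3_roots(example).items():
        print(name, "->", location)
```

Setting only the required variable is therefore enough to get working locations for all three kinds of objects.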

Compute: Batch

To orchestrate compute on Batch, at a minimum you need to create a Compute Environment, a Job Queue and an IAM role that allows the Batch container to access your S3 bucket (ListBucket, PutObject, GetObject, DeleteObject) as well as any other AWS services your user code might interface with (e.g., allowing PassRole privileges to SageMaker services).
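As a rough sketch of the S3 part of that IAM role, the following Python builds a policy document granting the four actions listed above. The function name s3_access_policy and the bucket name are placeholders chosen for illustration; permissions for any other AWS services your code uses would need additional statements.

```python
import json


def s3_access_policy(bucket):
    """Build an IAM policy document granting the four S3 actions named above.

    Illustrative only: the bucket name is supplied by the caller, and other
    services your flows touch are not covered here.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is granted on the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions are granted on keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_access_policy("my-metaflow-bucket"), indent=2))
```

The printed JSON can be pasted into the IAM console when attaching a policy to the role.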

METAFLOW_BATCH_JOB_QUEUE [Required]

Batch job queue for Metaflow to place jobs on.

METAFLOW_ECS_S3_ACCESS_IAM_ROLE [Required]

IAM role allowing Batch ECS tasks to access S3 and other AWS services.

METAFLOW_BATCH_CONTAINER_IMAGE [Optional]

Default Docker container image to execute Batch tasks on. Defaults to the Python image corresponding to the major.minor version of the Python interpreter running the flow.

METAFLOW_BATCH_CONTAINER_REGISTRY [Optional]

Default Docker container registry to pull container images from. Defaults to Docker Hub.
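To make the image default concrete, here is a small sketch of how a major.minor image name could be derived from the running interpreter. The helper default_python_image is hypothetical, and prefixing the registry with a slash is an assumption about how the two settings combine, not a description of Metaflow's internals.

```python
import platform


def default_python_image(registry=None):
    """Return an image name such as "python:3.8" for the current interpreter.

    Hypothetical helper. The registry prefixing shown here is an assumption
    made for illustration.
    """
    major, minor, _patch = platform.python_version_tuple()
    image = f"python:{major}.{minor}"
    if registry:
        return f"{registry.rstrip('/')}/{image}"
    return image


if __name__ == "__main__":
    print(default_python_image())
    # "registry.example.com" is a placeholder registry host.
    print(default_python_image("registry.example.com"))
```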

Metadata: Metaflow Service

The Metaflow service is a simple aiohttp service on top of an RDS instance. To set it up, follow these steps:

Create VPC

1. Launch VPC Wizard.
2. Create a VPC with Single Public Subnet. Note the ID for subsequent steps.
3. Go to Subnets and create a second Subnet.
4. Add the second subnet to the route table.

Create Security Groups

1. Select EC2 from Services.
2. Navigate to Security Groups in the left pane.
3. Select the VPC created in the previous step from the drop-down.

Create ECS Security Group

This security group allows inbound API access to the metadata service.

1. Add a rule to make port 8080 accessible.
2. Record the ECS security group ID for use in the next step.

Create RDS Security Group

Create another security group within the same VPC. This group allows Postgres access from ECS (i.e., it lets the metadata service read from the DB).

Add a rule to make port 5432 accessible (type PostgreSQL) and set the ECS security group as its source.
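The two inbound rules can be written down as plain data. The sketch below expresses them in the IpPermissions shape that the EC2 API accepts (for example through boto3's authorize_security_group_ingress); the function names and the security group ID are hypothetical, and the open 0.0.0.0/0 range is taken from the note in the service creation steps, so narrow it if your setup allows.

```python
def metadata_service_ingress(cidr="0.0.0.0/0"):
    """Inbound rule for the ECS security group: TCP 8080 from the given CIDR."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 8080,
        "ToPort": 8080,
        "IpRanges": [{"CidrIp": cidr}],
    }


def postgres_ingress(ecs_security_group_id):
    """Inbound rule for the RDS security group: TCP 5432 from the ECS group only."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 5432,
        "ToPort": 5432,
        # Referencing the ECS group as the source limits access to the service.
        "UserIdGroupPairs": [{"GroupId": ecs_security_group_id}],
    }


if __name__ == "__main__":
    # "sg-0123456789abcdef0" is a placeholder security group ID.
    print(metadata_service_ingress())
    print(postgres_ingress("sg-0123456789abcdef0"))
```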

Create RDS

Note: Currently, we only support Postgres as the backend DB.

1. Create DB Subnet Group
2. Create RDS instance

Under "Additional connectivity configuration", add the RDS security group that was previously created (ECS -> 5432).

Finally, under "Additional Configuration" at the bottom of the page, configure the initial database name. By default, the Metadata Service expects the database name to be "metaflow", although this is configurable via environment variables.

Create ECS Service Cluster

1. Set up a cluster using Fargate.

Set up ECS Task Definition

1. Use the Fargate template for the task definition.

2. Note the environment variables needed.
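As a sketch of what those environment variables might look like in the container definition, the snippet below builds the name/value list that ECS task definitions use. The MF_METADATA_DB_* variable names are an assumption that should be checked against the Metadata Service's own documentation; the helper name and the example host are hypothetical.

```python
def metadata_service_environment(db_host, db_user, db_password,
                                 db_name="metaflow", db_port=5432):
    """Build the "environment" list for an ECS container definition.

    The variable names are assumed, not confirmed by this guide. Verify them
    against the Metadata Service documentation before use.
    """
    settings = {
        "MF_METADATA_DB_HOST": db_host,
        "MF_METADATA_DB_PORT": str(db_port),
        "MF_METADATA_DB_USER": db_user,
        "MF_METADATA_DB_PSWD": db_password,
        # "metaflow" is the default database name mentioned in the RDS step.
        "MF_METADATA_DB_NAME": db_name,
    }
    return [{"name": key, "value": value} for key, value in settings.items()]


if __name__ == "__main__":
    # Host, user, and password are placeholders. Do not hard-code real secrets.
    for entry in metadata_service_environment(
        "mydb.example.us-west-2.rds.amazonaws.com", "master", "change-me"
    ):
        print(entry["name"], "=", entry["value"])
```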

Create Metaflow Service

1. Navigate to ECS in the console, select Create from the Services tab and select the Task Definition created previously.

Note: Be sure to select the ECS security group created earlier (0.0.0.0/0 -> 8080).

2. Select the VPC created previously and add both of the subnets created previously. Optionally, configure a load balancer.
3. Choose Autoscaling if needed.
4. Review and Confirm.

METAFLOW_SERVICE_URL [Required]

URL to the Metaflow service.

CloudFormation Template

Setting up these services manually and getting all the permissions right can be complicated, so we bundle a CloudFormation template that automates setting up these services on AWS. This template can be used both within CloudFormation and with Service Catalog to provision resources.

Configuring Metaflow

Metaflow can be configured via the CLI:

metaflow configure aws

Additionally, any of the specified configuration parameters can be overridden by specifying them as environment variables.
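The override behaviour can be pictured as a simple merge in which environment variables win over configured values. This is an illustrative sketch, not Metaflow's implementation; effective_config is a hypothetical helper, and configured stands in for whatever values the configure command saved.

```python
import os


def effective_config(configured, env=None):
    """Merge saved configuration with METAFLOW_* environment overrides.

    Hypothetical helper: "configured" represents previously saved settings,
    and any METAFLOW_* variable in the environment takes precedence.
    """
    env = os.environ if env is None else env
    overrides = {key: value for key, value in env.items()
                 if key.startswith("METAFLOW_")}
    return {**configured, **overrides}


if __name__ == "__main__":
    # All values below are placeholders.
    saved = {
        "METAFLOW_DATASTORE_SYSROOT_S3": "s3://my-metaflow-bucket/metaflow",
        "METAFLOW_BATCH_JOB_QUEUE": "default-queue",
    }
    shell_env = {"METAFLOW_BATCH_JOB_QUEUE": "gpu-queue", "HOME": "/home/user"}
    print(effective_config(saved, shell_env))
```

In this example the job queue from the environment replaces the saved one, while the datastore root is left unchanged.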