Metaflow comes bundled with first-class support for various services on AWS. This guide walks through how to configure Metaflow in your own AWS account.
To get Metaflow up and running in your AWS account, you need to configure the following:
You can mix and match these services; Metaflow does not require that all of them be stood up. Currently, the only restriction is that S3 must be configured as the datastore if you intend to use AWS Batch as your compute service. The following sections walk you through standing up these services and configuring Metaflow to use them, and describe a CloudFormation template that automates the process.
S3 bucket and prefix used by Metaflow to store data artifacts and code packages.
S3 bucket and prefix used by metaflow.S3 to store data objects. Defaults to
S3 bucket and prefix used by Metaflow to store conda packages. Defaults to
Allows you to specify the endpoint_url parameter for Boto (see here).
To orchestrate compute on Batch, you need to minimally create a Compute Environment, a Job Queue and an IAM role that allows the Batch container to access your S3 bucket (
DeleteObject) as well as any other AWS services your user code might interface with (e.g., allowing PassRole privileges to SageMaker services).
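As a rough illustration of the IAM role described above, the following sketch builds the two JSON documents such a role typically needs: a trust policy that lets ECS tasks assume the role, and a permissions policy for the S3 bucket. The function names and the bucket name are hypothetical. Of the S3 actions, only DeleteObject is named explicitly in this guide; the other actions listed are assumptions about what a datastore would need, so adjust them to your setup.

```python
import json


def ecs_task_trust_policy():
    """Trust policy letting ECS tasks (which run Batch jobs) assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def batch_s3_policy(bucket, object_actions=None):
    """Permissions policy for one bucket.

    Only s3:DeleteObject is named in the guide; the remaining
    actions are assumptions made for this sketch.
    """
    if object_actions is None:
        object_actions = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": list(object_actions),
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    # "my-metaflow-bucket" is a placeholder bucket name.
    print(json.dumps(ecs_task_trust_policy(), indent=2))
    print(json.dumps(batch_s3_policy("my-metaflow-bucket"), indent=2))
```

The printed documents can be pasted into the IAM console or supplied to whatever provisioning tool you use. Permissions for other services your code touches would be added as further statements.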
Batch job queue for Metaflow to place jobs on.
IAM role allowing Batch ECS tasks to access S3 and other AWS services.
Default Docker container image to execute Batch tasks on. Defaults to the Python image corresponding to the major.minor version of the Python interpreter running the flow.
Default Docker container registry to pull container images from. Defaults to Docker Hub.
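To make the default image rule concrete, here is a minimal sketch of how an image name could be derived from the running interpreter's major.minor version. The function name `default_batch_image` is hypothetical, and the `python:<major>.<minor>` tag format is an assumption based on the official Python images on Docker Hub; it is not taken from Metaflow's source.

```python
import sys


def default_batch_image(version_info=sys.version_info):
    """Return an image name such as 'python:3.9' for the given version tuple."""
    major, minor = version_info[0], version_info[1]
    return f"python:{major}.{minor}"


if __name__ == "__main__":
    # Uses the interpreter that is running this script.
    print(default_batch_image())
```

A flow launched with Python 3.9.1 would, under this assumption, run its Batch tasks on `python:3.9` unless a different default image is configured.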
Select EC2 from Services
Navigate to Security Groups from the left pane
Select the VPC that was created in the previous step from the drop-down
Create ECS security group
This security group will allow inbound API access to the metadata service
Add a rule to make port 8080 accessible
Record the ECS security group ID for use in the next step
Create RDS security group
Create another security group within the same VPC. This group will be used to allow Postgres access from ECS (i.e., to allow the metadata service to read from the DB)
Add a rule to make port 5432 accessible (type PostgreSQL) and set the ECS security group as its source
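The two security groups above can be summarized as ingress rules. The sketch below only builds the rule dictionaries in the shape that boto3's EC2 client accepts for the `IpPermissions` argument of `authorize_security_group_ingress`; it makes no AWS calls. The helper names and the group ID `sg-ecs-placeholder` are hypothetical, and the `0.0.0.0/0` range for port 8080 follows the note later in this guide.

```python
def ingress_from_cidr(port, cidr):
    """TCP ingress rule that opens a single port to a CIDR range."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr}],
    }


def ingress_from_group(port, source_group_id):
    """TCP ingress rule that opens a single port to members of another group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }


# ECS security group: inbound API access to the metadata service.
ecs_rules = [ingress_from_cidr(8080, "0.0.0.0/0")]

# RDS security group: Postgres access only from the ECS security group.
# "sg-ecs-placeholder" stands in for the ID recorded in the earlier step.
rds_rules = [ingress_from_group(5432, "sg-ecs-placeholder")]

if __name__ == "__main__":
    print(ecs_rules)
    print(rds_rules)
```

With boto3 installed and credentials configured, each list could be passed as `IpPermissions` together with the target `GroupId`. Restricting the database rule to a source group rather than a CIDR range keeps port 5432 closed to everything except the metadata service.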
Note: Currently, only Postgres is supported as the backend DB
Under "Additional connectivity configuration", add the security group that was previously created (ECS -> 5432)
Finally, under "Additional Configuration" at the bottom of the page, configure the initial database name. By default, the Metadata Service expects the database name to be "metaflow", although this is configurable via environment variables.
Note: Be sure to select the security group created in the previous step (0.0.0.0/0 -> 8080)
URL to the Metaflow service.
Setting up these services manually and getting all the permissions right can be complicated, which is why we bundle a CloudFormation template that automates the setup on AWS. This template can be used both within CloudFormation and within Service Catalog for provisioning resources.
Metaflow can be configured via the CLI:
metaflow configure aws
Additionally, any of the specified configuration parameters can be overridden by specifying them as environment variables.
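The override behavior can be pictured as a simple lookup order: an environment variable, if set, wins over the value saved by the CLI. The sketch below illustrates that precedence only and is not Metaflow's actual implementation. The function name `resolve_setting` is hypothetical, and `METAFLOW_SERVICE_URL` is used on the assumption that it is the parameter holding the URL to the Metaflow service.

```python
import os


def resolve_setting(name, file_config, environ=None):
    """Return the environment value for `name` if set, else the saved value."""
    environ = os.environ if environ is None else environ
    if name in environ:
        return environ[name]
    return file_config.get(name)


if __name__ == "__main__":
    # Stand-in for values saved by `metaflow configure aws`.
    saved = {"METAFLOW_SERVICE_URL": "http://from-file.example"}

    # No override in the environment: the saved value is used.
    print(resolve_setting("METAFLOW_SERVICE_URL", saved, environ={}))

    # An environment variable with the same name takes precedence.
    override = {"METAFLOW_SERVICE_URL": "http://from-env.example"}
    print(resolve_setting("METAFLOW_SERVICE_URL", saved, environ=override))
```

In practice this means a single run can be pointed at a different service or bucket by exporting the variable in the shell, without editing the saved configuration.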