Accessing Secrets

If your flow needs to access an external service (e.g. a database) that requires authentication, you need to supply credentials to the flow. If security wasn't a concern, you could easily achieve this using Metaflow parameters. However, when it comes to credentials and other sensitive information, security is a top concern.

The industry-standard best practice is to store credentials in a secrets manager, such as AWS Secrets Manager. Once secrets are managed by such a system, Metaflow provides a decorator, @secrets, which makes it easy to access them securely in a flow.

For more background, see the @secrets launch blog post. Also, take a look at the API docs for @secrets.

info

Currently, @secrets supports only AWS Secrets Manager. Contact us on Metaflow support Slack if you are interested in using another secrets manager.

Basic usage

This video gives a one-minute overview of how to store a secret and access it in Metaflow (no sound):


The secrets manager stores secrets as a set of key-value pairs that are identified by a name. Given a name, the @secrets decorator fetches the key-value pairs - assuming your IAM user is allowed to read the secret - and exposes them through environment variables.

Here is the simple example featured in the video:

from metaflow import FlowSpec, step, secrets
import os

class SecretFlow(FlowSpec):

    @secrets(sources=['metaflow-example-password'])
    @step
    def start(self):
        print("Here's the password:", os.environ['password'])
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    SecretFlow()

In this case, metaflow-example-password is the name of the secret, which contains a key called password. The sources attribute, which defines the secret sources, can contain multiple names, in which case the union of all the secrets' key-value pairs is exposed through environment variables.
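For instance, a minimal sketch with two sources (the second secret name, metaflow-example-api-key, is hypothetical); all keys from both secrets would become environment variables in the step:

@secrets(sources=['metaflow-example-password',
                  'metaflow-example-api-key'])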

Configuring a secrets backend

To use @secrets, you need to inform Metaflow which secrets manager you want to use. Currently, the choice is easy since the only supported backend is AWS Secrets Manager.

Make sure your Metaflow configuration contains the following line:

"METAFLOW_DEFAULT_SECRETS_BACKEND_TYPE": "aws-secrets-manager"

Defining secrets on the command line

Note that you can define @secrets on the command line using the --with option like any other decorator. This comes in especially handy when moving between prototyping and production: for instance, you can access a different database during development than in production.

Consider this example that connects to a Postgres database:

from metaflow import FlowSpec, step, secrets
import os
from psycopg import connect

class DBFlow(FlowSpec):

    @step
    def start(self):
        with connect(user=os.environ['DB_USER'],
                     password=os.environ['DB_PASSWORD'],
                     dbname=os.environ['DB_NAME'],
                     host=os.environ['DB_HOST']) as conn:

            with conn.cursor() as cur:
                cur.execute("SELECT * FROM data")
                print(cur.fetchall())

        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    DBFlow()

During development, you can run the flow locally, perhaps reading credentials for a local database from environment variables; there is no need to use a secrets manager during early prototyping.
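For example, during prototyping you might export the variables yourself before running the flow; the values below are placeholders for a local database:

DB_USER=postgres DB_PASSWORD=local DB_NAME=mydb DB_HOST=localhost \
    python dbflow.py run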

To read data from a test database, you can fetch credentials from a secrets manager by running the flow like this:

python dbflow.py --with 'secrets:sources=["test-db-credentials"]' run

And you can deploy to production using a production database like this:

python dbflow.py --with 'secrets:sources=["prod-db-credentials"]' argo-workflows create

Advanced topics

The following topics come up occasionally when running flows in serious production environments.

Controlling access to secrets

A major benefit of using a secrets manager is that you can control closely who gets to access which secrets. In the case of AWS Secrets Manager, access control is accomplished through IAM policies. For more details, consult the section about access control in the AWS Secrets Manager documentation.

For instance, you can set up IAM policies so that only the test database's credentials are directly accessible to users, while the production database can only be accessed by tasks running on a production scheduler.
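As a sketch, an identity-based IAM policy along these lines would let a user read only the test credentials. The account ID, region, and secret name are placeholders, and a real policy may need additional actions; the trailing wildcard accounts for the random suffix AWS appends to secret ARNs:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:test-db-credentials-*"
        }
    ]
}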

Using an alternative role

By default, @secrets accesses secrets using the default IAM role available in the execution environment. For local runs, this is typically the role attached to the IAM user.

If the default role doesn't have access to the specified secrets, you can define an alternative role through the role attribute:

@secrets(sources=['metaflow-example-password'],
         role='arn:aws:iam::123456789012:role/SecretsAccess')

The default role needs to be able to assume the specified role for this to work.
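Concretely, this means the alternative role's trust policy must allow the default role to assume it. A hedged sketch, with placeholder ARNs (MetaflowDefaultRole stands in for whatever default role your environment uses):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::123456789012:role/MetaflowDefaultRole"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}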

Accessing secrets from a non-default location

AWS Secrets Manager is an account- and region-specific service. By default, when you specify a secret name in the sources list, @secrets assumes that the name is available in the current AWS account, in the current default region.

If this is not the case, you can specify a full secret ARN (available on the AWS Secrets Manager console) as a source:

@secrets(sources=['arn:aws:secretsmanager:us-west-2:001234556000:secret:some-secret'])