
Custom Config Parsers

The primary purpose of the parser argument in Config is to enable support for different configuration formats, as explained in the previous section. As parsers are arbitrary Python functions, one can leverage them more creatively as well, including generating configurations on the fly.

For instance, a custom parser could contact an external service to fetch a suitable config. Or, it could examine data and other context to determine a suitable configuration dynamically.
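As a minimal sketch of the second idea, the parser below derives part of the configuration from the runtime context and lets any user-supplied JSON override it. It assumes, as in the previous section, that a parser receives the config contents as a string and returns a dictionary. The function name env_aware_parser, the DEPLOY_STAGE environment variable, and the keys workers and stage are hypothetical, chosen only for illustration.

```python
import json
import os


def env_aware_parser(text):
    """Hypothetical parser: context-derived defaults, overridden by user JSON."""
    # Defaults derived from the environment the flow is started in.
    # DEPLOY_STAGE is an illustrative variable name, not a Metaflow setting.
    cfg = {
        "workers": os.cpu_count() or 1,
        "stage": os.environ.get("DEPLOY_STAGE", "dev"),
    }
    # If the user supplied any config text, treat it as JSON and let it
    # override the derived defaults.
    if text.strip():
        cfg.update(json.loads(text))
    return cfg


if __name__ == "__main__":
    print(env_aware_parser(""))
    print(env_aware_parser('{"stage": "prod"}'))
```

A flow could then pass this function as the parser argument of a Config in the same way as the parsers shown elsewhere on this page.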

Generating configs programmatically

To illustrate config generation, consider this example that annotates each deployment with information about the current git branch through a custom parser function, git_info. It generates a config on the fly by calling the git command.

This example should be run in an existing Git repository. Check out the example from its repository for testing:

from subprocess import check_output
from metaflow import FlowSpec, step, Config, current, card
from metaflow.cards import Markdown, Table

def git_info(args):
    info = {
        "commit": ["git", "rev-parse", "HEAD"],
        "branch": ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        "message": ["git", "log", "-1", "--pretty=%B"],
    }
    cfg = {}
    for key, cmd in info.items():
        cfg[key] = check_output(cmd, text=True).strip()
    return cfg

class GitInfoFlow(FlowSpec):
    git_info = Config("git_info", default_value="", parser=git_info)

    def output_git_info(self):
        table = [
            [Markdown(f"**{k.capitalize()}**"), v] for k, v in self.git_info.items()
        ]
        current.card["git"].append(Markdown(f"# Deployment info"))
        current.card["git"].append(Table(table))

    @card(type="blank", id="git")
    @step
    def start(self):
        print("Git info", self.git_info)
        self.output_git_info()
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    GitInfoFlow()

As the config is generated on the fly, no config file needs to be present. Passing default_value="" to Config tells it that the flow can proceed without one.

You can run the flow as usual:

python git_info.py run

and view the resulting card:

python git_info.py card view start
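The git_info parser above assumes that git is installed and that the flow is started inside a repository; otherwise check_output raises an exception. A more defensive variant could look like the following sketch. The name safe_git_info and the "unknown" fallback value are hypothetical and not part of the original example.

```python
import subprocess

GIT_COMMANDS = {
    "commit": ["git", "rev-parse", "HEAD"],
    "branch": ["git", "rev-parse", "--abbrev-ref", "HEAD"],
    "message": ["git", "log", "-1", "--pretty=%B"],
}


def safe_git_info(args, fallback="unknown"):
    # `args` is the config text handed to the parser; it is unused here,
    # just as in the original git_info.
    cfg = {}
    for key, cmd in GIT_COMMANDS.items():
        try:
            cfg[key] = subprocess.check_output(
                cmd, text=True, stderr=subprocess.DEVNULL
            ).strip()
        except (subprocess.CalledProcessError, OSError):
            # git exited with an error (for example, not a repository)
            # or the git executable could not be started at all.
            cfg[key] = fallback
    return cfg


if __name__ == "__main__":
    print(safe_git_info(""))
```

With this variant, a run started outside a repository records placeholder values instead of failing. Whether that is preferable to failing fast depends on how strictly lineage should be enforced.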

Including default configs in flows

The example above is practically useful, as it ensures that every deployment and local run includes lineage information about the originating Git branch. Thanks to Config, this information is stored in an artifact, making it easy to inspect later.

We may want to ensure that all flows include this information. We could include a Config line with a git_info parser in every flow, but it adds unnecessary boilerplate which developers shouldn't have to worry about.

Instead, we can create a FlowSpec template or superclass which takes care of this:

from subprocess import check_output

from metaflow import FlowSpec, Config, current
from metaflow.cards import Markdown, Table

def git_info(args):
    info = {
        "commit": ["git", "rev-parse", "HEAD"],
        "branch": ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        "message": ["git", "log", "-1", "--pretty=%B"],
    }
    cfg = {}
    for key, cmd in info.items():
        cfg[key] = check_output(cmd, text=True).strip()
    return cfg

class TrackedFlowSpec(FlowSpec):
    git_info = Config("git_info", default_value="", parser=git_info)

    def output_git_info(self):
        print("deployment info", self.git_info)
        if hasattr(current, "card"):
            table = [
                [Markdown(f"**{k.capitalize()}**"), v] for k, v in self.git_info.items()
            ]
            current.card.append(Markdown(f"# Deployment info"))
            current.card.append(Table(table))

We can distribute this module, tracked_flow.py, containing TrackedFlowSpec as a shared base class for flows to inherit from, such as MyTrackedFlow:

from metaflow import step, Config
from tracked_flow import TrackedFlowSpec

class MyTrackedFlow(TrackedFlowSpec):
    config = Config("config", default="myconfig.json")

    @step
    def start(self):
        self.output_git_info()
        self.next(self.end)

    @step
    def end(self):
        print("config", self.config)

if __name__ == "__main__":
    MyTrackedFlow()

An important detail here is that MyTrackedFlow derives from TrackedFlowSpec instead of the usual Metaflow FlowSpec. Since TrackedFlowSpec includes git_info, all derived flows have it too, in addition to any Configs they define themselves, such as config above.

When you run MyTrackedFlow, it will record its git lineage in git_info automatically without boilerplate. Optionally, if you run the flow --with card, output_git_info will populate a card with the lineage information.