
Parsing Configs

All the basic configuration examples we discussed relied on a single JSON file for defining the configuration. This page covers scenarios involving alternative formats and more advanced configurations.

To support diverse config formats and use cases, Config allows you to specify a custom parser. All the parser has to do is output a Python dictionary.
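For example, here is a minimal sketch of a custom parser for a hypothetical key=value format; the names keyvalue_parser and myconfig.txt are made up for illustration:

def keyvalue_parser(txt):
    # txt contains the raw text of the config file; the parser just
    # needs to turn it into a plain Python dictionary
    return dict(line.split("=", 1) for line in txt.splitlines() if "=" in line)

# inside a FlowSpec class, you would then declare
config = Config("config", default="myconfig.txt", parser=keyvalue_parser)

The built-in and third-party parsers shown below follow the same contract: text in, dictionary out.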

Using common formats like TOML and YAML

To use a format other than JSON, you need to define a parser that accepts a string (the configuration file) and converts it to a Python dictionary. Luckily, common configuration formats include such a function out of the box.

For instance, this example uses a configuration file specified in TOML, a pleasantly human-readable and writable format whose parser has been included in the Python standard library since version 3.11:

import pprint
from metaflow import FlowSpec, step, Config, resources

class TomlConfigFlow(FlowSpec):
    config = Config("config", default="myconfig.toml", parser="tomllib.loads")

    @resources(cpu=config.resources.cpu)
    @step
    def start(self):
        print("Config loaded:")
        pprint.pp(self.config)
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    TomlConfigFlow()

Note that the parser is defined as a string, so you can run the flow on remote systems that might not have the parser installed. There is no need to include it in @pypi or @conda environments or in the underlying container images.

The corresponding TOML file, myconfig.toml, looks like this:

[general]
a = 5

[resources]
cpu = 1
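If you want to see exactly what the flow receives, you can run the parser by hand. A quick sketch, assuming the file above is saved as myconfig.toml in the working directory:

import tomllib

with open("myconfig.toml") as f:
    cfg = tomllib.loads(f.read())
print(cfg)  # {'general': {'a': 5}, 'resources': {'cpu': 1}}

This nested dictionary is what backs attribute-style access such as config.resources.cpu in the flow above.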

You can run the flow as usual, assuming you are using Python 3.11 or newer:

python toml_config.py run
Tip:

The parser needs to be available only on the system that starts a run or a deployment. Since Configs are evaluated only at deploy time, you don't need to have the parser available on remote nodes.

To make sure remote systems won't complain about missing imports, you can define the parser function as a string, such as "tomllib.loads".
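In other words, both of the following ways of declaring the Config should work (a sketch based on the examples on this page), but only the string form avoids importing tomllib on systems that never evaluate the config:

# eager form: tomllib must be importable wherever this module is loaded
import tomllib
config = Config("config", default="myconfig.toml", parser=tomllib.loads)

# lazy, string-based form used in the example above: tomllib is imported
# only on the machine that evaluates the config
config = Config("config", default="myconfig.toml", parser="tomllib.loads")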

Loading YAML

To load YAML, change the Config line in the above example to

config = Config("config", default="myconfig.yml", parser="yaml.full_load")

You can test it with a YAML file like this:

resources:
  cpu: 1
foo: bar

Make sure that you have the pyyaml package installed before running the flow.
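You can install it, for example, with pip:

pip install pyyaml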

Validating configs with pydantic

Besides loading custom formats, you can use parsers to preprocess and validate configs (and much more). This example uses the popular pydantic package to validate the config schema:

import json, pprint
from metaflow import FlowSpec, step, Config, resources

def pydantic_parser(txt):
    from pydantic import BaseModel, PositiveInt
    from datetime import datetime

    class ConfigSchema(BaseModel):
        id: int
        signup_ts: datetime | None
        tastes: dict[str, PositiveInt]

    cfg = json.loads(txt)
    ConfigSchema.model_validate(cfg)
    return cfg

class PydanticConfigFlow(FlowSpec):
    config = Config("config", parser=pydantic_parser)

    @step
    def start(self):
        print("Config loaded and validated:")
        pprint.pp(self.config)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    PydanticConfigFlow()

You can test validation with the following config, which has a small data issue that a human could easily miss:

python pydantic_config.py --config-value config \
    '{"id": 1, "signup_ts": "2024-12-12 00:12", "tastes": {"milk": -2}}' \
    run

Pydantic spots the issue (the negative value for tastes.milk, which violates the PositiveInt constraint) and gives a helpful error message about it before a run starts.

Advanced configurations with OmegaConf

Thanks to custom parsers, Metaflow Configs work well with existing configuration frameworks like OmegaConf which handle advanced cases involving multiple overlapping configuration files, value interpolation and substitution, and schema validation out of the box.

To illustrate what is possible, consider this example that uses OmegaConf to read a YAML config whose values you can selectively override for a run with a Parameter:

import pprint

from metaflow import FlowSpec, step, Config, Parameter
from omegaconf import OmegaConf

def omega_parse(txt):
    config = OmegaConf.create(txt)
    return OmegaConf.to_container(config, resolve=True)

def parse_overrides(config, overrides):
    cfg = config.to_dict()
    if overrides:
        base = OmegaConf.create(cfg)
        config = OmegaConf.from_dotlist(overrides.split(","))
        return OmegaConf.to_container(OmegaConf.merge(base, config))
    else:
        return cfg

class OmegaConfigFlow(FlowSpec):
    base = Config("config", default="omega.yaml", parser=omega_parse)
    overrides = Parameter("overrides", default="")

    @step
    def start(self):
        self.config = parse_overrides(self.base, self.overrides)
        pprint.pp(self.config)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    OmegaConfigFlow()

The example reads a YAML config as usual, but it allows the user to override any specific keys in the config using OmegaConf's support for the dot-list format and configuration merging.
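If the dot-list mechanics are unfamiliar, this standalone sketch (with made-up values) shows what parse_overrides does under the hood:

from omegaconf import OmegaConf

# a base config, as it would come out of omega_parse
base = OmegaConf.create({"training": {"epochs": 100, "alpha": 0.5}})
# overrides in dot-list form, as they arrive via the --overrides parameter
override = OmegaConf.from_dotlist(["training.epochs=50"])
merged = OmegaConf.to_container(OmegaConf.merge(base, override))
print(merged)  # {'training': {'epochs': 50, 'alpha': 0.5}}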

Test this with a YAML file like the one below, which uses variable interpolation for illustration:

training:
  alpha: 0.5
  epochs: 100
  id: training-${training.alpha}-${training.epochs}
dataset:
  name: myfile.txt
  preprocess: false

Run the flow as shown here, overriding the two keys dataset.name and training.epochs in the config:

python omega_config.py run --overrides dataset.name=newfile.txt,training.epochs=50