Parsing Configs
All the basic configuration examples we discussed relied on a single JSON file for defining the configuration. This page covers scenarios involving alternative formats and more advanced configurations.
To support diverse config formats and use cases, Config allows you to specify a custom parser. All the parser has to do is output a Python dictionary.
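For instance, a custom parser can be as simple as a plain function. The key=value format below is purely hypothetical and not something Metaflow ships with; it just illustrates the expected shape of a parser: text in, dictionary out.
def keyvalue_parser(txt):
    # Receives the raw configuration text and returns a plain dictionary.
    config = {}
    for line in txt.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            config[key.strip()] = value.strip()
    return config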
Using common formats like TOML and YAML
To use a format other than JSON, you need to define a parser that accepts a string (the contents of the configuration file) and converts it to a Python dictionary. Luckily, common configuration formats include such a function out of the box.
For instance, this example uses a configuration file written in TOML, a pleasantly human-readable and writable format whose parser has been included in the Python standard library since version 3.11:
import pprint
from metaflow import FlowSpec, step, Config, resources

class TomlConfigFlow(FlowSpec):
    config = Config("config", default="myconfig.toml", parser="tomllib.loads")

    @resources(cpu=config.resources.cpu)
    @step
    def start(self):
        print("Config loaded:")
        pprint.pp(self.config)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    TomlConfigFlow()
Note that the parser is defined as a string, so you can run the flow on remote systems that might not have the parser installed. There is no need to include it in @pypi or @conda environments, or in the underlying container images.
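For comparison, both ways of specifying a parser are sketched below: a string reference keeps the import lazy, while a direct callable (as in the pydantic example later on this page) requires the parser module to be importable wherever the flow file is loaded.
# String reference: tomllib is only imported where the config is actually parsed.
config = Config("config", default="myconfig.toml", parser="tomllib.loads")

# Direct callable: also supported, but the flow module itself now imports tomllib.
# import tomllib
# config = Config("config", default="myconfig.toml", parser=tomllib.loads)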
The corresponding TOML file, myconfig.toml, looks like this:
[general]
a = 5
[resources]
cpu = 1
You can run the flow as usual, assuming you are using Python 3.11 or newer:
python toml_config.py run
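If you want to check what the parser returns outside of Metaflow, a quick snippet like this (Python 3.11 or newer) shows the dictionary the flow will see:
import tomllib

# Read the TOML file as text and parse it into a dictionary.
with open("myconfig.toml") as f:
    print(tomllib.loads(f.read()))
# {'general': {'a': 5}, 'resources': {'cpu': 1}}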
The parser needs to be available only on the system that starts a run or a deployment. Since Configs are evaluated only at deploy time, you don't need to have the parser available on all remote nodes. To make sure remote systems won't complain about missing imports, you can define the parser function as a string, such as "tomllib.loads".
Loading YAML
To load YAML, change the Config line in the above example to
config = Config("config", default="myconfig.yml", parser="yaml.full_load")
You can test it with a YAML file like this:
resources:
  cpu: 1
foo: bar
Make sure that you have the pyyaml package installed before running the flow.
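As with TOML, you can verify outside of Metaflow what the YAML parser produces; a quick check like this (assuming pyyaml is installed and the file is named myconfig.yml) shows the resulting dictionary:
import yaml

# Parse the YAML file into a dictionary, exactly as the Config parser would.
with open("myconfig.yml") as f:
    print(yaml.full_load(f.read()))
# {'resources': {'cpu': 1}, 'foo': 'bar'}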
Validating configs with pydantic
Besides loading custom formats, you can use parsers to preprocess and validate configs (and much more). This example uses the popular pydantic package to validate the config schema:
import json, pprint
from metaflow import FlowSpec, step, Config, resources

def pydantic_parser(txt):
    from pydantic import BaseModel, PositiveInt
    from datetime import datetime

    class ConfigSchema(BaseModel):
        id: int
        signup_ts: datetime | None
        tastes: dict[str, PositiveInt]

    cfg = json.loads(txt)
    ConfigSchema.model_validate(cfg)
    return cfg

class PydanticConfigFlow(FlowSpec):
    config = Config("config", parser=pydantic_parser)

    @step
    def start(self):
        print("Config loaded and validated:")
        pprint.pp(self.config)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    PydanticConfigFlow()
You can test validation with the following config, which has a small data issue that a human could easily miss:
python pydantic_config.py --config-value config \
'{"id": 1, "signup_ts": "2024-12-12 00:12", "tastes": {"milk": -2}}' \
run
Pydantic spots the issue and gives a helpful error message about it before a run starts.
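For comparison, a config whose tastes values are all positive, for example, passes the schema check and the run proceeds as usual:
python pydantic_config.py --config-value config \
    '{"id": 1, "signup_ts": "2024-12-12 00:12", "tastes": {"milk": 2}}' \
    run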
Advanced configurations with OmegaConf
Thanks to custom parsers, Metaflow Configs work well with existing configuration frameworks like OmegaConf, which handle advanced cases involving multiple overlapping configuration files, value interpolation and substitution, and schema validation out of the box.
To illustrate what is possible, consider this example that uses OmegaConf to read a YAML Config whose values you can selectively override for a run with a Parameter:
import pprint
from metaflow import FlowSpec, step, Config, Parameter
from omegaconf import OmegaConf

def omega_parse(txt):
    config = OmegaConf.create(txt)
    return OmegaConf.to_container(config, resolve=True)

def parse_overrides(config, overrides):
    cfg = config.to_dict()
    if overrides:
        base = OmegaConf.create(cfg)
        config = OmegaConf.from_dotlist(overrides.split(","))
        return OmegaConf.to_container(OmegaConf.merge(base, config))
    else:
        return cfg

class OmegaConfigFlow(FlowSpec):
    base = Config("config", default="omega.yaml", parser=omega_parse)
    overrides = Parameter("overrides", default="")

    @step
    def start(self):
        self.config = parse_overrides(self.base, self.overrides)
        pprint.pp(self.config)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    OmegaConfigFlow()
The example reads a YAML config as usual, but it allows the user to override any specific keys in the config using OmegaConf's support for the dot-list format and configuration merging.
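If these OmegaConf features are unfamiliar, this standalone sketch shows what the dot-list parsing and merge do, independent of Metaflow (the keys and values are just for illustration):
from omegaconf import OmegaConf

# A base config and a dot-list of overrides, merged so the overrides win.
base = OmegaConf.create({"dataset": {"name": "myfile.txt"}, "training": {"epochs": 100}})
overrides = OmegaConf.from_dotlist(["dataset.name=newfile.txt", "training.epochs=50"])
merged = OmegaConf.merge(base, overrides)
print(OmegaConf.to_container(merged))
# {'dataset': {'name': 'newfile.txt'}, 'training': {'epochs': 50}}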
Test this with a YAML file like the one below, which uses variable interpolation for illustration:
training:
  alpha: 0.5
  epochs: 100
  id: training-${training.alpha}-${training.epochs}
dataset:
  name: myfile.txt
  preprocess: false
Run the flow as shown here, overriding the two keys dataset.name and training.epochs in the config:
python omega_config.py run --overrides dataset.name=newfile.txt,training.epochs=50