The Client API Reference

Use these objects to access data from past runs and to manipulate tags. Objects in this module are organized as a hierarchy:

Object hierarchy: Metaflow → Flow → Run → Step → Task → DataArtifact

Instantiating Objects

You can instantiate a specific object at any level of the hierarchy by providing a corresponding pathspec, e.g. from Metaflow logs.

  • Metaflow()
  • Flow('HelloFlow')
  • Run('HelloFlow/2')
  • Step('HelloFlow/2/start')
  • Task('HelloFlow/2/start/1')
  • DataArtifact('HelloFlow/2/start/1/name')
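
As a minimal sketch of how a pathspec maps onto the hierarchy, the level an object lives at follows from the number of slash-separated components. The helper level_of and the LEVELS list are hypothetical illustrations, not part of the Metaflow API, and the sketch only checks the shape of the string, not whether the object actually exists.

```python
# Hypothetical helper, not part of the Metaflow API.
LEVELS = ["Flow", "Run", "Step", "Task", "DataArtifact"]


def level_of(pathspec: str) -> str:
    """Return the name of the client class a pathspec refers to."""
    components = pathspec.split("/")
    # Reject empty components ("", "a//b") and pathspecs deeper than the hierarchy.
    if "" in components or len(components) > len(LEVELS):
        raise ValueError(f"not a valid pathspec: {pathspec!r}")
    return LEVELS[len(components) - 1]


print(level_of("HelloFlow/2/start"))  # Step
```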

Listing objects

Each object is a container (an iterable) that can be used to iterate over the objects directly below it in the hierarchy. For instance, list(Flow(...)) yields a list of Runs, and list(Run(...)) yields a list of Steps.

Accessing children

Since each object is a container, you can access its children through square-bracket notation, as if each object were a dictionary. For instance, you can access the object Task('HelloFlow/2/start/1') as follows:

Flow('HelloFlow')['2']['start']['1']

You can also test if the object has a certain child:

from metaflow import Flow
if '2' in Flow('HelloFlow'):
    print('Run found')
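
Because bracket access can be chained, a pathspec relative to a Flow can be followed one component at a time. The sketch below uses a hypothetical helper, resolve, and plain nested dicts as a stand-in so it runs without Metaflow; it assumes only that each level supports [] lookup.

```python
from functools import reduce


def resolve(root, path: str):
    """Follow each '/'-separated component of path with node[component].

    Hypothetical helper: works on anything that supports [] lookup.
    """
    return reduce(lambda node, key: node[key], path.split("/"), root)


# Stand-in for Flow('HelloFlow'): nested dicts keyed by run id, step name, task id.
fake_flow = {"2": {"start": {"1": "task HelloFlow/2/start/1"}}}
print(resolve(fake_flow, "2/start/1"))  # task HelloFlow/2/start/1
```

With Metaflow installed, resolve(Flow('HelloFlow'), '2/start/1') walks the same path as Flow('HelloFlow')['2']['start']['1'].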

Common attributes

All objects at the Run level and below have the following attributes:

  • tags (set) - tags associated with the run this object belongs to (user and system tags).
  • user_tags (set) - user tags associated with the run this object belongs to.
  • system_tags (set) - system tags associated with the run this object belongs to.
  • created_at (datetime) - Date and time this object was created.
  • parent (Metaflow object) - Parent of this object (e.g. Run(...).parent is a Flow).
  • pathspec (string) - Pathspec of this object (e.g. HelloFlow/2 for a Run).
  • path_components (list) - Components of the pathspec.
  • origin_pathspec (string) - If the object was produced via resume, pathspec of the original object this object was cloned from.

Object visibility

Note that only objects in the current namespace can be instantiated. See Namespace functions for how to switch between namespaces.

This module accesses all objects through the current metadata provider, either the Metaflow Service or local metadata. See Metadata functions for utilities related to the metadata provider.

Object Hierarchy

Metaflow

Metaflow()

from metaflow import Metaflow

Entry point to all objects in the Metaflow universe.

This object can be used to list all the flows present, either through the flows property or by iterating over this object.

Attributes 

flows: List[Flow]

Returns the list of all Flow objects known to this metadata provider. Note that only flows present in the current namespace will be returned. A Flow is present in a namespace if it has at least one run in the namespace.

Flow

Flow(*args, **kwargs)

from metaflow import Flow

A Flow represents all existing flows with a certain name, in other words, classes derived from FlowSpec. It is a container of Run objects.

Attributes 

latest_run: Run

Latest Run (in progress or completed, successfully or not) of this flow.

latest_successful_run: Run

Latest successfully completed Run of this flow.

Flow.runs(self, *tags)

from metaflow import Flow

Returns an iterator over all Runs of this flow.

An optional filter is available that allows you to filter on tags. If multiple tags are specified, only runs that have all the specified tags are returned.

Parameters 

tags: string

Tags to match.

Returns 

Iterator[Run]

Iterator over Run objects in this flow.
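
The tag filter has AND semantics: a run qualifies only if its tags include every requested tag, and passing no tags matches everything. A sketch of that rule over a hypothetical FakeRun stand-in (not a Metaflow class):

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Set


@dataclass
class FakeRun:
    """Hypothetical stand-in for a Run: only an id and its tags."""
    run_id: str
    tags: Set[str] = field(default_factory=set)


def runs_with_all_tags(runs: Iterable[FakeRun], *tags: str) -> Iterator[FakeRun]:
    """Yield runs whose tag set contains every requested tag."""
    wanted = set(tags)
    for run in runs:
        if wanted <= run.tags:  # subset test: all requested tags present
            yield run


runs = [FakeRun("1", {"a", "b"}), FakeRun("2", {"a"}), FakeRun("3", {"b", "c"})]
print([r.run_id for r in runs_with_all_tags(runs, "a", "b")])  # ['1']
```

Against a real flow, the equivalent call would be Flow('HelloFlow').runs('a', 'b'), where the tag names are only illustrative.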

Run

Run(pathspec, attempt, _object, _parent, _namespace_check)

from metaflow import Run

A Run represents an execution of a Flow. It is a container of Steps.

Attributes 

data: MetaflowData

A shortcut to run['end'].task.data, i.e. the data produced by this run.

successful: boolean

True if the run completed successfully.

finished: boolean

True if the run completed.

finished_at: datetime

Time this run finished.

code: MetaflowCode

Code package for this run (if present). See MetaflowCode.

end_task: Task

Task for the end step (if it is present already).

Run.steps(self, *tags)

from metaflow import Run

Returns an iterator over all Steps in the run.

An optional filter is available that allows you to filter on tags. If multiple tags are specified, only steps that have all the specified tags are returned.

Parameters 

tags: string

Tags to match.

Returns 

Iterator[Step]

Iterator over Step objects in this run.

Run.add_tag(self, tag)

from metaflow import Run

Add a tag to this Run.

Note that if the tag is already a system tag, it is not added as a user tag, and no error is thrown.

Parameters 

tag: string

Tag to add.

Run.add_tags(self, tags)

from metaflow import Run

Add one or more tags to this Run.

Note that if any tag is already a system tag, it is not added as a user tag and no error is thrown.

Parameters 

tags: Iterable[string]

Tags to add.

Run.remove_tag(self, tag)

from metaflow import Run

Remove one tag from this Run.

Removing a system tag is an error. Removing a non-existent user tag is a no-op.

Parameters 

tag: string

Tag to remove.

Run.remove_tags(self, tags)

from metaflow import Run

Remove one or more tags from this Run.

Removing a system tag will result in an error. Removing a non-existent user tag is a no-op.

Parameters 

tags: Iterable[string]

Tags to remove.

Run.replace_tag(self, tag_to_remove, tag_to_add)

from metaflow import Run

Remove a tag and add a tag atomically. Removal is done first. The rules for Run.add_tag and Run.remove_tag also apply here.

Parameters 

tag_to_remove: string

Tag to remove.

tag_to_add: string

Tag to add.

Run.replace_tags(self, tags_to_remove, tags_to_add)

from metaflow import Run

Remove and add tags atomically; the removal is done first. The rules for Run.add_tag and Run.remove_tag also apply here.

Parameters 

tags_to_remove: Iterable[string]

Tags to remove.

tags_to_add: Iterable[string]

Tags to add.
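
The add, remove and replace rules above can be summarized in a small in-memory model. TagModel is hypothetical and does not talk to any metadata provider; it only mirrors the documented behavior: system tags are never added as user tags, removing a system tag fails, removing a missing user tag is a no-op, and replacement removes before it adds.

```python
class TagModel:
    """Hypothetical in-memory model of the documented tag rules (not Metaflow code)."""

    def __init__(self, system_tags, user_tags=()):
        self.system_tags = frozenset(system_tags)
        self.user_tags = set(user_tags)

    @property
    def tags(self):
        # All tags: user and system.
        return self.system_tags | self.user_tags

    def add_tags(self, tags):
        # Tags that are already system tags are silently skipped.
        self.user_tags |= set(tags) - self.system_tags

    def remove_tags(self, tags):
        tags = set(tags)
        clash = tags & self.system_tags
        if clash:
            raise ValueError(f"cannot remove system tags: {sorted(clash)}")
        # Removing a user tag that is not present is a no-op.
        self.user_tags -= tags

    def replace_tags(self, tags_to_remove, tags_to_add):
        # Removal first, then addition. remove_tags validates before
        # mutating, so a rejected call leaves the tags unchanged.
        self.remove_tags(tags_to_remove)
        self.add_tags(tags_to_add)


m = TagModel(system_tags={"user:alice", "runtime:dev"}, user_tags={"draft"})
m.replace_tags(["draft"], ["reviewed", "runtime:dev"])
print(sorted(m.user_tags))  # ['reviewed']
```

In this model a rejected replace_tags call leaves the tags untouched; how Metaflow itself achieves atomicity against the metadata provider is outside the scope of the sketch.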

Step

Step(pathspec, attempt, _object, _parent, _namespace_check)

from metaflow import Step

A Step represents a user-defined step, that is, a method annotated with the @step decorator.

It contains Task objects associated with the step, that is, all executions of the Step. The step may contain multiple Tasks in the case of a foreach step.

Attributes 

task: Task

The first Task object in this step. This is a shortcut for retrieving the only task contained in a non-foreach step.

finished_at: datetime

Time when the latest Task of this step finished. Note that in the case of foreaches, this time may change during execution of the step.

environment_info: Dict

Information about the execution environment.

Task

Task(*args, **kwargs)

from metaflow import Task

A Task represents an execution of a Step.

It contains all DataArtifact objects produced by the task as well as metadata related to execution.

Note that the @retry decorator may cause multiple attempts of the task to be present. Usually you want the latest attempt, which is what instantiating a Task object returns by default. If you need information about a specific attempt, for example the logs of a failed one, pass the attempt explicitly when creating the task:

Task('flow/run/step/task', attempt=<attempt>)

where attempt=0 corresponds to the first attempt, attempt=1 to the second, and so on.

Attributes 

metadata: List[Metadata]

List of all metadata events associated with the task.

metadata_dict: Dict

A condensed version of metadata: a dictionary where keys are names of metadata events and values are the latest corresponding events.

data: MetaflowData

Container of all data artifacts produced by this task. Note that this call downloads all data locally, so it can be slower than accessing artifacts individually. See MetaflowData for more information.

artifacts: MetaflowArtifacts

Container of DataArtifact objects produced by this task.

successful: boolean

True if the task successfully completed.

finished: boolean

True if the task completed.

exception: object

Exception raised by this task if there was one.

finished_at: datetime

Time this task finished.

runtime_name: string

Runtime this task was executed on.

stdout: string

Standard output for the task execution.

stderr: string

Standard error output for the task execution.

code: MetaflowCode

Code package for this task (if present). See MetaflowCode.

environment_info: Dict

Information about the execution environment.
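
Task.metadata_dict keeps, for each event name, only the latest event from Task.metadata. The sketch below shows that reduction on a hypothetical Event tuple and keeps the latest event's value; the field names name, value and created_at are assumptions for illustration and may not match Metaflow's own Metadata objects exactly.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical stand-in for one metadata event."""
    name: str
    value: Any
    created_at: datetime


def condense(events: Iterable[Event]) -> Dict[str, Any]:
    """Map each event name to the value of its most recent event."""
    latest: Dict[str, Event] = {}
    for ev in events:
        prev = latest.get(ev.name)
        if prev is None or ev.created_at >= prev.created_at:
            latest[ev.name] = ev
    return {name: ev.value for name, ev in latest.items()}


events = [
    Event("attempt", "0", datetime(2024, 1, 1, 12, 0)),
    Event("attempt", "1", datetime(2024, 1, 1, 12, 5)),
    Event("runtime", "local", datetime(2024, 1, 1, 12, 0)),
]
print(condense(events))  # {'attempt': '1', 'runtime': 'local'}
```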

Task.loglines(self, stream, as_unicode, meta_dict)

from metaflow import Task

Return an iterator over (utc_timestamp, logline) tuples.

Parameters 

stream: string

Either 'stdout' or 'stderr'.

as_unicode: boolean

If as_unicode=False, each logline is returned as a byte object. Otherwise, it is returned as a (unicode) string.

Returns 

Iterator[(datetime, string)]

Iterator over timestamp, logline pairs.
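
Because loglines yields (timestamp, line) pairs, ordinary iteration is enough to slice a log by time. lines_between is a hypothetical helper; the sample uses naive datetimes, and the start and end bounds you pass are assumed to match the timestamps' timezone awareness, since Python raises TypeError when comparing naive and aware datetimes.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def lines_between(loglines: Iterable[Tuple[datetime, str]],
                  start: datetime, end: datetime) -> List[str]:
    """Return the log lines whose timestamp t satisfies start <= t < end."""
    return [line for ts, line in loglines if start <= ts < end]


sample = [
    (datetime(2024, 1, 1, 12, 0), "starting"),
    (datetime(2024, 1, 1, 12, 1), "step 1 done"),
    (datetime(2024, 1, 1, 12, 2), "finished"),
]
print(lines_between(sample, datetime(2024, 1, 1, 12, 1),
                    datetime(2024, 1, 1, 12, 2)))  # ['step 1 done']
```

With a real task this might look like lines_between(Task('HelloFlow/2/start/1').loglines('stdout'), start, end); with as_unicode=False the lines come back as bytes rather than str.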

DataArtifact

DataArtifact(pathspec, attempt, _object, _parent, _namespace_check)

from metaflow import DataArtifact

A single data artifact and associated metadata. Note that this object does not contain other objects as it is the leaf object in the hierarchy.

Attributes 

data: object

The data contained in this artifact, that is, the object produced during execution of this run.

sha: string

A unique ID of this artifact.

finished_at: datetime

Corresponds roughly to the Task.finished_at time of the parent Task. An alias for DataArtifact.created_at.

Helper Objects

MetaflowData

MetaflowData(artifacts)

from metaflow import MetaflowData

Container of data artifacts produced by a Task. This object is instantiated through Task.data.

MetaflowData allows results to be retrieved by their name through a convenient dot notation:

Task(...).data.my_object

You can also test the existence of an object:

if 'my_object' in Task(...).data:
    print('my_object found')

Note that this container relies on the local cache to load all data artifacts. If your Task contains a lot of data, a more efficient approach is to load artifacts individually, like so:

Task(...)['my_object'].data
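
The dot notation and the in test can be pictured as attribute lookup and membership over a name-to-value mapping. DotData is a hypothetical, simplified stand-in: unlike MetaflowData, it holds artifacts already in memory instead of loading them through the local cache.

```python
class DotData:
    """Hypothetical container exposing artifacts as attributes and via `in`."""

    def __init__(self, artifacts):
        self._artifacts = dict(artifacts)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails. Reading __dict__
        # directly avoids infinite recursion if _artifacts is not set yet.
        artifacts = self.__dict__.get("_artifacts", {})
        if name in artifacts:
            return artifacts[name]
        raise AttributeError(name)

    def __contains__(self, name):
        return name in self._artifacts


data = DotData({"my_object": [1, 2, 3]})
print(data.my_object)     # [1, 2, 3]
print("missing" in data)  # False
```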

MetaflowCode

MetaflowCode(flow_name, code_package)

from metaflow import MetaflowCode

Snapshot of the code used to execute this Run. Access the object through Run(...).code (if at least one step is executed remotely) or Task(...).code for an individual task. The code package is the same for all steps of a Run.

MetaflowCode includes a package of the user-defined FlowSpec class and supporting files, as well as a snapshot of the Metaflow library itself.

Currently MetaflowCode objects are stored only for Runs that have at least one Step executing outside the user's local environment.

You can extract the code into a directory named snapshot like so:

Run(...).code.tarball.extractall(path='snapshot')

Attributes

path: string

Location (in the datastore provider) of the code package.

info: Dict

Dictionary of information related to this code-package.

flowspec: string

Source code of the file containing the FlowSpec in this code package.

tarball: TarFile

Python standard library tarfile.TarFile archive containing all the code.
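
Since tarball is a standard tarfile.TarFile, the standard library is enough to unpack a code package. extract_code_package is a hypothetical helper; it applies the 'data' extraction filter when the running Python supports it, and the demo uses an in-memory archive in place of Run(...).code.tarball so it runs without Metaflow.

```python
import io
import os
import tarfile
import tempfile


def extract_code_package(tarball: tarfile.TarFile, dest: str) -> list:
    """Extract every member of tarball into dest; return sorted member names."""
    os.makedirs(dest, exist_ok=True)
    if hasattr(tarfile, "data_filter"):
        # Safer extraction on Python versions that support filters.
        tarball.extractall(path=dest, filter="data")
    else:
        tarball.extractall(path=dest)
    return sorted(tarball.getnames())


# Demo: build a one-file archive in memory as a stand-in for a code package.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    payload = b"print('hello')\n"
    info = tarfile.TarInfo(name="flow.py")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))
buf.seek(0)

with tempfile.TemporaryDirectory() as tmp, tarfile.open(fileobj=buf) as tf:
    names = extract_code_package(tf, os.path.join(tmp, "snapshot"))
    extracted = os.path.exists(os.path.join(tmp, "snapshot", "flow.py"))
print(names, extracted)  # ['flow.py'] True
```

For a real run that has a code package, the call would be extract_code_package(Run('HelloFlow/2').code.tarball, 'snapshot').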

Namespace functions

namespace(ns)

from metaflow import namespace

Switch namespace to the one provided.

This call has a global effect. No objects outside this namespace will be accessible. To access all objects regardless of namespaces, pass None to this call.

Parameters 

ns: string

Namespace to switch to or None to ignore namespaces.

Returns 

string

Namespace set (result of get_namespace()).

get_namespace()

from metaflow import get_namespace

Return the namespace that is currently being used to filter objects.

The namespace is a tag associated with all objects in Metaflow.

Returns 

string or None

The current namespace used to filter objects.

default_namespace()

from metaflow import default_namespace

Resets the namespace used to filter objects to the default one, i.e. the one that was used prior to any namespace calls.

Returns 

string

The result of get_namespace() after the namespace has been reset.
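
Because namespace() has a global effect, it can help to scope a switch and restore the previous namespace afterwards. using_namespace is a hypothetical context manager; by default it uses metaflow.namespace and metaflow.get_namespace, and it accepts replacements so the logic can be exercised without Metaflow installed.

```python
from contextlib import contextmanager


@contextmanager
def using_namespace(ns, set_ns=None, get_ns=None):
    """Temporarily switch to namespace ns, restoring the previous one on exit."""
    if set_ns is None or get_ns is None:
        import metaflow  # only needed when no replacements are supplied
        set_ns = set_ns or metaflow.namespace
        get_ns = get_ns or metaflow.get_namespace
    previous = get_ns()
    set_ns(ns)
    try:
        yield
    finally:
        # previous may be None, which means "ignore namespaces".
        set_ns(previous)
```

For example, with using_namespace(None): runs = list(Flow('HelloFlow')) would list runs regardless of namespace and then restore the previous namespace, even if the body raises.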

Metadata functions

metadata(ms)

from metaflow import metadata

Switch Metadata provider.

This call has a global effect. Selecting the local metadata will, for example, not allow access to information stored in remote metadata providers.

Note that you don't typically have to call this function directly. Usually the metadata provider is set through the Metaflow configuration file. If you need to switch between multiple providers, you can use the METAFLOW_PROFILE environment variable to switch between configurations.

Parameters 

ms: string

Can be a path (selects local metadata), a URL starting with http (selects the service metadata) or an explicit specification <metadata_type>@<info>; as an example, you can specify local@<path> or service@<url>.

Returns 

string

The description of the metadata selected (equivalent to the result of get_metadata()).
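
The three accepted forms of ms can be told apart with simple string checks. describe_metadata_spec is a hypothetical helper that only classifies the string and never calls metadata(); treating local and service as the only explicit provider types is an assumption based on the examples above.

```python
from typing import Tuple

# Assumption: the explicit <metadata_type> values named in the description above.
KNOWN_TYPES = ("local", "service")


def describe_metadata_spec(ms: str) -> Tuple[str, str]:
    """Return (provider_type, info) for a metadata() argument."""
    prefix, sep, info = ms.partition("@")
    if sep and prefix in KNOWN_TYPES:
        return prefix, info       # explicit <metadata_type>@<info>
    if ms.startswith("http"):
        return "service", ms      # URL selects the service metadata
    return "local", ms            # anything else is treated as a local path


print(describe_metadata_spec("local@/tmp/mf"))  # ('local', '/tmp/mf')
```

Checking the prefix against known types first means a local path that happens to contain "@" is still treated as a path.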

get_metadata()

from metaflow import get_metadata

Returns the current Metadata provider.

If this is not set explicitly using metadata, the default value is determined through the Metaflow configuration. You can use this call to check that your configuration is set up properly.

If multiple configuration profiles are present, this call returns the one selected through the METAFLOW_PROFILE environment variable.

Returns 

string

Information about the Metadata provider currently selected. This typically includes provider-specific details, such as the URL for remote providers or the local path for local providers.

default_metadata()

from metaflow import default_metadata

Resets the Metadata provider to the default value, that is, to the value that was used prior to any metadata calls.

Returns 

string

The result of get_metadata() after resetting the provider.