Client API - Accessing past results

Use these objects to access data from past runs and to manipulate tags. Objects in this module are organized as a hierarchy:

Object hierarchy, from top to bottom: Metaflow, Flow, Run, Step, Task, DataArtifact

Instantiating Objects

You can instantiate a specific object at any level of the hierarchy by providing the corresponding pathspec, which you can find, for example, in Metaflow logs.

  • Metaflow()
  • Flow('HelloFlow')
  • Run('HelloFlow/2')
  • Step('HelloFlow/2/start')
  • Task('HelloFlow/2/start/1')
  • DataArtifact('HelloFlow/2/start/1/name')
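
To make the relationship between pathspecs and hierarchy levels concrete, here is a minimal sketch in plain Python. The helper pathspec_level and the LEVELS list are hypothetical names introduced for illustration; they are not part of Metaflow. The helper only counts the "/"-separated components of a pathspec and reports which object type from the list above it would address.

```python
# Illustrative only: a hypothetical helper, not part of the Metaflow API.
# Object types ordered by the number of components in their pathspec.
LEVELS = ["Flow", "Run", "Step", "Task", "DataArtifact"]


def pathspec_level(pathspec):
    """Return the object type that a pathspec such as 'HelloFlow/2/start' addresses."""
    components = pathspec.split("/")
    # Reject empty components ('a//b') and pathspecs that are too long.
    if not 1 <= len(components) <= len(LEVELS) or not all(components):
        raise ValueError("not a valid pathspec: %r" % pathspec)
    return LEVELS[len(components) - 1]


print(pathspec_level("HelloFlow/2/start"))  # Step
```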

Listing objects

Each object is a container (an iterable) that can be used to iterate over the objects directly below it in the hierarchy. For instance, list(Flow(...)) yields a list of Runs, and list(Run(...)) yields a list of Steps.

Accessing children

Since each object is a container, you can access its children through the square-bracket notation, as if each object were a dictionary. For instance, you can access the object Task('HelloFlow/2/start/1') as follows:

Flow('HelloFlow')['2']['start']['1']

You can also test if the object has a certain child:

if '2' in Flow('HelloFlow'):
    print('Run found')
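
The membership test and the square-bracket access can be combined into a small lookup that never raises on a missing child. The following is a sketch under assumptions: descend is a hypothetical helper, and a nested dictionary stands in for a real Flow so that the example runs without Metaflow or any stored runs.

```python
def descend(obj, *keys):
    """Follow keys down the hierarchy; return None if any child is missing.

    `obj` can be any container that supports `in` and square-bracket access,
    which is how this page describes Flow, Run, Step and Task objects.
    """
    for key in keys:
        if key not in obj:
            return None
        obj = obj[key]
    return obj


# Stand-in for a real hierarchy so that the sketch runs without Metaflow.
# With Metaflow installed you could pass Flow('HelloFlow') instead.
fake_flow = {"2": {"start": {"1": "task 1 of start"}}}

print(descend(fake_flow, "2", "start", "1"))  # task 1 of start
print(descend(fake_flow, "3", "start", "1"))  # None
```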

Common attributes

All objects at the Run level and below have the following attributes:

  • tags (set) - tags associated with the run this object belongs to (user and system tags).
  • user_tags (set) - user tags associated with the run this object belongs to.
  • system_tags (set) - system tags associated with the run this object belongs to.
  • created_at (datetime) - Date and time this object was created.
  • parent (Metaflow object) - Parent of this object (e.g. Run(...).parent is a Flow).
  • pathspec (string) - Pathspec of this object (e.g. HelloFlow/2 for a Run).
  • path_components (list) - Components of the pathspec.
  • origin_pathspec (string) - If the object was produced via resume, pathspec of the original object this object was cloned from.
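
As a rough illustration of how these attributes fit together, the sketch below collects them into a plain dictionary. The helper summarize is hypothetical, and a SimpleNamespace with made-up values stands in for a real Run. The sketch also assumes that origin_pathspec is None for objects that were not produced via resume, which this page does not state explicitly.

```python
from datetime import datetime
from types import SimpleNamespace


def summarize(obj):
    """Collect the common attributes listed above into a plain dictionary."""
    return {
        "pathspec": obj.pathspec,
        "path_components": list(obj.path_components),
        "created_at": obj.created_at.isoformat(),
        "user_tags": sorted(obj.user_tags),
        "system_tags": sorted(obj.system_tags),
        "resumed_from": obj.origin_pathspec,
    }


# Stand-in for Run('HelloFlow/2'); all values below are made up.
fake_run = SimpleNamespace(
    pathspec="HelloFlow/2",
    path_components=["HelloFlow", "2"],
    created_at=datetime(2024, 1, 1, 12, 0),
    user_tags={"prod"},
    system_tags={"user:alice"},
    origin_pathspec=None,  # assumption: None when the run was not resumed
)

print(summarize(fake_run))
```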

Object visibility

Note that only objects in the current namespace can be instantiated. See Namespace functions to see how to switch between namespaces.

This module accesses all objects through the current metadata provider - either Metaflow Service or local metadata. See Metadata functions for utilities related to metadata provider.

Object Hierarchy

Metaflow

Metaflow()

Entry point to all objects in the Metaflow universe.

This object can be used to list all the flows present, either through the explicit flows property or by iterating over this object.

Attributes 

flows: List[Flow]

Returns the list of all Flow objects known to this metadata provider. Note that only flows present in the current namespace will be returned. A Flow is present in a namespace if it has at least one run in the namespace.

Flow

Flow(pathspec)

A Flow represents all existing flows with a certain name, in other words, classes derived from FlowSpec. It is a container of Run objects.

Attributes 

latest_run: Run

Latest Run (in progress or completed, successfully or not) of this flow.

latest_successful_run: Run

Latest successfully completed Run of this flow.

Flow.runs(self, *tags: str)

Returns an iterator over all Runs of this flow.

An optional filter is available that allows you to filter on tags. If multiple tags are specified, only runs that have all the specified tags are returned.

Parameters 

tags: str

Tags to match.

Yields 

Run

Run objects in this flow.
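
The "all specified tags" rule can be modelled in a few lines of plain Python. This is only a sketch of the documented semantics, not Metaflow's implementation: runs_with_all_tags is a hypothetical helper, and the run objects are stand-ins with made-up tags. With a real flow, Flow('HelloFlow').runs('prod', 'nightly') is described above as doing this filtering for you.

```python
from types import SimpleNamespace


def runs_with_all_tags(runs, *tags):
    """Yield only the runs whose tag set contains every requested tag."""
    wanted = set(tags)
    for run in runs:
        if wanted <= set(run.tags):
            yield run


# Stand-ins for Run objects; ids and tags are made up.
fake_runs = [
    SimpleNamespace(id="1", tags={"prod", "nightly"}),
    SimpleNamespace(id="2", tags={"prod"}),
    SimpleNamespace(id="3", tags={"nightly", "prod", "gpu"}),
]

print([run.id for run in runs_with_all_tags(fake_runs, "prod", "nightly")])  # ['1', '3']
```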

Run

Run(pathspec)

A Run represents an execution of a Flow. It is a container of Steps.

Attributes 

data: MetaflowData

A shortcut to run['end'].task.data, i.e. the data produced by this run.

successful: bool

True if the run completed successfully.

finished: bool

True if the run completed.

finished_at: datetime

Time this run finished.

code: MetaflowCode

Code package for this run (if present). See MetaflowCode.

trigger: MetaflowTrigger

Information about event(s) that triggered this run (if present). See MetaflowTrigger.

end_task: Task

Task for the end step (if it is present already).

Run.add_tag(self, tag)

Add a tag to this Run.

Note that if the tag is already a system tag, it is not added as a user tag, and no error is thrown.

Parameters 

tag: str

Tag to add.

Run.add_tags(self, tags)

Add one or more tags to this Run.

Note that if any tag is already a system tag, it is not added as a user tag and no error is thrown.

Parameters 

tags: Iterable[str]

Tags to add.

Run.remove_tag(self, tag)

Remove one tag from this Run.

Removing a system tag is an error. Removing a non-existent user tag is a no-op.

Parameters 

tag: str

Tag to remove.

Run.remove_tags(self, tags)

Remove one or more tags from this Run.

Removing a system tag will result in an error. Removing a non-existent user tag is a no-op.

Parameters 

tags: Iterable[str]

Tags to remove.

Run.replace_tag(self, tag_to_remove, tag_to_add)

Remove a tag and add a tag atomically. Removal is done first. The rules for Run.add_tag and Run.remove_tag also apply here.

Parameters 

tag_to_remove: str

Tag to remove.

tag_to_add: str

Tag to add.

Run.replace_tags(self, tags_to_remove, tags_to_add)

Remove and add tags atomically; the removal is done first. The rules for Run.add_tag and Run.remove_tag also apply here.

Parameters 

tags_to_remove: Iterable[str]

Tags to remove.

tags_to_add: Iterable[str]

Tags to add.
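
The rules for adding, removing, and replacing tags can be summarized with a small model that operates on plain Python sets. This is a sketch of the behavior described above, not Metaflow's own code: replace_tags below is a hypothetical free function, and the choice of ValueError for the error case is an assumption (this page only says that removing a system tag is an error).

```python
def replace_tags(user_tags, system_tags, tags_to_remove, tags_to_add):
    """Return the new set of user tags after a replace operation.

    A plain-set model of the rules described above, not Metaflow's own code.
    """
    system_tags = set(system_tags)
    tags_to_remove = set(tags_to_remove)
    tags_to_add = set(tags_to_add)
    # Removing a system tag is an error (exception type is an assumption).
    forbidden = tags_to_remove & system_tags
    if forbidden:
        raise ValueError("cannot remove system tags: %s" % sorted(forbidden))
    # Removal happens first; removing a missing user tag is a no-op.
    result = set(user_tags) - tags_to_remove
    # Tags that are already system tags are not added as user tags.
    result |= tags_to_add - system_tags
    return result


print(sorted(replace_tags({"a", "b"}, {"sys"}, ["a", "missing"], ["c", "sys"])))  # ['b', 'c']
```

Because removal is modelled first, a tag that appears in both arguments ends up present in the result.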

Step

Step(pathspec)

A Step represents a user-defined step, that is, a method annotated with the @step decorator.

It contains Task objects associated with the step, that is, all executions of the Step. The step may contain multiple Tasks in the case of a foreach step.

Attributes 

task: Task

The first Task object in this step. This is a shortcut for retrieving the only task contained in a non-foreach step.

finished_at: datetime

Time when the latest Task of this step finished. Note that in the case of foreaches, this time may change during execution of the step.

environment_info: Dict[str, Any]

Information about the execution environment.

Task

Task(pathspec, attempt=None)

A Task represents an execution of a Step.

It contains all DataArtifact objects produced by the task as well as metadata related to execution.

Note that the @retry decorator may cause multiple attempts of the task to be present. Usually you want the latest attempt, which is what instantiating a Task object returns by default. If you need to e.g. retrieve logs from a failed attempt, you can explicitly get information about a specific attempt by using the following syntax when creating a task:

Task('flow/run/step/task', attempt=<attempt>)

where attempt=0 corresponds to the first attempt etc.
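
For example, to compare the error output of several attempts, you might collect stderr per attempt. The sketch below is hedged: stderr_by_attempt is a hypothetical helper, and FakeTask is a stand-in so that the code runs without Metaflow. What happens when you request an attempt that does not exist is not covered on this page, so the sketch leaves the choice of attempt numbers to the caller.

```python
def stderr_by_attempt(task_cls, pathspec, attempts):
    """Return {attempt: stderr} for the given attempt numbers of one task.

    `task_cls` is expected to behave like metaflow.Task, i.e. to accept
    `task_cls(pathspec, attempt=n)` and to expose a `stderr` attribute.
    """
    return {n: task_cls(pathspec, attempt=n).stderr for n in attempts}


class FakeTask:
    """Stand-in for metaflow.Task so that the sketch runs on its own."""

    def __init__(self, pathspec, attempt=None):
        self.stderr = "%s: log of attempt %s" % (pathspec, attempt)


logs = stderr_by_attempt(FakeTask, "HelloFlow/2/start/1", [0, 1])
print(logs[0])  # HelloFlow/2/start/1: log of attempt 0
```

With Metaflow installed, passing Task in place of FakeTask would be the intended use.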

Attributes 

metadata: List[Metadata]

List of all metadata events associated with the task.

metadata_dict: Dict[str, str]

A condensed version of metadata: a dictionary in which each key is the name of a metadata event and each value is the latest corresponding event.

data: MetaflowData

Container of all data artifacts produced by this task. Note that this call downloads all data locally, so it can be slower than accessing artifacts individually. See MetaflowData for more information.

artifacts: MetaflowArtifacts

Container of DataArtifact objects produced by this task.

successful: bool

True if the task completed successfully.

finished: bool

True if the task completed.

exception: object

Exception raised by this task if there was one.

finished_at: datetime

Time this task finished.

runtime_name: str

Runtime this task was executed on.

stdout: str

Standard output for the task execution.

stderr: str

Standard error output for the task execution.

code: MetaflowCode

Code package for this task (if present). See MetaflowCode.

environment_info: Dict[str, str]

Information about the execution environment.

Task.loglines(self, stream, as_unicode, meta_dict)

Return an iterator over (utc_timestamp, logline) tuples.

Parameters 

stream: str

Either 'stdout' or 'stderr'.

as_unicode: bool, default: True

If as_unicode=False, each logline is returned as a bytes object. Otherwise, it is returned as a (unicode) string.

Yields 

Tuple[datetime, str]

A (timestamp, logline) tuple for each log line.
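
A typical use of loglines is to print each line with its timestamp. The following sketch assumes, as the Yields section states, that timestamps are datetime objects. format_loglines is a hypothetical helper, and FakeTask with its made-up log content stands in for a real Task.

```python
from datetime import datetime


def format_loglines(task, stream="stdout"):
    """Yield 'timestamp line' strings for one output stream of a task."""
    if stream not in ("stdout", "stderr"):
        raise ValueError("stream must be 'stdout' or 'stderr'")
    for timestamp, line in task.loglines(stream):
        yield "%s %s" % (timestamp.isoformat(), line)


class FakeTask:
    """Stand-in for a Task; returns made-up log lines."""

    def loglines(self, stream, as_unicode=True):
        return iter([(datetime(2024, 1, 1, 12, 0, 0), "hello from " + stream)])


for text in format_loglines(FakeTask(), "stdout"):
    print(text)  # 2024-01-01T12:00:00 hello from stdout
```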

DataArtifact

DataArtifact(pathspec)

A single data artifact and associated metadata. Note that this object does not contain other objects as it is the leaf object in the hierarchy.

Attributes 

data: object

The data contained in this artifact, that is, the object produced during execution of this run.

sha: string

A unique ID of this artifact.

finished_at: datetime

Corresponds roughly to the Task.finished_at time of the parent Task. An alias for DataArtifact.created_at.

Helper Objects

MetaflowData

MetaflowData()

Container of data artifacts produced by a Task. This object is instantiated through Task.data.

MetaflowData allows results to be retrieved by their name through a convenient dot notation:

Task(...).data.my_object

You can also test for the existence of an object:

if 'my_object' in Task(...).data:
    print('my_object found')

Note that this container relies on the local cache to load all data artifacts. If your Task contains a lot of data, a more efficient approach is to load artifacts individually, like so:

Task(...)['my_object'].data
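
The individual-access pattern can be wrapped in a helper that returns a default when the artifact is missing. This is a sketch under assumptions: load_artifact is a hypothetical helper, and a dictionary of stand-in artifacts replaces a real Task so that the example runs without Metaflow.

```python
from types import SimpleNamespace


def load_artifact(task, name, default=None):
    """Load a single artifact by name without touching the others."""
    if name not in task:
        return default
    return task[name].data


# Stand-in for Task(...): maps artifact names to objects with a .data attribute.
fake_task = {"my_object": SimpleNamespace(data=42)}

print(load_artifact(fake_task, "my_object"))        # 42
print(load_artifact(fake_task, "other", default=0))  # 0
```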

MetaflowCode

MetaflowCode()

Snapshot of the code used to execute this Run. Instantiate the object through Run(...).code (if any step is executed remotely) or Task(...).code for an individual task. The code package is the same for all steps of a Run.

MetaflowCode includes a package of the user-defined FlowSpec class and supporting files, as well as a snapshot of the Metaflow library itself.

Currently, MetaflowCode objects are stored only for Runs that have at least one Step executing outside the user's local environment.

The TarFile for the Run is given by Run(...).code.tarball

Attributes 

path: str

Location (in the datastore provider) of the code package.

info: Dict[str, str]

Dictionary of information related to this code-package.

flowspec: str

Source code of the file containing the FlowSpec in this code package.

tarball: TarFile

Python standard library tarfile.TarFile archive containing all the code.
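
Since tarball is a standard tarfile.TarFile, the usual standard-library methods apply. The sketch below lists the files in a code package. list_code_files is a hypothetical helper that treats a missing code package as an empty list, and a tiny in-memory archive with a made-up file name stands in for Run(...).code.

```python
import io
import tarfile
from types import SimpleNamespace


def list_code_files(code):
    """Return the sorted member names of a code package's tarball."""
    if code is None:  # no code package stored for this run
        return []
    return sorted(code.tarball.getnames())


# Build a tiny in-memory archive as a stand-in for Run(...).code.tarball.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    payload = b"print('hello')\n"
    member = tarfile.TarInfo(name="helloflow.py")  # made-up file name
    member.size = len(payload)
    archive.addfile(member, io.BytesIO(payload))
buffer.seek(0)

fake_code = SimpleNamespace(tarball=tarfile.open(fileobj=buffer, mode="r"))
print(list_code_files(fake_code))  # ['helloflow.py']
```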

MetaflowTrigger

MetaflowTrigger is returned by Run.trigger if the Run was triggered by an event. It is also returned by current.trigger when called from an event-triggered flow.

Trigger.event()

The MetaflowEvent object corresponding to the triggering event.

If multiple events triggered the run, this property is the latest event.

Returns 

MetaflowEvent, optional

The latest event that triggered the run, if applicable.

Trigger.events()

The list of MetaflowEvent objects corresponding to all the triggering events.

Returns 

List[MetaflowEvent], optional

List of all events that triggered the run.

Trigger.run()

The corresponding Run object if the triggering event is a Metaflow run.

In case multiple runs triggered the run, this property is the latest run. Returns None if none of the triggering events are a Run.

Returns 

Run, optional

Latest Run that triggered this run, if applicable.

Trigger.runs()

The list of Run objects in the triggering events. Returns None if none of the triggering events are Run objects.

Returns 

List[Run], optional

List of runs that triggered this run, if applicable.

Trigger.__getitem__()

If the triggering events are runs, the key corresponds to the flow name of the triggering run and a Run object is returned. Otherwise, the key corresponds to the event name and a MetaflowEvent object is returned.

Returns 

Union[Run, MetaflowEvent]

Run object if triggered by a run. Otherwise returns a MetaflowEvent.

MetaflowEvent

MetaflowEvent is returned by MetaflowTrigger (see above) for event-triggered runs.

MetaflowEvent()

Container of metadata that identifies the event that triggered the Run under consideration.

Attributes 

name: str

Name of the event.

id: str

Unique identifier for the event.

timestamp: datetime

Timestamp recording the creation time of the event.

type: str

Type of the event: one of event or run.
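
Putting MetaflowTrigger and MetaflowEvent together, the sketch below lists the name and type of every event that triggered a run. It rests on assumptions that this page only implies: that Run.trigger is None when the run was not event-triggered, and that events is read as a property (the descriptions above call it a property even though it is listed with parentheses). triggering_events is a hypothetical helper, and the stand-in objects use made-up values.

```python
from types import SimpleNamespace


def triggering_events(run):
    """Return (name, type) pairs for the events that triggered a run."""
    trigger = run.trigger
    if trigger is None:  # assumption: no trigger information available
        return []
    # Assumption: `events` is a property and may be None or empty.
    events = trigger.events or []
    return [(event.name, event.type) for event in events]


# Stand-ins for Run objects; the event name is made up.
triggered_run = SimpleNamespace(
    trigger=SimpleNamespace(events=[SimpleNamespace(name="data_updated", type="event")])
)
manual_run = SimpleNamespace(trigger=None)

print(triggering_events(triggered_run))  # [('data_updated', 'event')]
print(triggering_events(manual_run))     # []
```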

Namespace functions

namespace(ns)

from metaflow import namespace

Switch namespace to the one provided.

This call has a global effect. No objects outside this namespace will be accessible. To access all objects regardless of namespaces, pass None to this call.

Parameters 

ns: str, optional

Namespace to switch to or None to ignore namespaces.

Returns 

str, optional

Namespace set (result of get_namespace()).

get_namespace()

Return the namespace that is currently being used to filter objects.

The namespace is a tag associated with all objects in Metaflow.

Returns 

str, optional

The current namespace used to filter objects.

default_namespace()

Resets the namespace used to filter objects to the default one, i.e. the one that was used prior to any namespace calls.

Returns 

str

The result of get_namespace() after the namespace has been reset.
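
Because namespace has a global effect, it can be convenient to switch namespaces only temporarily and restore the previous one afterwards. The following is a minimal sketch, not a Metaflow feature: temporary_namespace is a hypothetical context manager that takes the setter and getter as arguments, so that it can be exercised here with stand-in functions. With Metaflow installed, you would pass namespace and get_namespace.

```python
from contextlib import contextmanager


@contextmanager
def temporary_namespace(ns, set_namespace, get_namespace):
    """Switch to namespace `ns` for the duration of a with-block.

    `set_namespace` and `get_namespace` are expected to behave like
    metaflow.namespace and metaflow.get_namespace.
    """
    previous = get_namespace()
    set_namespace(ns)
    try:
        yield
    finally:
        # Restore whatever was active before, including None.
        set_namespace(previous)


# Stand-ins that mimic the global namespace state; the value is made up.
_state = {"ns": "user:alice"}


def fake_get_namespace():
    return _state["ns"]


def fake_namespace(ns):
    _state["ns"] = ns
    return ns


with temporary_namespace(None, fake_namespace, fake_get_namespace):
    print(fake_get_namespace())  # None
print(fake_get_namespace())      # user:alice
```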

Metadata functions

metadata(ms)

Switch Metadata provider.

This call has a global effect. Selecting the local metadata will, for example, not allow access to information stored in remote metadata providers.

Note that you don't typically have to call this function directly. Usually the metadata provider is set through the Metaflow configuration file. If you need to switch between multiple providers, you can use the METAFLOW_PROFILE environment variable to switch between configurations.

Parameters 

ms: str

Can be a path (selects local metadata), a URL starting with http (selects the service metadata) or an explicit specification <metadata_type>@<info>; as an example, you can specify local@<path> or service@<url>.

Returns 

str

The description of the metadata selected (equivalent to the result of get_metadata()).
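
The three accepted forms of the ms argument can be told apart with a few string checks. The sketch below illustrates the rules stated under Parameters; it is not Metaflow's implementation. classify_metadata_spec and KNOWN_TYPES are hypothetical names, and only the two provider types mentioned on this page are assumed to exist.

```python
KNOWN_TYPES = ("local", "service")  # the two provider types named above


def classify_metadata_spec(ms):
    """Split a metadata specification into (provider_type, info)."""
    prefix, separator, rest = ms.partition("@")
    # Explicit form: <metadata_type>@<info>
    if separator and prefix in KNOWN_TYPES:
        return prefix, rest
    # A URL starting with http selects the service metadata.
    if ms.startswith("http"):
        return "service", ms
    # Anything else is treated as a path to local metadata.
    return "local", ms


print(classify_metadata_spec("local@/tmp/flows"))       # ('local', '/tmp/flows')
print(classify_metadata_spec("http://localhost:8080"))  # ('service', 'http://localhost:8080')
```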

get_metadata()

Returns the current Metadata provider.

If this is not set explicitly using metadata, the default value is determined through the Metaflow configuration. You can use this call to check that your configuration is set up properly.

If multiple configuration profiles are present, this call returns the one selected through the METAFLOW_PROFILE environment variable.

Returns 

str

Information about the Metadata provider currently selected. This typically includes provider-specific details (such as the URL for remote providers or the local path for local providers).

default_metadata()

Resets the Metadata provider to the default value, that is, to the value that was used prior to any metadata calls.

Returns 

str

The result of get_metadata() after resetting the provider.