Client API - Accessing past results
Use these objects to access data from past runs and to manipulate tags. Objects in this module are organized as a hierarchy, from top to bottom: Metaflow, Flow, Run, Step, Task, and DataArtifact.
Instantiating Objects
You can instantiate a specific object at any level of the hierarchy by providing a corresponding pathspec, e.g. from Metaflow logs:
Metaflow()
Flow('HelloFlow')
Run('HelloFlow/2')
Step('HelloFlow/2/start')
Task('HelloFlow/2/start/1')
DataArtifact('HelloFlow/2/start/1/name')
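Each pathspec has one slash-separated component per level below Metaflow, so the number of components tells you which class it belongs to. The stdlib-only sketch below illustrates this mapping; describe_pathspec and LEVELS are hypothetical names used for illustration, not part of the Metaflow API.

```python
# Hypothetical helper, not part of Metaflow: maps a pathspec to the Client API
# class whose constructor accepts it, based on the number of components.
LEVELS = ["Flow", "Run", "Step", "Task", "DataArtifact"]


def describe_pathspec(pathspec: str) -> str:
    """Return the name of the Client API class that matches `pathspec`."""
    components = pathspec.split("/")
    # Reject empty components (e.g. "HelloFlow//start") and too-deep paths.
    if "" in components or len(components) > len(LEVELS):
        raise ValueError(f"not a valid pathspec: {pathspec!r}")
    return LEVELS[len(components) - 1]


for spec in ["HelloFlow", "HelloFlow/2", "HelloFlow/2/start",
             "HelloFlow/2/start/1", "HelloFlow/2/start/1/name"]:
    print(f"{describe_pathspec(spec)}({spec!r})")
```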
Listing objects
Each object is a container (an iterable) that can be used to iterate over the objects directly below it in the hierarchy. For instance, list(Flow(...)) yields a list of Runs, and list(Run(...)) yields a list of Steps.
Accessing children
Since each object is a container, you can access its children through square-bracket notation, as if each object were a dictionary. For instance, you can access the object Task('HelloFlow/2/start/1') as follows:
Flow('HelloFlow')['2']['start']['1']
You can also test if the object has a certain child:
if '2' in Flow('HelloFlow'):
    print('Run found')
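To make the container behavior concrete, here is a toy model in plain Python: iterating yields the children, square brackets select a child by its id, and `in` tests whether a child exists. Node and build are hypothetical names and the tree is hard-coded; the real client objects obtain their children through the metadata provider instead.

```python
from typing import Dict, Iterator, Optional


class Node:
    """Toy stand-in for a Client API object (hypothetical, not Metaflow code)."""

    def __init__(self, pathspec: str, children: Optional[Dict[str, "Node"]] = None):
        self.pathspec = pathspec
        self._children = children or {}

    def __iter__(self) -> Iterator["Node"]:
        # Iterating yields the child objects, like list(Flow(...)) yields Runs.
        return iter(self._children.values())

    def __getitem__(self, key: str) -> "Node":
        # Square brackets select a child by its id, as with a dictionary.
        return self._children[key]

    def __contains__(self, key: str) -> bool:
        # `in` tests whether a child with this id exists.
        return key in self._children


def build(pathspec: str, tree: Dict[str, dict]) -> Node:
    """Recursively build a Node tree from nested dicts keyed by child id."""
    return Node(pathspec, {key: build(f"{pathspec}/{key}", sub) for key, sub in tree.items()})


flow = build("HelloFlow", {"2": {"start": {"1": {}}, "end": {"2": {}}}})
print(flow["2"]["start"]["1"].pathspec)  # HelloFlow/2/start/1
print([step.pathspec for step in flow["2"]])
print("2" in flow, "3" in flow)  # True False
```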
Common attributes
All objects at the Run level and below have the following attributes:
tags (set) - Tags associated with the run this object belongs to (user and system tags).
user_tags (set) - User tags associated with the run this object belongs to.
system_tags (set) - System tags associated with the run this object belongs to.
created_at (datetime) - Date and time this object was created.
parent (Metaflow object) - Parent of this object (e.g. Run(...).parent is a Flow).
pathspec (string) - Pathspec of this object (e.g. HelloFlow/2 for a Run).
path_components (list) - Components of the pathspec.
origin_pathspec (string) - If the object was produced via resume, the pathspec of the original object this object was cloned from.
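Two of these attributes are related in simple ways: tags is the union of user_tags and system_tags, and the parent's pathspec is this object's path_components without the last component. The helpers below, combined_tags and parent_pathspec, are hypothetical stdlib sketches of those relationships, and the tag values are illustrative.

```python
from typing import List, Optional, Set


def combined_tags(user_tags: Set[str], system_tags: Set[str]) -> Set[str]:
    # `tags` holds both user and system tags, i.e. the union of the two sets.
    return user_tags | system_tags


def parent_pathspec(path_components: List[str]) -> Optional[str]:
    # The parent's pathspec is this pathspec without its last component.
    # A single component is a Flow, whose parent (Metaflow) has no pathspec here.
    if len(path_components) <= 1:
        return None
    return "/".join(path_components[:-1])


# Illustrative values only.
print(sorted(combined_tags({"experiment"}, {"user:alice", "runtime:dev"})))
print(parent_pathspec(["HelloFlow", "2", "start"]))  # HelloFlow/2
```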
Object visibility
Note that only objects in the current namespace can be instantiated. See Namespace functions for how to switch between namespaces.
This module accesses all objects through the current metadata provider, either the Metaflow Service or local metadata. See Metadata functions for utilities related to the metadata provider.
Object Hierarchy
Metaflow
Entry point to all objects in the Metaflow universe.
This object can be used to list all the flows present, either through the explicit flows property or by iterating over this object.
flows: List[Flow]
Returns the list of all Flow objects known to this metadata provider. Note that only flows present in the current namespace are returned. A Flow is present in a namespace if it has at least one run in the namespace.
Flow
A Flow represents all existing flows with a certain name, in other words, classes derived from FlowSpec. It is a container of Run objects.
latest_run: Run
Latest Run (in progress or completed, successfully or not) of this flow.
latest_successful_run: Run
Latest successfully completed Run of this flow.
runs(*tags)
Returns an iterator over all Runs of this flow.
An optional filter is available that allows you to filter on tags. If multiple tags are specified, only runs that have all the specified tags are returned.
tags: string
Tags to match.
Iterator[Run]
Iterator over Run objects in this flow.
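With the client, the filter tags are passed as positional arguments, e.g. Flow('HelloFlow').runs('experiment', 'gpu'). The sketch below models only the matching rule, where a run qualifies if its tags include every requested tag; filter_runs and the sample data are hypothetical.

```python
from typing import Dict, Iterator, Set


def filter_runs(run_tags: Dict[str, Set[str]], *tags: str) -> Iterator[str]:
    """Yield the ids of runs whose tags contain *all* requested tags.

    Hypothetical model of the filter; with no tags, every run is yielded.
    """
    wanted = set(tags)
    for run_id, current in run_tags.items():
        if wanted <= current:  # subset test: every requested tag is present
            yield run_id


runs = {
    "3": {"user:alice", "experiment", "gpu"},
    "2": {"user:alice", "experiment"},
    "1": {"user:alice"},
}
print(list(filter_runs(runs, "experiment")))         # ['3', '2']
print(list(filter_runs(runs, "experiment", "gpu")))  # ['3']
print(list(filter_runs(runs)))                       # ['3', '2', '1']
```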
Run
A Run represents an execution of a Flow. It is a container of Steps.
data: MetaflowData
A shortcut to run['end'].task.data, i.e. the data produced by this run.
successful: boolean
True if the run completed successfully.
finished: boolean
True if the run completed.
finished_at: datetime
Time this run finished.
code: MetaflowCode
Code package for this run (if present). See MetaflowCode.
end_task: Task
Task for the end step (if it is present already).
add_tag(tag)
Add a tag to this Run.
Note that if the tag is already a system tag, it is not added as a user tag, and no error is thrown.
tag: string
Tag to add.
add_tags(tags)
Add one or more tags to this Run.
Note that if any tag is already a system tag, it is not added as a user tag, and no error is thrown.
tags: Iterable[string]
Tags to add.
remove_tag(tag)
Remove one tag from this Run.
Removing a system tag is an error. Removing a non-existent user tag is a no-op.
tag: string
Tag to remove.
remove_tags(tags)
Remove one or more tags from this Run.
Removing a system tag results in an error. Removing a non-existent user tag is a no-op.
tags: Iterable[string]
Tags to remove.
replace_tag(tag_to_remove, tag_to_add)
Remove a tag and add a tag atomically. Removal is done first.
The rules for Run.add_tag and Run.remove_tag also apply here.
tag_to_remove: string
Tag to remove.
tag_to_add: string
Tag to add.
replace_tags(tags_to_remove, tags_to_add)
Remove and add tags atomically; the removal is done first.
The rules for Run.add_tag and Run.remove_tag also apply here.
tags_to_remove: Iterable[string]
Tags to remove.
tags_to_add: Iterable[string]
Tags to add.
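The tag rules above can be captured in a small in-memory model. TagModel is a hypothetical sketch that mirrors the documented behavior of add_tags, remove_tags, and replace_tags (the single-tag variants can be thought of as one-element calls); it does not talk to a metadata provider, and the ValueError it raises is a stand-in for whatever error the client actually reports.

```python
from typing import Iterable, Set


class TagModel:
    """Hypothetical in-memory model of a Run's tag rules (not Metaflow code)."""

    def __init__(self, system_tags: Iterable[str], user_tags: Iterable[str] = ()):
        self.system_tags: Set[str] = set(system_tags)
        self.user_tags: Set[str] = set(user_tags)

    @property
    def tags(self) -> Set[str]:
        # Like Run.tags: user and system tags together.
        return self.system_tags | self.user_tags

    def add_tags(self, tags: Iterable[str]) -> None:
        # Tags that are already system tags are skipped without an error.
        self.user_tags |= set(tags) - self.system_tags

    def remove_tags(self, tags: Iterable[str]) -> None:
        to_remove = set(tags)
        # Removing a system tag is an error; check before changing anything.
        clash = to_remove & self.system_tags
        if clash:
            raise ValueError(f"cannot remove system tags: {sorted(clash)}")
        # Removing a user tag that does not exist is a no-op.
        self.user_tags -= to_remove

    def replace_tags(self, tags_to_remove: Iterable[str], tags_to_add: Iterable[str]) -> None:
        # Removal is done first. remove_tags raises before mutating and
        # add_tags cannot fail, so a rejected call leaves the tags unchanged.
        self.remove_tags(tags_to_remove)
        self.add_tags(tags_to_add)


run = TagModel(system_tags={"user:alice", "runtime:dev"})
run.add_tags(["candidate", "user:alice"])  # "user:alice" is a system tag: skipped
run.replace_tags(["candidate"], ["production"])
print(sorted(run.user_tags))  # ['production']
try:
    run.remove_tags(["runtime:dev"])
except ValueError as err:
    print(err)  # cannot remove system tags: ['runtime:dev']
```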
Step
A Step represents a user-defined step, that is, a method annotated with the @step decorator.
It contains Task objects associated with the step, that is, all executions of the Step. The step may contain multiple Tasks in the case of a foreach step.
task: Task
The first Task object in this step. This is a shortcut for retrieving the only task contained in a non-foreach step.
finished_at: datetime
Time when the latest Task of this step finished. Note that in the case of foreaches, this time may change during execution of the step.
environment_info: Dict
Information about the execution environment.
Task
A Task represents an execution of a Step.
It contains all DataArtifact objects produced by the task as well as metadata related to execution.
Note that the @retry decorator may cause multiple attempts of the task to be present. Usually you want the latest attempt, which is what instantiating a Task object returns by default. If you need to, e.g., retrieve logs from a failed attempt, you can explicitly get information about a specific attempt by using the following syntax when creating a task:
Task('flow/run/step/task', attempt=<attempt>)
where attempt=0 corresponds to the first attempt, etc.
metadata: List[Metadata]
List of all metadata events associated with the task.
metadata_dict: Dict
A condensed version of metadata: a dictionary where keys are names of metadata events and values are the latest corresponding event.
data: MetaflowData
Container of all data artifacts produced by this task. Note that this call downloads all data locally, so it can be slower than accessing artifacts individually. See MetaflowData for more information.
artifacts: MetaflowArtifacts
Container of DataArtifact objects produced by this task.
successful: boolean
True if the task successfully completed.
finished: boolean
True if the task completed.
exception: object
Exception raised by this task if there was one.
finished_at: datetime
Time this task finished.
runtime_name: string
Runtime this task was executed on.
stdout: string
Standard output for the task execution.
stderr: string
Standard error output for the task execution.
code: MetaflowCode
Code package for this task (if present). See MetaflowCode.
environment_info: Dict
Information about the execution environment.
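The relationship between metadata and metadata_dict can be sketched as keeping, for each event name, only the most recent event. In this sketch, which is an assumption rather than Metaflow's own code, the dictionary stores the value of that latest event; Event, condense, and the sample event names and values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable


@dataclass
class Event:
    """Hypothetical stand-in for one metadata event."""
    name: str
    value: str
    created_at: datetime


def condense(events: Iterable[Event]) -> Dict[str, str]:
    """Map each event name to the value of its most recent event."""
    latest: Dict[str, Event] = {}
    for event in events:
        current = latest.get(event.name)
        # Ties on created_at go to the event that appears later in the input.
        if current is None or event.created_at >= current.created_at:
            latest[event.name] = event
    return {name: event.value for name, event in latest.items()}


t0 = datetime(2024, 1, 1, 12, 0)
events = [
    Event("attempt", "0", t0),
    Event("runtime", "dev", t0),
    Event("attempt", "1", t0 + timedelta(minutes=5)),
]
print(condense(events))  # {'attempt': '1', 'runtime': 'dev'}
```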
loglines(stream, as_unicode=True)
Return an iterator over (utc_timestamp, logline) tuples.
stream: string
Either 'stdout' or 'stderr'.
as_unicode: boolean
If as_unicode=False, each logline is returned as a byte object. Otherwise, it is returned as a (unicode) string.
Iterator[(datetime, string)]
Iterator over timestamp, logline pairs.
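A typical consumer of loglines turns each (timestamp, line) pair into readable text, for example format_loglines(Task('HelloFlow/2/start/1').loglines('stdout')). The helper format_loglines is hypothetical and is exercised here on hand-made pairs rather than real task logs; it also decodes byte lines, assuming UTF-8, so that it works with as_unicode=False.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator, Tuple, Union


def format_loglines(pairs: Iterable[Tuple[datetime, Union[str, bytes]]]) -> Iterator[str]:
    """Render (utc_timestamp, logline) pairs as 'ISO-timestamp  text' strings."""
    for timestamp, line in pairs:
        if isinstance(line, bytes):
            # as_unicode=False yields bytes; assume UTF-8 and replace bad sequences.
            line = line.decode("utf-8", errors="replace")
        yield f"{timestamp.isoformat()}  {line}"


sample = [
    (datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), "starting"),
    (datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc), b"done"),
]
for text in format_loglines(sample):
    print(text)
```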
DataArtifact
A single data artifact and associated metadata. Note that this object does not contain other objects as it is the leaf object in the hierarchy.
data: object
The data contained in this artifact, that is, the object produced during execution of this run.
sha: string
A unique ID of this artifact.
finished_at: datetime
Corresponds roughly to the Task.finished_at time of the parent Task. An alias for DataArtifact.created_at.
Helper Objects
MetaflowData
Container of data artifacts produced by a Task. This object is instantiated through Task.data.
MetaflowData allows results to be retrieved by their name through a convenient dot notation:
Task(...).data.my_object
You can also test the existence of an object:
if 'my_object' in Task(...).data:
    print('my_object found')
Note that this container relies on the local cache to load all data artifacts. If your Task contains a lot of data, a more efficient approach is to load artifacts individually like so:
Task(...)['my_object'].data
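The cost difference between the two access patterns can be illustrated with a toy model that counts loads. ToyTask is hypothetical and keeps its artifacts in memory; it only mimics the documented behavior that Task.data loads every artifact while indexing a Task loads just the requested one. Unlike the real MetaflowData, it does not support `in` tests.

```python
from types import SimpleNamespace
from typing import Any, Dict


class ToyTask:
    """Hypothetical stand-in for a Task; `loads` counts simulated downloads."""

    def __init__(self, store: Dict[str, Any]):
        self._store = store
        self.loads = 0

    def _load(self, name: str) -> Any:
        self.loads += 1  # pretend each artifact is fetched from the datastore
        return self._store[name]

    @property
    def data(self) -> SimpleNamespace:
        # Mimics Task.data: fetch every artifact, expose them as attributes.
        return SimpleNamespace(**{name: self._load(name) for name in self._store})

    def __getitem__(self, name: str) -> SimpleNamespace:
        # Mimics Task(...)['name'].data: fetch a single artifact.
        return SimpleNamespace(data=self._load(name))


task = ToyTask({"my_object": 42, "big_table": list(range(1_000_000)), "labels": ["a", "b"]})
_ = task.data.my_object
print(task.loads)  # 3: every artifact was loaded
task.loads = 0
_ = task["my_object"].data
print(task.loads)  # 1: only the requested artifact was loaded
```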
MetaflowCode
Snapshot of the code used to execute this Run. Instantiate the object through Run(...).code (if all steps are executed remotely) or Task(...).code for an individual task. The code package is the same for all steps of a Run.
MetaflowCode includes a package of the user-defined FlowSpec class and supporting files, as well as a snapshot of the Metaflow library itself.
Currently, MetaflowCode objects are stored only for Runs that have at least one Step executing outside the user's local environment.
You can extract the code into the directory snapshot like so:
Run(...).code.extractall(path='snapshot')
path: string
Location (in the datastore provider) of the code package.
info: Dict
Dictionary of information related to this code package.
flowspec: string
Source code of the file containing the FlowSpec in this code package.
tarball: TarFile
Python standard library tarfile.TarFile archive containing all the code.
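Because tarball is a standard tarfile.TarFile, the usual tarfile methods such as getnames() and extractfile() apply to it, e.g. on Run('HelloFlow/2').code.tarball. The sketch below builds a small in-memory archive as a stand-in so that it runs without a Metaflow deployment; make_fake_package and the file name helloflow.py are hypothetical, and the layout of a real code package may differ.

```python
import io
import tarfile


def make_fake_package() -> tarfile.TarFile:
    """Build an in-memory archive standing in for Run(...).code.tarball."""
    source = b"from metaflow import FlowSpec, step\n"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        info = tarfile.TarInfo(name="helloflow.py")  # hypothetical file name
        info.size = len(source)
        tar.addfile(info, io.BytesIO(source))
    buffer.seek(0)  # rewind so the archive can be reopened for reading
    return tarfile.open(fileobj=buffer, mode="r")


tar = make_fake_package()
print(tar.getnames())  # ['helloflow.py']
member = tar.extractfile("helloflow.py")  # file-like object for a regular file
print(member.read().decode(), end="")
```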
Namespace functions
from metaflow import namespace
namespace(ns)
Switch namespace to the one provided.
This call has a global effect. No objects outside this namespace will be accessible. To access all objects regardless of namespaces, pass None to this call.
ns: string
Namespace to switch to or None to ignore namespaces.
string
Namespace set (result of get_namespace()).
get_namespace()
Return the namespace currently used to filter objects.
The namespace is a tag associated with all objects in Metaflow.
string or None
The current namespace used to filter objects.
default_namespace()
Reset the namespace used to filter objects to the default one, i.e. the one that was used prior to any namespace calls.
string
The result of get_namespace() after the namespace has been reset.
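The global, tag-based behavior of these three calls can be modeled with a small state object. NamespaceModel is hypothetical: its switch and reset methods and its current attribute stand in for namespace, default_namespace, and get_namespace, and the user:<name> tags and the default value are illustrative.

```python
from typing import Dict, List, Optional, Set


class NamespaceModel:
    """Hypothetical model of the global namespace filter (not Metaflow code)."""

    def __init__(self, default: Optional[str]):
        self._default = default
        self.current = default

    def switch(self, ns: Optional[str]) -> Optional[str]:
        # Stands in for namespace(ns); None disables filtering.
        self.current = ns
        return self.current

    def reset(self) -> Optional[str]:
        # Stands in for default_namespace().
        self.current = self._default
        return self.current

    def visible(self, run_tags: Dict[str, Set[str]]) -> List[str]:
        # A namespace is a tag, so only runs carrying that tag are visible.
        return [run_id for run_id, tags in run_tags.items()
                if self.current is None or self.current in tags]


runs = {"1": {"user:alice"}, "2": {"user:bob"}, "3": {"user:bob", "experiment"}}
ns = NamespaceModel(default="user:alice")  # illustrative default namespace
print(ns.visible(runs))  # ['1']
ns.switch("user:bob")
print(ns.visible(runs))  # ['2', '3']
ns.switch(None)
print(ns.visible(runs))  # ['1', '2', '3']
print(ns.reset())        # user:alice
```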
Metadata functions
metadata(ms)
Switch Metadata provider.
This call has a global effect. Selecting the local metadata will, for example, not allow access to information stored in remote metadata providers.
Note that you don't typically have to call this function directly. Usually the metadata provider is set through the Metaflow configuration file. If you need to switch between multiple providers, you can use the METAFLOW_PROFILE environment variable to switch between configurations.
ms: string
Can be a path (selects local metadata), a URL starting with http (selects the service metadata) or an explicit specification <metadata_type>@<info>; for example, you can specify local@<path> or service@<url>.
string
The description of the metadata selected (equivalent to the result of get_metadata()).
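The three forms accepted by ms can be told apart with simple string checks. parse_metadata_spec is a hypothetical sketch of that dispatch, assuming local and service are the only provider types (the two named in this section); it is not Metaflow's own parser, and the example paths and URLs are made up.

```python
from typing import Tuple

KNOWN_TYPES = ("local", "service")  # assumption: the two providers named above


def parse_metadata_spec(ms: str) -> Tuple[str, str]:
    """Split a metadata spec into (provider_type, info)."""
    provider, sep, info = ms.partition("@")
    if sep and provider in KNOWN_TYPES:
        return provider, info      # explicit <metadata_type>@<info>
    if ms.startswith("http"):
        return "service", ms       # a bare URL selects the service
    return "local", ms             # anything else is treated as a local path


for spec in ["local@/tmp/mf", "service@https://mf.example.com",
             "https://mf.example.com", "/home/alice/project"]:
    print(spec, "->", parse_metadata_spec(spec))
```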
get_metadata()
Returns the current Metadata provider.
If this is not set explicitly using metadata, the default value is determined through the Metaflow configuration. You can use this call to check that your configuration is set up properly.
If multiple configuration profiles are present, this call returns the one selected through the METAFLOW_PROFILE environment variable.
string
Information about the Metadata provider currently selected. This information typically includes provider-specific details (like the URL for remote providers or the local path for local providers).