Client API - Accessing past results
Use these objects to access data from past runs and to manipulate tags. Objects in this module are organized as a hierarchy, from top to bottom: Metaflow, Flow, Run, Step, Task, and DataArtifact.
Instantiating Objects
You can instantiate a specific object at any level of the hierarchy by providing the corresponding pathspec, e.g. from Metaflow logs.
Metaflow()
Flow('HelloFlow')
Run('HelloFlow/2')
Step('HelloFlow/2/start')
Task('HelloFlow/2/start/1')
DataArtifact('HelloFlow/2/start/1/name')
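The relationship between a pathspec and the level it addresses can be sketched in plain Python. This is only an illustration of the convention shown above, not Metaflow code: the helper name `describe_pathspec` and the `LEVELS` list are hypothetical, and the sketch assumes that the number of `/`-separated components alone determines the level.

```python
# Illustrative only: maps a pathspec to the client class at that depth.
# LEVELS and describe_pathspec are hypothetical names, not part of Metaflow.
LEVELS = ["Flow", "Run", "Step", "Task", "DataArtifact"]


def describe_pathspec(pathspec):
    """Return (class_name, components) for a pathspec such as 'HelloFlow/2'."""
    components = pathspec.split("/")
    # Reject empty components ('HelloFlow//start') and paths that are too deep.
    if not 1 <= len(components) <= len(LEVELS) or not all(components):
        raise ValueError("invalid pathspec: %r" % pathspec)
    return LEVELS[len(components) - 1], components


kind, parts = describe_pathspec("HelloFlow/2/start/1")
print(kind, parts)
```

With Metaflow installed, the returned class name corresponds to the object you would instantiate with the same pathspec, for example `Task('HelloFlow/2/start/1')`.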
Listing objects
Each object is a container (an iterable) that can be used to iterate over the objects below it in the hierarchy. For instance, list(Flow(...)) yields a list of Runs, and list(Run(...)) yields a list of Steps.
Accessing children
Since each object is a container, you can access its children through square-bracket notation, as if each object were a dictionary. For instance, you can access the object Task('HelloFlow/2/start/1') as follows:
Flow('HelloFlow')['2']['start']['1']
You can also test if the object has a certain child:
if '2' in Flow('HelloFlow'):
    print('Run found')
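The membership test and the square-bracket lookup can be combined to walk down the hierarchy safely. The sketch below is a hedged illustration: `resolve_child` is a hypothetical helper, and it relies only on the dictionary-like behavior described above (`in` and `[]`), so it works equally on nested dictionaries.

```python
def resolve_child(root, *child_ids):
    """Walk down from root one child at a time.

    Returns None as soon as a child is missing, instead of raising.
    root can be any container that supports `in` and `[]`.
    """
    obj = root
    for child_id in child_ids:
        if child_id not in obj:
            return None
        obj = obj[child_id]
    return obj


# With Metaflow installed, the call might look like this (not executed here):
#   from metaflow import Flow
#   task = resolve_child(Flow('HelloFlow'), '2', 'start', '1')
```

Returning None rather than raising is a design choice of this sketch; use plain indexing if a missing child should be treated as an error.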
Common attributes
All objects at the Run level and below have the following attributes:
tags (set) - Tags associated with the run this object belongs to (user and system tags).
user_tags (set) - User tags associated with the run this object belongs to.
system_tags (set) - System tags associated with the run this object belongs to.
created_at (datetime) - Date and time this object was created.
parent (Metaflow object) - Parent of this object (e.g. Run(...).parent is a Flow).
pathspec (string) - Pathspec of this object (e.g. HelloFlow/2 for a Run).
path_components (list) - Components of the pathspec.
origin_pathspec (string) - If the object was produced via resume, pathspec of the original object this object was cloned from.
Object visibility
Note that only objects in the current namespace can be instantiated. See Namespace functions to see how to switch between namespaces.
This module accesses all objects through the current metadata provider - either Metaflow Service or local metadata. See Metadata functions for utilities related to the metadata provider.
Object Hierarchy
Metaflow
Entry point to all objects in the Metaflow universe.
This object can be used to list all the flows present either through the explicit property or by iterating over this object.
flows: List[Flow]
Returns the list of all Flow objects known to this metadata provider. Note that only flows present in the current namespace will be returned. A Flow is present in a namespace if it has at least one run in the namespace.
Flow
A Flow represents all existing flows with a certain name, in other words, classes derived from FlowSpec. It is a container of Run objects.
latest_run: Run
Latest Run (in progress or completed, successfully or not) of this flow.
latest_successful_run: Run
Latest successfully completed Run of this flow.
Flow.runs(*tags)
Returns an iterator over all Runs of this flow.
An optional filter is available that allows you to filter on tags. If multiple tags are specified, only runs that have all the specified tags are returned.
tags: str
Tags to match.
Run
Run objects in this flow.
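The all-tags-must-match rule of the tag filter described above can be modeled client-side as a subset test. This is a minimal sketch of the semantics, not how Metaflow implements the filter: `runs_with_all_tags` is a hypothetical helper, and it assumes each run exposes a `tags` set as listed under Common attributes.

```python
def runs_with_all_tags(runs, *tags):
    """Yield only the runs whose tags include every requested tag."""
    wanted = set(tags)
    for run in runs:
        # Subset test: every wanted tag must be present on the run.
        if wanted <= set(run.tags):
            yield run


# With Metaflow installed, this might be used as follows (not executed here):
#   from metaflow import Flow
#   for run in runs_with_all_tags(Flow('HelloFlow'), 'production', 'v2'):
#       print(run.pathspec)
```

Calling the helper with no tags yields every run, which mirrors the filter being optional.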
Run
A Run represents an execution of a Flow. It is a container of Steps.
data: MetaflowData
A shortcut to run['end'].task.data, i.e. data produced by this run.
successful: bool
True if the run completed successfully.
finished: bool
True if the run completed.
finished_at: datetime
Time this run finished.
code: MetaflowCode
Code package for this run (if present). See MetaflowCode.
trigger: MetaflowTrigger
Information about event(s) that triggered this run (if present). See MetaflowTrigger.
end_task: Task
Task for the end step (if it is present already).
Run.add_tag(tag)
Add a tag to this Run.
Note that if the tag is already a system tag, it is not added as a user tag, and no error is thrown.
tag: str
Tag to add.
Run.add_tags(tags)
Add one or more tags to this Run.
Note that if any tag is already a system tag, it is not added as a user tag and no error is thrown.
tags: Iterable[str]
Tags to add.
Run.remove_tag(tag)
Remove one tag from this Run.
Removing a system tag is an error. Removing a non-existent user tag is a no-op.
tag: str
Tag to remove.
Run.remove_tags(tags)
Remove one or more tags from this Run.
Removing a system tag will result in an error. Removing a non-existent user tag is a no-op.
tags: Iterable[str]
Tags to remove.
Run.replace_tag(tag_to_remove, tag_to_add)
Remove a tag and add a tag atomically. Removal is done first.
The rules for Run.add_tag and Run.remove_tag also apply here.
tag_to_remove: str
Tag to remove.
tag_to_add: str
Tag to add.
Run.replace_tags(tags_to_remove, tags_to_add)
Remove and add tags atomically; the removal is done first.
The rules for Run.add_tag and Run.remove_tag also apply here.
tags_to_remove: Iterable[str]
Tags to remove.
tags_to_add: Iterable[str]
Tags to add.
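The tagging rules above (system tags are never added as user tags, removing a system tag is an error, removing a missing user tag is a no-op, and replacement removes first) can be summarized with a small in-memory model. This is a toy sketch for reasoning about the rules, not Metaflow's implementation: the class `TagModel` is hypothetical, and raising `ValueError` is an assumption, since the text does not name the exception type.

```python
class TagModel:
    """Toy in-memory model of the tagging rules; not Metaflow code."""

    def __init__(self, system_tags, user_tags=()):
        self.system_tags = frozenset(system_tags)
        self.user_tags = set(user_tags)

    @property
    def tags(self):
        # All tags: user and system tags together.
        return self.user_tags | self.system_tags

    def add_tags(self, tags):
        # A tag that is already a system tag is silently skipped.
        self.user_tags |= {t for t in tags if t not in self.system_tags}

    def remove_tags(self, tags):
        tags = set(tags)
        forbidden = tags & self.system_tags
        if forbidden:
            # Assumption: the real exception type is not stated in the text.
            raise ValueError("cannot remove system tags: %s" % sorted(forbidden))
        # Removing a tag that is not present is a no-op.
        self.user_tags -= tags

    def replace_tags(self, tags_to_remove, tags_to_add):
        # Removal is validated and applied first; adding cannot fail,
        # so the model never ends up half-updated.
        self.remove_tags(tags_to_remove)
        self.add_tags(tags_to_add)
```

Because removal happens first, replacing a tag with itself leaves the tag in place.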
Step
A Step represents a user-defined step, that is, a method annotated with the @step decorator.
It contains Task objects associated with the step, that is, all executions of the Step. The step may contain multiple Tasks in the case of a foreach step.
task: Task
The first Task object in this step. This is a shortcut for retrieving the only task contained in a non-foreach step.
finished_at: datetime
Time when the latest Task of this step finished. Note that in the case of foreaches, this time may change during execution of the step.
environment_info: Dict[str, Any]
Information about the execution environment.
Task
A Task represents an execution of a Step.
It contains all DataArtifact objects produced by the task as well as metadata related to execution.
Note that the @retry decorator may cause multiple attempts of the task to be present. Usually you want the latest attempt, which is what instantiating a Task object returns by default. If you need to, for example, retrieve logs from a failed attempt, you can explicitly get information about a specific attempt by using the following syntax when creating a task:
Task('flow/run/step/task', attempt=<attempt>)
where attempt=0 corresponds to the first attempt, attempt=1 to the second, and so on.
metadata: List[Metadata]
List of all metadata events associated with the task.
metadata_dict: Dict[str, str]
A condensed version of metadata: a dictionary where keys are names of metadata events and values are the latest corresponding events.
data: MetaflowData
Container of all data artifacts produced by this task. Note that this call downloads all data locally, so it can be slower than accessing artifacts individually. See MetaflowData for more information.
artifacts: MetaflowArtifacts
Container of DataArtifact objects produced by this task.
successful: bool
True if the task completed successfully.
finished: bool
True if the task completed.
exception: object
Exception raised by this task if there was one.
finished_at: datetime
Time this task finished.
runtime_name: str
Runtime this task was executed on.
stdout: str
Standard output for the task execution.
stderr: str
Standard error output for the task execution.
code: MetaflowCode
Code package for this task (if present). See MetaflowCode.
environment_info: Dict[str, str]
Information about the execution environment.
Task.loglines(stream, as_unicode=True)
Return an iterator over (utc_timestamp, logline) tuples.
stream: str
Either 'stdout' or 'stderr'.
as_unicode: bool, default: True
If as_unicode=False, each logline is returned as a bytes object. Otherwise, it is returned as a (unicode) string.
Tuple[datetime, str]
A (timestamp, logline) tuple for each line of the log.
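The (timestamp, logline) pairs can be turned into readable text with a short helper. The sketch below is hedged: `format_loglines` is a hypothetical name, and it assumes the iterator described above is available as a `loglines(stream)` method on the task and that timestamps are `datetime` objects, as the return type indicates.

```python
def format_loglines(task, stream="stdout"):
    """Return the log of a task as a list of '[timestamp] line' strings."""
    return [
        "[%s] %s" % (timestamp.isoformat(), line)
        for timestamp, line in task.loglines(stream)
    ]


# With Metaflow installed, this might be used as follows (not executed here):
#   from metaflow import Task
#   for entry in format_loglines(Task('HelloFlow/2/start/1'), 'stderr'):
#       print(entry)
```

Combined with the `attempt` argument of `Task`, the same helper could be pointed at the logs of an earlier, failed attempt.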
DataArtifact
A single data artifact and associated metadata. Note that this object does not contain other objects as it is the leaf object in the hierarchy.
data: object
The data contained in this artifact, that is, the object produced during execution of this run.
sha: string
A unique ID of this artifact.
finished_at: datetime
Corresponds roughly to the Task.finished_at time of the parent Task.
An alias for DataArtifact.created_at.
Helper Objects
MetaflowData
Container of data artifacts produced by a Task. This object is instantiated through Task.data.
MetaflowData allows results to be retrieved by their name through a convenient dot notation:
Task(...).data.my_object
You can also test the existence of an object:
if 'my_object' in Task(...).data:
    print('my_object found')
Note that this container relies on the local cache to load all data artifacts. If your Task contains a lot of data, a more efficient approach is to load artifacts individually, like so:
Task(...)['my_object'].data
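Loading artifacts one by one can be wrapped in a small helper that fetches only the names you ask for. This is a sketch under stated assumptions: `load_artifacts` is a hypothetical helper, and it relies only on the container behavior described earlier (`name in task` and `task[name].data`).

```python
def load_artifacts(task, names):
    """Load only the named artifacts, skipping names the task does not have."""
    loaded = {}
    for name in names:
        if name in task:
            # Accessing .data fetches this one artifact only.
            loaded[name] = task[name].data
    return loaded


# With Metaflow installed, this might be used as follows (not executed here):
#   from metaflow import Task
#   values = load_artifacts(Task('HelloFlow/2/start/1'), ['name', 'score'])
```

Skipping missing names is a choice made for this sketch; index the task directly if a missing artifact should raise.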
MetaflowCode
Snapshot of the code used to execute this Run. Instantiate the object through Run(...).code (if any step is executed remotely) or Task(...).code for an individual task. The code package is the same for all steps of a Run.
MetaflowCode includes a package of the user-defined FlowSpec class and supporting files, as well as a snapshot of the Metaflow library itself.
Currently, MetaflowCode objects are stored only for Runs that have at least one Step executing outside the user's local environment.
The TarFile for the Run is given by Run(...).code.tarball.
path: str
Location (in the datastore provider) of the code package.
info: Dict[str, str]
Dictionary of information related to this code-package.
flowspec: str
Source code of the file containing the FlowSpec in this code package.
tarball: TarFile
Python standard library tarfile.TarFile archive containing all the code.
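Since tarball is a standard tarfile.TarFile, its contents can be inspected with the usual standard-library calls. The following is a minimal sketch: `list_code_files` is a hypothetical helper, and the only assumption about the code object is the `tarball` attribute documented above.

```python
def list_code_files(code, suffix=".py"):
    """Return the sorted names of archive members that end with suffix."""
    # TarFile.getnames() lists every member of the archive by name.
    return sorted(
        name for name in code.tarball.getnames() if name.endswith(suffix)
    )


# With Metaflow installed, this might be used as follows (not executed here):
#   from metaflow import Run
#   code = Run('HelloFlow/2').code   # may be absent for purely local runs
#   if code is not None:
#       print(list_code_files(code))
```

The `is not None` check in the usage comment reflects that a code package is stored only for some runs, as noted above.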
MetaflowTrigger
MetaflowTrigger is returned by Run.trigger if the Run was triggered by an event. It is also returned by current.trigger when called from an event-triggered flow.
MetaflowTrigger.event
The MetaflowEvent object corresponding to the triggering event. If multiple events triggered the run, this property is the latest event.
MetaflowEvent, optional
The latest event that triggered the run, if applicable.
MetaflowTrigger.events
The list of MetaflowEvent objects corresponding to all the triggering events.
List[MetaflowEvent], optional
List of all events that triggered the run.
MetaflowTrigger.run
The corresponding Run object if the triggering event is a Metaflow run. If multiple runs triggered the run, this property is the latest run. Returns None if none of the triggering events is a Run.
Run, optional
Latest Run that triggered this run, if applicable.
MetaflowTrigger.runs
The list of Run objects in the triggering events. Returns None if none of the triggering events are Run objects.
List[Run], optional
List of runs that triggered this run, if applicable.
MetaflowTrigger.__getitem__(key)
If triggering events are runs, key corresponds to the flow name of the triggering run. Otherwise, key corresponds to the event name and a MetaflowEvent object is returned.
Union[Run, MetaflowEvent]
Run object if triggered by a run. Otherwise returns a MetaflowEvent.
MetaflowEvent
MetaflowEvent is returned by MetaflowTrigger (see above) for event-triggered runs.
Container of metadata that identifies the event that triggered the Run under consideration.
name: str
Name of the event.
id: str
Unique identifier for the event.
timestamp: datetime
Timestamp recording creation time for the event.
type: str
Type of the event - one of event or run.
Namespace functions
from metaflow import namespace
namespace(ns)
Switch namespace to the one provided.
This call has a global effect. No objects outside this namespace will be accessible. To access all objects regardless of namespaces, pass None to this call.
ns: str, optional
Namespace to switch to or None to ignore namespaces.
str, optional
Namespace set (result of get_namespace()).
get_namespace()
Return the namespace that is currently being used to filter objects.
The namespace is a tag associated with all objects in Metaflow.
str, optional
The current namespace used to filter objects.
default_namespace()
Resets the namespace used to filter objects to the default one, i.e. the one that was used prior to any namespace calls.
str
The result of get_namespace() after the namespace has been reset.
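Because switching the namespace has a global effect, it can be convenient to scope a switch to a block of code and restore the previous value afterwards. The context manager below is a sketch, not part of Metaflow: `temporary_namespace` is a hypothetical name, and the two functions are passed in as arguments so the example stays self-contained. In practice you would pass `namespace` and `get_namespace` imported from metaflow.

```python
from contextlib import contextmanager


@contextmanager
def temporary_namespace(ns, namespace, get_namespace):
    """Switch to ns inside a with-block, then restore the previous namespace.

    namespace and get_namespace are the functions documented above,
    passed in explicitly so the sketch does not import Metaflow.
    """
    previous = get_namespace()
    namespace(ns)
    try:
        yield ns
    finally:
        # Restore even if the body raised an exception.
        namespace(previous)


# With Metaflow installed, this might be used as follows (not executed here):
#   from metaflow import Flow, namespace, get_namespace
#   with temporary_namespace(None, namespace, get_namespace):
#       runs = list(Flow('HelloFlow'))  # all runs, regardless of namespace
```

The `try`/`finally` ensures that an error inside the block does not leave the global namespace switched.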
Metadata functions
metadata(ms)
Switch Metadata provider.
This call has a global effect. Selecting the local metadata will, for example, not allow access to information stored in remote metadata providers.
Note that you don't typically have to call this function directly. Usually the metadata provider is set through the Metaflow configuration file. If you need to switch between multiple providers, you can use the METAFLOW_PROFILE environment variable to switch between configurations.
ms: str
Can be a path (selects local metadata), a URL starting with http (selects the service metadata) or an explicit specification <metadata_type>@<info>; as an example, you can specify local@<path> or service@<url>.
str
The description of the metadata selected (equivalent to the result of get_metadata()).
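The three accepted forms of the ms argument can be illustrated with a small classifier. This is a toy model of the rules as worded above, not Metaflow's parsing code: `classify_metadata_spec` is a hypothetical helper, and it assumes that only the two provider types named in the text, local and service, can appear before the @.

```python
def classify_metadata_spec(ms):
    """Return (metadata_type, info) for a provider specification string."""
    # Explicit form: <metadata_type>@<info>, e.g. local@<path>.
    if "@" in ms:
        kind, info = ms.split("@", 1)
        if kind in ("local", "service"):
            return kind, info
    # A URL starting with http selects the service metadata.
    if ms.startswith("http"):
        return "service", ms
    # Anything else is treated as a path and selects local metadata.
    return "local", ms


print(classify_metadata_spec("service@http://localhost:8080"))
```

Checking the prefix before the @ keeps URLs that contain an @ elsewhere from being mistaken for the explicit form.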
get_metadata()
Returns the current Metadata provider.
If this is not set explicitly using metadata, the default value is determined through the Metaflow configuration. You can use this call to check that your configuration is set up properly.
If multiple configuration profiles are present, this call returns the one selected through the METAFLOW_PROFILE environment variable.
str
Information about the Metadata provider currently selected. This typically includes provider-specific information (such as the URL for remote providers or the local path for local providers).