Every flow needs a step called `start` and a step called `end`. An execution of the flow, which we call a run, starts at `start`. The run is successful if the final `end` step finishes successfully.
What happens between `start` and `end` is up to you. You can construct the graph in between using an arbitrary combination of the following three types of transitions supported by Metaflow: linear, branch, and foreach.
Besides executing its steps in order from `start` to `end`, this flow creates a data artifact called `my_var`. In Metaflow, data artifacts are created simply by assigning values to instance variables, such as `self.my_var`.
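The following is a hypothetical plain-Python model of a linear flow, not Metaflow code. The class, its step names, and the `run_linear` runner are assumptions made for illustration. It shows only the artifact idea: whatever a step assigns to `self` is visible to the steps after it. In an actual Metaflow flow, the class subclasses `FlowSpec`, each step is decorated with `@step` and ends with a `self.next(...)` call, and artifacts are persisted between steps rather than kept in a single Python object.

```python
class LinearFlowModel:
    """Toy stand-in for a linear flow start -> a -> end (hypothetical, not Metaflow)."""

    def start(self):
        # Creating an "artifact" is just an assignment to an instance variable
        self.my_var = "hello world"

    def a(self):
        # A later step can read (and here update) the artifact
        self.my_var = self.my_var.upper()

    def end(self):
        self.result = f"my_var is {self.my_var}"


def run_linear(flow, order=("start", "a", "end")):
    """Call each step in order and snapshot the instance variables after each one."""
    snapshots = {}
    for name in order:
        getattr(flow, name)()
        snapshots[name] = dict(vars(flow))  # shallow copy of the artifacts so far
    return snapshots


snapshots = run_linear(LinearFlowModel())
print(snapshots["end"]["result"])  # my_var is HELLO WORLD
```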
A branch expresses parallel steps. For example, `start` can transition to two parallel steps, `a` and `b`. Any number of parallel steps is allowed. A benefit of a branch like this is performance: Metaflow can execute `a` and `b` over multiple CPU cores or over multiple instances in the cloud.
Every branch must be joined. The join step does not need to be called `join` as above, but it must take an extra argument, like `inputs`.
The value of `x` above is ambiguous: `a` sets it to `1` and `b` sets it to `2`. To disambiguate the branches, the join step can refer to a specific step in the branch, like `inputs.a.x` above. For convenience, you can also iterate over all steps in the branch using `inputs`, as done in the last print statement in the above `join` step. For more details, see the section about data flow through the graph.
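The branch-and-join pattern can be modeled in plain Python as follows. `JoinInputs` is a hypothetical stand-in for a join step's `inputs` argument, not Metaflow's class. The branches `a` and `b` run concurrently in a thread pool, whereas Metaflow itself may use separate processes or cloud instances. Each branch sets a conflicting `x`, and the join either selects a branch by step name or iterates over all of them.

```python
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


class JoinInputs:
    """Hypothetical stand-in for the `inputs` argument of a join step."""

    def __init__(self, **branches):
        self._branches = branches  # step name -> artifacts produced by that branch

    def __getattr__(self, name):
        # inputs.a / inputs.b: select one branch by its step name
        try:
            return self._branches[name]
        except KeyError:
            raise AttributeError(name) from None

    def __iter__(self):
        # for inp in inputs: visit every branch in turn
        return iter(self._branches.values())


def a():
    return SimpleNamespace(x=1)  # branch a sets x to 1


def b():
    return SimpleNamespace(x=2)  # branch b sets x to 2, so x is ambiguous at the join


# Run both branches concurrently, then hand their results to the join
with ThreadPoolExecutor() as pool:
    future_a, future_b = pool.submit(a), pool.submit(b)
    inputs = JoinInputs(a=future_a.result(), b=future_b.result())

print("a is", inputs.a.x)                        # a is 1
print("b is", inputs.b.x)                        # b is 2
print("total is", sum(inp.x for inp in inputs))  # total is 3
```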
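The `foreach` argument takes a string that is the name of a list stored in an instance variable, like `titles`. Steps inside a foreach create a separate task for each item of the list: here, Metaflow creates three tasks for step `a`, which process the three items of the `titles` list in parallel. You can access the specific item assigned to a task with an instance variable called `input`. Foreach loops must be joined like static branches. Because tasks inside a foreach are not named, the join step can only iterate over them with `inputs`. If you want, you can assign a value to an instance variable in a foreach step to help identify the task.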
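The foreach fan-out and join can be modeled in plain Python as follows. This is a hypothetical sketch, not Metaflow code. The titles are made up, and `start`, `a`, and `join` are ordinary functions standing in for steps. Each task receives one list item as `input` and sets an identifying artifact. Because the tasks are unnamed, the join can only iterate over them.

```python
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


def start():
    # The list a real flow would name in foreach="titles" (values are made up)
    return ["Title A", "Title B", "Title C"]


def a(item):
    # One task per list item; each task sees only its own item as `input`
    task = SimpleNamespace(input=item)
    task.label = f"{task.input} processed"  # an artifact that identifies this task
    return task


def join(inputs):
    # Foreach tasks have no step names, so the join iterates over them
    return [inp.label for inp in inputs]


titles = start()
with ThreadPoolExecutor() as pool:
    inputs = list(pool.map(a, titles))  # three tasks; results keep list order
print(join(inputs))  # ['Title A processed', 'Title B processed', 'Title C processed']
```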
You can define parameters for a flow by assigning a `Parameter` object to a class variable. Parameter variables are automatically available in all steps as instance variables on `self`.
`--num_components` to an integer value.
So far, `Parameter` objects have taken simple scalar values, such as integers or floating-point values. To support more complex values, Metaflow allows you to specify a `Parameter` value as JSON. This feature comes in handy if your `Parameter` is a list of values, a mapping, or a more complex data structure.
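In Metaflow, a JSON parameter is declared by passing `type=JSONType` (importable from `metaflow`) to `Parameter`. The sketch below does not use Metaflow. It shows only the underlying idea with the standard library's `argparse` and `json` modules: the command-line string is parsed as JSON, so a single option can carry a list or a mapping. The option name `--config` and its values are hypothetical.

```python
import argparse
import json


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # type=json.loads turns the raw command-line string into a Python object;
    # argparse also applies it to a string default
    parser.add_argument("--config", type=json.loads, default='{"alpha": 0.5}')
    return parser.parse_args(argv)


args = parse_args(["--config", '{"alpha": 0.6, "layers": [64, 32]}'])
print(args.config["layers"])  # [64, 32]
```

Invalid JSON makes `json.loads` raise a `ValueError`, which `argparse` reports as a usage error.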
Metaflow provides a helper function, `merge_artifacts`, to aid in propagating unambiguous values.
The `merge_artifacts` function behaves as follows:

- `pass_down` is propagated because it is unmodified in both branches.
- `common` is also propagated because it is set to the same value in both branches. Remember that it is the value of the artifact that matters when determining whether an artifact is ambiguous. Metaflow uses content-based deduplication to store artifacts and can therefore determine whether the values of two artifacts are the same.
- `x` is handled explicitly by the code prior to the call to `merge_artifacts`, which causes `merge_artifacts` to ignore `x` when propagating artifacts. This pattern allows you to manually resolve any ambiguity in artifacts you would like to see propagated.
- `y` is not propagated because it is listed in the `exclude` list. This pattern allows you to prevent the propagation of artifacts that are no longer relevant. Remember that the default behavior of `merge_artifacts` is to propagate all incoming artifacts.
- `from_a` is propagated because it is only set in one branch and is therefore unambiguous. `merge_artifacts` propagates values even if they are present on only one incoming branch.
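The rules above can be modeled in plain Python as follows. `merge_sketch` is a hypothetical helper, not Metaflow's implementation, and the artifact values are made up. Artifacts already set in the join step or listed in `exclude` are skipped. Values that agree across branches, or that appear in only one branch, are propagated, and any disagreement raises an error. The sketch compares values with `!=`, which stands in for Metaflow's content-based comparison.

```python
def merge_sketch(current, inputs, exclude=()):
    """Hypothetical model of merge_artifacts-style propagation (not Metaflow's code).

    current: artifacts already set in the join step
    inputs:  one dict of artifacts per incoming branch
    Returns the artifacts that would be propagated.
    """
    merged, ambiguous = {}, set()
    for branch in inputs:
        for name, value in branch.items():
            if name in current or name in exclude:
                continue  # resolved by hand in this step, or explicitly excluded
            if merged.setdefault(name, value) != value:
                ambiguous.add(name)  # branches disagree on this artifact's value
    if ambiguous:
        raise ValueError(f"ambiguous artifacts: {sorted(ambiguous)}")
    return merged


branch_a = {"pass_down": "p", "common": 5, "x": 1, "y": 10, "from_a": "only in a"}
branch_b = {"pass_down": "p", "common": 5, "x": 2, "y": 20}
current = {"x": max(branch_a["x"], branch_b["x"])}  # x resolved explicitly first

merged = merge_sketch(current, [branch_a, branch_b], exclude=["y"])
print(merged)  # {'pass_down': 'p', 'common': 5, 'from_a': 'only in a'}
```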
The `include` keyword lets you explicitly specify the artifacts to consider when merging. This is useful when the list of artifacts to exclude is longer than the list to include. You cannot use both an `include` and an `exclude` list in the same `merge_artifacts` call. Note also that if an artifact is specified in `include`, an error is raised if it does not exist in the current step or in any of the inputs (in other words, it is "missing"). The `include` parameter is only available in version 2.2.1 or later.
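The `include` rules can be modeled in plain Python as follows. `merge_with_include` is a hypothetical helper, not Metaflow's code, and the artifact names are made up. The sketch assumes that an included name counts as missing when neither the current step nor any input has it. It also shows why `include` is convenient: a conflicting artifact that is not listed, such as `scratch`, is simply ignored.

```python
def merge_with_include(current, inputs, include=(), exclude=()):
    """Hypothetical model of the include/exclude rules (not Metaflow's code)."""
    if include and exclude:
        raise ValueError("include and exclude cannot be used together")
    merged, ambiguous = {}, set()
    for branch in inputs:
        for name, value in branch.items():
            if name in current or name in exclude:
                continue  # already set in this step, or explicitly excluded
            if include and name not in include:
                continue  # with include, unlisted artifacts are ignored entirely
            if merged.setdefault(name, value) != value:
                ambiguous.add(name)
    # Assumption: an included name is "missing" if neither this step nor any input has it
    missing = [name for name in include if name not in merged and name not in current]
    if ambiguous:
        raise ValueError(f"ambiguous artifacts: {sorted(ambiguous)}")
    if missing:
        raise LookupError(f"missing artifacts: {missing}")
    return merged


branch_a = {"model": "m1", "metrics": "a-only", "scratch": [1]}
branch_b = {"model": "m1", "scratch": [2]}  # scratch conflicts, but is not included

print(merge_with_include({}, [branch_a, branch_b], include=["model", "metrics"]))
# {'model': 'm1', 'metrics': 'a-only'}
```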
The `merge_artifacts` function raises an exception if an artifact that it should merge has an ambiguous value. Remember that `merge_artifacts` attempts to merge all incoming artifacts except those that are already present in the step or have been explicitly excluded in the `exclude` list.