FeatureFlow. As they, amongst other people, run their versions independently, they end up with the following runs:
PredictionFlow results in a notebook by remembering that her latest run is
PredictionFlow/8. Fortunately, Metaflow makes this even easier thanks to namespaces:
When Anne runs PredictionFlow, her runs are automatically tagged with her user name, prefixed with user:. By default, when Anne uses the Client API in a notebook or in a Python script, the API only returns results that are tagged with user:anne. Instead of having to remember the exact ID of her latest run, she can simply use Flow('PredictionFlow').latest_run.
For Anne, this returns 'PredictionFlow/8'. For Will, the same code returns 'PredictionFlow/5', because PredictionFlow/5 is in Will's namespace, not Anne's. An important feature of namespaces is that they make sure you can't accidentally use someone else's results, which could lead to incorrect analyses that are hard to debug.
You can use the --namespace flag on the command line to switch between namespaces. This is a better approach than hardcoding a namespace() function call in the code that defines your Metaflow workflow.
namespace(None) allows you to access all results without limitations. Be careful, though: relative references like latest_run make little sense in the global namespace, since anyone can produce a new run at any time.
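To make the filtering rule concrete, here is a toy model in plain Python. It is not Metaflow's implementation and does not use the Metaflow client; the ToyRun tuple, the sample runs, and the latest_run helper are hypothetical. It assumes, as described above, that a namespace is a tag every visible run must carry and that None stands for the global namespace, and it simplifies "latest" to "highest run ID". Only Anne's run 8 and Will's run 5 come from the text; the other runs are made up.

```python
from typing import NamedTuple, Optional


class ToyRun(NamedTuple):
    """Hypothetical stand-in for a run: a numeric ID plus its tags."""
    run_id: int
    tags: frozenset


# Made-up PredictionFlow history; only runs 8 (Anne) and 5 (Will) come from the text.
RUNS = [
    ToyRun(3, frozenset({"user:anne"})),
    ToyRun(5, frozenset({"user:will"})),
    ToyRun(8, frozenset({"user:anne"})),
]


def latest_run(runs, namespace: Optional[str]):
    """Newest run visible in `namespace`; None plays the role of the global namespace.

    Simplification (assumption): "newest" means the highest run ID.
    """
    visible = [r for r in runs if namespace is None or namespace in r.tags]
    return max(visible, key=lambda r: r.run_id, default=None)


print(latest_run(RUNS, "user:anne").run_id)  # 8
print(latest_run(RUNS, "user:will").run_id)  # 5
print(latest_run(RUNS, None).run_id)         # 8

# Will starts a new run: the global answer changes, Anne's does not.
RUNS.append(ToyRun(9, frozenset({"user:will"})))
print(latest_run(RUNS, None).run_id)         # 9
print(latest_run(RUNS, "user:anne").run_id)  # 8
```

The last two lines show why latest_run is risky in the global namespace: a single new run by Will changes the answer for everyone, while Anne's own namespace still resolves to run 8.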
latest_run could easily break production as the user keeps executing experimental runs.
--authorize option. Metaflow stores the token for them after the first deployment, so they need to do this only once.
If you run step-functions create again, it will deploy an updated version of your code in the existing production namespace of the flow.
The resume command is smart enough to work across production and personal namespaces. You can resume a production workflow without having to do anything special with namespaces.
The user: tag is assigned by Metaflow automatically. In addition to automatically assigned tags, you can add and remove arbitrary tags on objects. Tags are an excellent way to add extra annotations to runs, tasks, etc., which makes it easier for you and other people to find and retrieve results of interest.
--tag command line option. You can add multiple tags with multiple --tag options. For instance, this will annotate a HelloFlow run with the tag crazy_test.
The --tag option assigns the specified tag to all objects produced by the run: the run itself, its steps, tasks, and data artifacts.
HelloFlow with the tag crazy_test in your namespace. Filtering is based on both the current namespace() and the tag filter.
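The combined filter can be sketched in plain Python. In Metaflow itself you would write something like Flow('HelloFlow').runs('crazy_test'); the runs_with_tags helper and the sample runs below are hypothetical stand-ins, not the Metaflow client. The sketch assumes a run matches only if it carries every requested tag and, unless the namespace is None, the namespace tag as well.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical HelloFlow runs: run ID -> tags attached to that run.
RUNS: Dict[int, FrozenSet[str]] = {
    1: frozenset({"user:anne"}),
    2: frozenset({"user:anne", "crazy_test"}),
    3: frozenset({"user:will", "crazy_test"}),
}


def runs_with_tags(runs, namespace: Optional[str], *tags: str) -> List[int]:
    """IDs of runs visible in `namespace` that carry every tag in `tags`."""
    required = set(tags)
    if namespace is not None:
        required.add(namespace)  # the namespace acts as one more required tag
    return sorted(run_id for run_id, run_tags in runs.items() if required <= run_tags)


print(runs_with_tags(RUNS, "user:anne", "crazy_test"))  # [2]
print(runs_with_tags(RUNS, None, "crazy_test"))         # [2, 3]
```

Will's run 3 carries crazy_test too, but Anne only sees it after switching to the global namespace, which is the "both the namespace and the tag" rule in action.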
.tags property. In the above case, run.tags would return a set containing the string crazy_test amongst other automatically assigned tags.
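PredictionFlow, but they want to collaborate on FeatureFlow. They could add a shared, descriptive tag to their FeatureFlow runs and switch to that tag as their namespace. The latest run in that namespace is then the latest run of FeatureFlow regardless of the user who ran the flow, here FeatureFlow/34, which happened to be run by Anne. If Will runs the flow again, his results will become the latest results in this namespace.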
current that represents the identity of the currently running task. Use it in your FlowSpec to retrieve current IDs of interest:
current.pathspec is convenient as an unambiguous identifier of a task. For instance, the pathspec printed by the above script can be passed to the Client API as Task(pathspec) to access the corresponding Task object.
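A task pathspec joins four IDs with slashes: flow name, run ID, step name, and task ID. That structure is what makes it unambiguous. The plain-Python sketch below splits one into its parts; the TaskPathspec type, the parse_task_pathspec helper, and the example IDs are hypothetical and not part of Metaflow.

```python
from typing import NamedTuple


class TaskPathspec(NamedTuple):
    """Hypothetical container for the four components of a task pathspec."""
    flow_name: str
    run_id: str
    step_name: str
    task_id: str


def parse_task_pathspec(pathspec: str) -> TaskPathspec:
    """Split 'Flow/run/step/task' into its parts, rejecting anything else."""
    parts = pathspec.split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"not a task pathspec: {pathspec!r}")
    return TaskPathspec(*parts)


# Made-up IDs with the same shape as a value of current.pathspec.
spec = parse_task_pathspec("HelloFlow/12/start/3")
print(spec.step_name, spec.task_id)  # start 3
print("/".join(spec))                # HelloFlow/12/start/3
```

Because the tuple is ordered the same way as the string, joining it back with "/" reproduces the original pathspec.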
The current singleton also provides programmatic access, within your flow code, to the --origin-run-id CLI option used by resume.
current singleton would reflect that value.
run (successful or not).
run invocations, the value of
resume for the above script to re-run everything from start without explicitly overriding the CLI option --origin-run-id, we can see the origin_run_id chosen by Metaflow for the resume in the output (the exact value for you might be different).
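The selection rule can be modeled in plain Python. This is a hypothetical sketch, not Metaflow's code: PastRun and resolve_origin_run_id are made up, "latest" is simplified to the highest run ID, and the model assumes three cases. A plain run has no origin run (None), an explicit --origin-run-id wins, and otherwise the latest run is chosen whether or not it succeeded. Inside a real flow you read the chosen value from the current singleton rather than compute it yourself.

```python
from typing import List, NamedTuple, Optional


class PastRun(NamedTuple):
    """Hypothetical record of an earlier run of the flow."""
    run_id: int
    successful: bool


def resolve_origin_run_id(
    command: str, explicit: Optional[str], past_runs: List[PastRun]
) -> Optional[str]:
    """Model which origin run ID a run or resume would see (sketch only)."""
    if command == "run":
        return None  # a fresh run is not resumed from anything
    if explicit is not None:
        return explicit  # --origin-run-id given on the command line wins
    if not past_runs:
        raise ValueError("nothing to resume")
    # Default: the latest run, whether it succeeded or failed.
    return str(max(past_runs, key=lambda r: r.run_id).run_id)


history = [PastRun(41, True), PastRun(42, False)]
print(resolve_origin_run_id("run", None, history))     # None
print(resolve_origin_run_id("resume", None, history))  # 42: the failed run is still chosen
print(resolve_origin_run_id("resume", "41", history))  # 41
```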