
Running in a Notebook

To execute a flow defined in a cell, add an NBRunner one-liner, such as run = NBRunner(MyFlow).nbrun(), as the last line of the same cell.

Once the flow finishes successfully, the call returns a Run object, which you can use to inspect the results.

note

Notebook execution requires that

  • The whole flow is defined in a single cell, including all import statements that the flow requires.

  • The NBRunner call is the last line of the cell.

  • The cell doesn't include the if __name__ == '__main__' block at the end, which is only needed for command-line execution.
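To make these requirements concrete, the sketch below keeps a sample cell as text and checks it against the last two rules using Python's ast module. HelloFlow, GOOD_CELL, and check_cell are hypothetical names chosen for illustration. The checker is not part of Metaflow, and it does not verify the import requirement.

```python
import ast

# A sample cell, kept as text so it can be inspected without running it.
# HelloFlow is a made-up flow used only for illustration.
GOOD_CELL = '''
from metaflow import FlowSpec, step, NBRunner

class HelloFlow(FlowSpec):

    @step
    def start(self):
        self.message = "hello"
        self.next(self.end)

    @step
    def end(self):
        print(self.message)

run = NBRunner(HelloFlow).nbrun()
'''


def _is_main_guard(node):
    """Return True for an `if __name__ == ...:` statement."""
    return (
        isinstance(node, ast.If)
        and isinstance(node.test, ast.Compare)
        and isinstance(node.test.left, ast.Name)
        and node.test.left.id == "__name__"
    )


def check_cell(source):
    """Return a list of rule violations for a cell (empty if none found)."""
    body = ast.parse(source).body
    problems = []
    if any(_is_main_guard(node) for node in body):
        problems.append("remove the if __name__ == '__main__' block")
    # Collect every name referenced by the last top-level statement.
    last = body[-1] if body else None
    names = {n.id for n in ast.walk(last) if isinstance(n, ast.Name)} if last else set()
    if "NBRunner" not in names:
        problems.append("the NBRunner call must be the last line")
    return problems


print(check_cell(GOOD_CELL))  # []
```

Appending an if __name__ == '__main__' block to GOOD_CELL makes check_cell report both problems, since the guard is present and the NBRunner call is no longer the last statement.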

Passing Parameters

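You can set values for a flow's Parameters by passing them as keyword arguments to nbrun. For a Parameter named alpha (a hypothetical name), the call would be NBRunner(MyFlow).nbrun(alpha=myalpha).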

Importantly, the parameter values may be variables defined in other cells, like myalpha in the video above.

Running flows remotely

A major benefit of Metaflow is that it gives you easy access to scalable compute resources. To run a flow in the cloud instead of the notebook instance, just request cloud resources.

You can pass any command-line options to NBRunner as keyword arguments. Note that --with options are aliased as decospecs, since with is a reserved keyword in Python. For instance, NBRunner(MyFlow, decospecs=['kubernetes']) is equivalent to run --with kubernetes, which runs the flow remotely in a Kubernetes cluster.
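The decospecs alias exists because with cannot be used as a keyword-argument name. The following minimal check uses only the standard library and confirms this without running any flow. The helper is_valid_call is a hypothetical name, and the call strings are only compiled, never executed, so NBRunner and MyFlow do not need to exist.

```python
import keyword

# `with` is reserved, so it can never be used as a keyword-argument name.
print(keyword.iskeyword("with"))       # True
print(keyword.iskeyword("decospecs"))  # False


def is_valid_call(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<cell>", "eval")
        return True
    except SyntaxError:
        return False


# The strings are compiled only; nothing is imported or run.
print(is_valid_call("NBRunner(MyFlow, with=['kubernetes'])"))       # False
print(is_valid_call("NBRunner(MyFlow, decospecs=['kubernetes'])"))  # True
```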

tip

With Metaflow, you can use powerful compute resources, like GPUs and other accelerators, to run a cell and easily get the results back in the notebook.

Non-blocking runs

The NBRunner(FlowName).nbrun() one-liner is convenient for running a flow, waiting for it to complete, and returning its results. However, especially with long-running runs, you may want to start a run in the background, monitoring its progress and outputs live while the run is executing.

You can do this with NBRunner.async_run(), which leverages the non-blocking Runner API.

The key constructs that come in handy with non-blocking runs are:

  • await runner.async_run() starts a run and returns an ExecutingRun object.

  • async for _, line in running.stream_log('stdout') allows you to stream logs line by line.

  • running.status returns the current status of the run.

  • await running.wait() blocks until the run completes.

  • runner.cleanup() deletes any temporary files that were created during execution.

Note that it is possible to instantiate multiple NBRunner objects in separate cells and manage many concurrent runs with the above API. For more information, see documentation for the non-blocking runner API.
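The constructs listed above can be combined into a small helper. The sketch below assumes only that the runner object provides async_run() and cleanup(), and that the returned object provides stream_log(), wait(), and status as described. follow_run and follow_all are hypothetical names and are not part of Metaflow.

```python
import asyncio


async def follow_run(runner, stream="stdout"):
    """Start a run in the background, echo its log, and return its status.

    `runner` is expected to behave like an NBRunner instance.
    """
    running = await runner.async_run()  # start the run without blocking
    try:
        async for _, line in running.stream_log(stream):
            print(line)                 # live log output, line by line
        await running.wait()            # wait until the run completes
        return running.status
    finally:
        runner.cleanup()                # always remove temporary files


async def follow_all(runners):
    """Follow several runs concurrently and return their statuses in order."""
    return await asyncio.gather(*(follow_run(r) for r in runners))
```

In a notebook cell that defines MyFlow, a call could look like status = await follow_run(NBRunner(MyFlow)). This relies on the notebook supporting top-level await. When several runs are followed with follow_all, their log lines are printed interleaved.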

note

Remember to call runner.cleanup() when you are done with a non-blocking run to remove temporary files.

Changing the working directory

By default, flows execute in a temporary directory. If you use a local configuration that saves Metaflow artifacts in the current working directory, or you access local files using relative paths, you may want to set the working directory to a specific location. Set it with the base_dir keyword argument of NBRunner, for example NBRunner(MyFlow, base_dir='/path/to/project'), where the path is a placeholder.
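The reason the working directory matters for relative paths can be shown with the standard library alone. The sketch below does not use Metaflow. It creates two temporary directories as stand-ins for a project directory and a scratch directory, and shows that the same relative path only resolves from the directory that contains the file. visible_from and data.csv are hypothetical names.

```python
import os
import tempfile
from pathlib import Path


def visible_from(workdir, relative_name):
    """Return True if `relative_name` exists when `workdir` is the
    current working directory."""
    previous = os.getcwd()
    os.chdir(workdir)
    try:
        return Path(relative_name).exists()
    finally:
        os.chdir(previous)  # restore the caller's working directory


with tempfile.TemporaryDirectory() as project_dir, \
        tempfile.TemporaryDirectory() as scratch_dir:
    Path(project_dir, "data.csv").write_text("x\n1\n")
    seen_in_project = visible_from(project_dir, "data.csv")
    seen_in_scratch = visible_from(scratch_dir, "data.csv")

print(seen_in_project)  # True
print(seen_in_scratch)  # False
```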