Skip to main content

Easy Custom Reports with Card Components

Default Cards are useful during development when you need to quickly inspect artifacts produced by a task or visualize the overall structure of the flow. As your project progresses, you may want to create a custom card that highlights information specific to your project.

The easiest way to create a custom card is to use built-in components: Images, Tables, Artifacts, and Markdown text. You can construct a report with these components in Python without having to worry about HTML or styling in CSS. Rest assured that if components ever show their limits, you have an option to customize reports even further using Card Templates.

Let’s start with a simple example:

from metaflow import FlowSpec, step, card, Parameter, current
from metaflow.cards import Markdown

class GuessCardFlow(FlowSpec):

number = Parameter('number', default=3)

@card(type='blank')
@step
def start(self):
current.card.append(Markdown("# Guess my number"))
if self.number > 5:
current.card.append(Markdown("My number is **smaller** ⬇️"))
elif self.number < 5:
current.card.append(Markdown("My number is **larger** ⬆️"))
else:
current.card.append(Markdown("## Correct! 🎉"))
self.next(self.end)

@step
def end(self):
pass

if __name__ == "__main__":
GuessCardFlow()

Notice how in the @card decorator we specify type=’blank’. Instead of the Default Card, we want an empty card with no content by default. The blank card provides a nice empty canvas for custom components.

The current.card.append call adds a component in the card. Each component occupies a row in the card, so you don’t have to worry about the layout. If you run GuessCardFlow, you will see a card like below. The exact content depends on the value of the number parameter.

Currently, the following components are provided:

  • Markdown - output a block of text formatted as Markdown.
  • Table - a table of rows and columns. Each cell may include other components.
  • Image - an image, constructed from bytes.
  • Artifact - pretty-print any Python object.

The following example demonstrates all the components in action:

from metaflow import FlowSpec, step, current, card
from metaflow.cards import Markdown, Artifact, Image, Table
import requests

ROOT = 'https://upload.wikimedia.org/wikipedia/commons/'
IMAGES = {
'Mammals': {
'cat': 'b/b9/CyprusShorthair.jpg',
'bandicoot': '8/8b/Perameles_gunni.jpg',
'dog': '5/5d/Akbash_Dog_male_2016.jpg'
},
'Birds': {
'penguin': 'b/bf/Spheniscus_humboldti_20070116.jpg'
}
}

class ComponentDemoFlow(FlowSpec):

@card(type='blank')
@step
def start(self):
for section, animals in IMAGES.items():
current.card.append(Markdown('## %s' % section))
rows = []
for label, url in animals.items():
resp = requests.get(ROOT + url,
headers={'user-agent': 'metaflow-example'})
rows.append([Markdown('Animal: **%s**' % label),
Artifact(resp.headers),
Image(resp.content)])
current.card.append(Table(rows))
self.next(self.end)

@step
def end(self):
pass

if __name__ == '__main__':
ComponentDemoFlow()

The resulting card will look like this:

Notice how the Artifact component automatically truncates a large dictionary in the middle column, so you can use it to safely output even huge objects. It is also worth knowing that the Image component stores the image in the resulting HTML file itself, so you can view the card without an internet connection or even if the original image becomes unavailable.

Showing Plots

A data scientist may care more about showing data visualizations rather than photos of cats. Technically there isn’t a huge difference: You can use any existing visualization library in Python to produce plots, save the resulting image in a file or an in-memory object, and provide the contents of the file (bytes) to the Image component.

For convenience, the Image component provides a utility method, Image.from_matplotlib, that extracts bytes from a Matplotlib figure automatically. Here’s an example that uses the @conda decorator to make sure that Matplotlib is available. If you have Matplotlib and Numpy already installed in your environment, you can run the example without @conda_base.

from metaflow import FlowSpec, step, current, card, conda_base
from metaflow.cards import Image

@conda_base(python='3.8.1',
libraries={'numpy':'1.20.3', 'matplotlib':'3.4.2'})
class PlotDemoFlow(FlowSpec):

@card(type='blank')
@step
def start(self):
import matplotlib.pyplot as plt
import numpy
fig = plt.figure()
x = numpy.random.normal(0, 0.1, 100000)
y = numpy.random.normal(0, 0.1, 100000)
plt.scatter(x, y, s=0.1, color=(0.2, 0.2, 1.0, 0.2))
current.card.append(Image.from_matplotlib(fig))
self.next(self.end)

@step
def end(self):
pass

if __name__ == '__main__':
PlotDemoFlow()

The resulting card will look like this:

Note that you can click the image in the card to see a larger version of it.

Multiple Cards In a Step

You may want to produce multiple separate cards in a step. Maybe one card shows high-level business metrics that are suitable for wide distribution, while another shows technical details for debugging purposes.

When multiple cards are present, calling current.card.append is ambiguous: As such, it doesn’t know which of the many cards the component should be added to. Metaflow will show a warning if you try to do this, but it won’t crash the flow - nothing card-related should ever cause the flow to crash.

Use the id keyword argument in the @card decorator to uniquely identify each card. Then, you can refer to a specific card with the current.card[card_id].append notation. Here’s an example:

from metaflow import FlowSpec, step, current, card
from metaflow.cards import Markdown

class ManyCardsFlow(FlowSpec):

@card(type='blank', id='first')
@card(type='blank', id='second')
@step
def start(self):
current.card['first'].append(\
Markdown('# I am the first card'))
current.card['second'].append(\
Markdown('# I am the second card'))
self.next(self.end)

@step
def end(self):
pass

if __name__ == '__main__':
ManyCardsFlow()

When a task has multiple cards, the “card view” command will list all cards that are viewable for the task. You must specify which exact card you want to view:

  • If you have specified an id for the card, use the –id option to view a card corresponding to the given id. For instance, “card view –id first” to see the card corresponding to @card(id=’first’).
  • Each card has a unique hash value which is shown by “card view” and “card list”. You can execute e.g. “card view –hash 23b4e” to see a card corresponding to the given hash.

Comparing Data Across Runs

In many cases, you may want to produce a single card that characterizes the results of the whole flow. A natural way to do this is to assign a card to the end step that has access to all results produced by a run.

Besides accessing all results of a single run, you may want to access results across multiple runs and produce a card that compares the latest data to past results. Thanks to the fact that Metaflow persists and versions all results, this can be done easily: Just use the Client API to access past results.

The following example demonstrates how you can create a card that accesses all data produced by a flow at the end step, as well as compares results across historical runs.

from metaflow import FlowSpec, step, current, card, conda_base, Flow, Parameter
from metaflow.cards import Image, Table, Artifact
from itertools import islice

@conda_base(python='3.8.1',
libraries={'numpy':'1.20.3', 'matplotlib':'3.4.2'})
class CompareRunsFlow(FlowSpec):

alpha = Parameter('alpha', default=0.1)

@step
def start(self):
import numpy as np
self.x = np.linspace(-1, 2, 100)
self.y = self.alpha * np.exp(self.x)
self.next(self.end)

@card(type='blank')
@step
def end(self):
self.compare_runs()

def compare_runs(self):
import matplotlib.pyplot as plt
rows = []
fig = plt.figure()
for run in islice(Flow('CompareRunsFlow'), 3):
data = run['start'].task.data
rows.append(list(map(Artifact, (run.id,
run.created_at,
data.alpha))))
plt.plot(data.x, data.y, label=run.id)
plt.legend()
current.card.append(Table(rows,\
headers=['Run ID', 'Created', 'Alpha']))
current.card.append(Image.from_matplotlib(fig))

if __name__ == '__main__':
CompareRunsFlow()

To see the comparison in action, run the flow at least three times with varying values of the –alpha parameter. Note the following features of the flow:

  • The flow-level card is produced by a separate helper function, compare_runs. It is a good idea to separate code that produces a complex card in its own function or even in a separate module. The current.card.append call is available globally when a task is executing, so there is no need to restrict card creation in a @step function.
  • The “islice(Flow('CompareRunsFlow'), 3)” expression is used to access the latest three runs of the flow, including the currently executing one. Thanks to the namespacing functionality of Metaflow, the expression returns the latest three runs executed by you personally, i.e. in your user namespace, when you run the flow locally. In contrast, if deployed to a production environment, it returns the latest three production runs. This way, you can cleanly manage multiple versions of the project, some in development and some in production, and keep the results separate.
  • You can use any off-the-shelf libraries, like Matplotlib here, to compare, visualize, and analyze results. You can develop your own helper libraries or Card Templates which standardize the analyses and reporting that are relevant for your projects.

The resulting card will look something like below. It shows the latest three runs of the flow, the parameter supplied for each run, and a visualization that allows you to compare the runs.