The test harness in `test/core` generates and executes synthetic Metaflow flows, exercising all aspects of Metaflow. The test suite is executed using `tox`, as configured in `tox.ini`. You can run the tests by hand using `run_tests.py`, as described below.
What exactly happens when you execute `python helloworld.py run`? The execution involves multiple layers of the Metaflow stack. The stack looks like the following, starting from the most fundamental layer all the way up to the user interface:

1. The Python interpreter (e.g. Python 3)
2. The Metaflow core and its configuration, e.g. the datastore
3. Metaflow decorators, such as `@catch`
4. The structure of the user-defined graph
5. The user code inside the step functions
6. The client API used to inspect results
A bug may lurk in any one of these layers or, worse, it may be caused by an unintended interaction between multiple layers. For instance, the `@catch` decorator (3) inside a deeply nested foreach graph (4) might not be returned correctly in the client API (6) when using Python 3 (1).

The test harness in the `core` directory tries to surface bugs like this by generating test cases automatically, using specifications provided by the developer.
The specifications have four dimensions:

- Contexts, defined in `contexts.json` (layers 1 and 2).
- Tests, subclasses of the `MetaflowTest` templates, stored in the `tests` directory (layers 3 and 5).
- Graphs, stored in the `graphs` directory (layer 4).
- Checkers, subclasses of the `MetaflowCheck` classes, stored in the `metaflow_test` directory (layer 6). You can customize which checkers get used in which contexts in `contexts.json`.
The test harness takes all of the specified contexts, tests, graphs, and checkers and generates a test flow for every combination of them, unless you explicitly set constraints on what combinations are allowed. The test flows are then executed, optionally in parallel, and the results are collected and summarized.
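As a rough mental model, the matrix of test cases can be thought of as the cross product of the four dimensions, filtered by the declared constraints. The sketch below is purely illustrative and is not the harness's actual implementation; `is_allowed` stands in for whatever constraints the specifications declare.

```python
# Conceptual sketch only -- not the harness's real code.
from itertools import product

def generate_cases(contexts, tests, graphs, checkers, is_allowed):
    """Yield one (context, test, graph, checker) tuple per allowed combination."""
    for case in product(contexts, tests, graphs, checkers):
        # Skip combinations ruled out by the specifications, e.g. a test whose
        # required step functions cannot be matched against the graph.
        if is_allowed(*case):
            yield case
```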
Contexts are defined in `contexts.json`. The file should be pretty self-explanatory. Most likely you do not need to edit it unless you are adding tests for a new command-line argument.
Note that some contexts are marked with `disabled: true`. These contexts are not executed by default when the tests are run by a CI system. You can enable them on the command line for local testing, as shown below.
A good starting point for a new test is `tests/basic_artifact.py`. This test verifies that artifacts defined in the first step are available in all downstream steps. You can use this simple test as a template for new tests.
Tests are defined as subclasses of `MetaflowTest`. The class variable `PRIORITY` denotes how fundamental the exercised functionality is to Metaflow. Tests are executed in ascending order of priority, to make sure that the foundations are solid before proceeding to more sophisticated cases.

The step functions of a test are decorated with the `@steps` decorator. Note that, in contrast to normal Metaflow flows, these functions can be applied to multiple steps in a graph. A core idea behind this test harness is to decouple graphs from step functions, so that various combinations can be tested automatically. Hence, you need to provide step functions that can be applied to various step types.
The `@steps` decorator takes two arguments. The first argument is an integer that defines the order of precedence between multiple `steps` functions, in case more than one step function template matches a step. A typical pattern is to provide a specific function for a particular step type, such as joins, and give it a precedence of `0`, and then define a catch-all with `@steps(2, ['all'])`. As a result, the specific function is applied to joins and the catch-all function to all other steps.
The second argument is a list of qualifiers specifying which types of steps the function can be applied to. Built-in qualifiers such as `all`, `join`, and `linear` match the corresponding step types. In addition to these built-in qualifiers, graphs can specify arbitrary custom qualifiers.
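For illustration, a minimal test following these conventions might look like the sketch below. The import line mirrors how the existing tests import their helpers, but you should copy the exact imports and idioms from a real test such as `tests/basic_artifact.py`.

```python
# Illustrative sketch of a test class; copy exact conventions from an existing test.
from metaflow_test import MetaflowTest, steps  # assumed import path, as in existing tests

class JoinPrecedenceExample(MetaflowTest):
    # A low PRIORITY marks fundamental functionality; such tests run first.
    PRIORITY = 1

    @steps(0, ['start'])
    def step_start(self):
        # Applied to the start step of whatever graph this test is matched with.
        self.data = 'hello'

    @steps(0, ['join'])
    def step_join(self):
        # Precedence 0: preferred over the catch-all below for join steps.
        # A real test would typically merge artifacts from the joined branches here.
        pass

    @steps(2, ['all'])
    def step_all(self):
        # Precedence 2 catch-all: applied to every step not matched above.
        pass
```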
By setting `required=True` as a keyword argument to `@steps`, you can require that a certain step function must be used in combination with a graph to produce a valid test case. By creating a custom qualifier and setting `required=True`, you can control how tests get matched to graphs.

In general, it is beneficial to write tests with the least restrictive qualifiers and without `required=True`. This way you cast a wide net and catch bugs with many generated test cases. However, if a test is slow to execute and/or does not benefit from a large number of matching graphs, it is a good idea to make it more specific.
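As a sketch of this matching mechanism, a test can declare a required step function with a custom qualifier, so that the test is generated only for graphs that tag a step with that qualifier. The qualifier name below is made up for illustration.

```python
# Illustrative sketch: 'custom_qualifier' is a made-up name; a graph must tag
# one of its steps with this qualifier for the test case to be generated.
from metaflow_test import MetaflowTest, steps  # assumed import path

class CustomQualifierExample(MetaflowTest):
    PRIORITY = 3

    @steps(0, ['custom_qualifier'], required=True)
    def step_special(self):
        # Only graphs declaring a 'custom_qualifier' step match this test.
        self.marker = True

    @steps(2, ['all'])
    def step_all(self):
        pass
```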
Tests can assert the expected behavior in two ways. First, you can use `assert_equals(expected, got)` inside step functions to confirm that the data seen inside the steps is valid. Second, you can define a method `check_results(self, flow, checker)` in your test class, which verifies the stored results after the flow has been executed successfully.
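Putting the two mechanisms together, a test might look roughly like the sketch below. `assert_equals` is used inside step functions as described above; the `checker.assert_artifact` call in `check_results` is an assumption about the checker API, so verify the available methods against the `MetaflowCheck` base class.

```python
# Sketch combining in-step assertions with post-run verification.
from metaflow_test import MetaflowTest, steps, assert_equals  # assumed import path

class DataIntegrityExample(MetaflowTest):
    PRIORITY = 1

    @steps(0, ['start'])
    def step_start(self):
        self.data = 'abc'

    @steps(0, ['join'])
    def step_join(self):
        # Artifacts are not merged automatically at joins; re-establish the value
        # here (a real test would propagate it from the join's inputs).
        self.data = 'abc'

    @steps(2, ['all'])
    def step_all(self):
        # In-step assertion: fails the generated flow immediately if the data is wrong.
        assert_equals('abc', self.data)

    def check_results(self, flow, checker):
        # Post-run verification through the active checker (CLI or Python API).
        # assert_artifact is an assumed method name -- check MetaflowCheck.
        checker.assert_artifact('end', 'data', 'abc')
```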
Take a look at the existing tests in the `tests` directory to get an idea of how this works in practice.
Graphs define the structure of the generated test flows (layer 4). Take a look at the existing graphs in the `graphs` directory to get an idea of the syntax.
Checkers verify the results of a test flow after it has been executed. Currently, results can be verified through two interfaces: the command-line interface (CLI), whose checker is defined in `cli_check.py`, and the Python client API, whose checker is defined alongside it in the `metaflow_test` directory. If you want to add a new checker, take a look at the `MetaflowCheck` base class and the existing implementations such as `cli_check.py`. If certain functionality is only available in one of the interfaces, you can provide a stub implementation returning `True` in the other checker class.
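For instance, if a given check only makes sense for the CLI, the Python API checker can trivially report success for it. The class and method names below are placeholders, not the actual `MetaflowCheck` API.

```python
# Placeholder sketch: names are made up; a real checker subclasses MetaflowCheck
# and overrides the methods that the tests' check_results() actually call.
class PythonApiCheckStub:
    def assert_cli_only_behavior(self, step, expected):
        # This behavior can only be verified through the CLI checker, so the
        # Python API checker simply passes the check.
        return True
```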
The tests are run using `run_tests.py`. By default, it executes all valid combinations of contexts, tests, graphs, and checkers. This mode is suitable for automated tests run by a CI system.

For local testing, it is often convenient to run only the `dev_local` context, which does not depend on any over-the-network communication. The `--debug` flag makes the harness fail fast when the first test case fails; the default mode is to run all test cases and summarize all failures at the end.
The test harness uses the `coverage` package in Python to produce a test coverage report. By default, you can find a comprehensive coverage report in the `coverage` directory after the test harness has finished.