Read below how Metaflow has improved over time.
We take backwards compatibility very seriously. In the vast majority of cases, you can upgrade Metaflow without expecting changes in your existing code. In the rare cases when breaking changes are absolutely necessary, usually due to bug fixes, you can review the minor breaking changes below before you upgrade.
The Metaflow 2.2.5 release is a minor patch release.
runtime: tag for all executions
Handle an inconsistently cased filesystem issue when creating @conda environments on macOS for linux-64
The runtime: tag is now available for all packaged executions and remote executions as well. This ensures that every run logged by Metaflow will have runtime system tags available.
Conda fails to correctly set up environments for linux-64 packages on macOS at times due to inconsistently cased filesystems. Environment creation is needed to collect the necessary metadata for correctly setting up the conda environment on AWS Batch. This fix simply ignores the error-checks that conda throws while setting up the environments on macOS when the intended destination is AWS Batch.
The Metaflow 2.2.4 release is a minor patch release.
Metaflow is now compliant with AWS GovCloud & AWS CN regions
Address a bug with overriding the default value for IncludeFile
Port AWS region check for AWS DynamoDB from curl to requests
AWS GovCloud & AWS CN users can now enjoy all the features of Metaflow within their region partition with no change on their end. PR: #364
Metaflow v2.1.0 introduced a bug in the IncludeFile functionality that prevented users from overriding the specified default value.
Metaflow's AWS Step Functions integration relies on AWS DynamoDB to manage foreach constructs. Metaflow was leveraging curl at runtime to detect the region for AWS DynamoDB. Some Docker images don't have curl installed by default; moving to requests (a Metaflow dependency) fixes the issue.
The Metaflow 2.2.3 release is a minor patch release.
The previously pinned library version does not work with Python 3.8. Now we have two different sets of version combinations which should work for Python 2.7, 3.5, 3.6, 3.7, and 3.8. PR: #308
Previously, the executable installed in the Conda environment was not visible inside Metaflow steps. This is fixed by appending the Conda bin path to the PATH environment variable. PR: #307
The Metaflow 2.2.2 release is a minor patch release.
Fix a regression introduced in 2.2.1 related to Conda environments
Clarify Pandas requirements for Tutorial Episode 04
Fix an issue with the metadata service
Metaflow 2.2.1 included a commit which was merged too early and broke the use of Conda. This release reverts that patch.
Recent versions of Pandas are not backward compatible with the one used in the tutorial; a small comment was added to warn of this fact.
In some cases, the metadata service would not properly create runs or tasks.
The Metaflow 2.2.1 release is a minor patch release.
Add the include parameter to merge_artifacts
Fix a regression introduced in 2.1 related to S3 datatools
Fix an issue where Conda execution would fail if the Conda environment was not writeable
Fix the behavior of uploading artifacts to the S3 datastore in case of retries
You can now specify the artifacts to be merged explicitly by the merge_artifacts method, as opposed to just specifying the ones that should not be merged.
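Conceptually, the new explicit-include behavior acts as a whitelist over the candidate artifacts, while the previous behavior was exclude-only. The toy sketch below models that selection logic in plain Python; it is illustrative only and is not Metaflow's actual implementation (the function name and dict-based model are hypothetical):

```python
def select_artifacts(artifacts, include=None, exclude=None):
    """Toy model of explicit-include merging.

    If `include` is given, it acts as a whitelist: only those artifact
    names are merged. Otherwise everything except `exclude` is merged.
    """
    if include:
        return {k: v for k, v in artifacts.items() if k in include}
    exclude = exclude or []
    return {k: v for k, v in artifacts.items() if k not in exclude}
```

For example, with artifacts `{"a": 1, "b": 2, "c": 3}`, passing `include=["a"]` keeps only `a`, whereas passing `exclude=["b"]` keeps `a` and `c`.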
Fixes the regression described in #285.
In some cases, Conda is installed system wide and the user cannot write to its installation directory. This was causing issues when trying to use the Conda environment. Fixes #179.
Retries were not properly handled when uploading artifacts to the S3 datastore. This fix addresses this issue.
The Metaflow 2.2.0 release is a minor release and introduces Metaflow's support for R lang.
Support for R lang.
This release provides an idiomatic API to access Metaflow in R lang. It piggybacks on the Pythonic implementation as the backend providing most of the functionality previously accessible to the Python community. With this release, R users can structure their code as a metaflow flow. Metaflow will snapshot the code, data, and dependencies automatically in a content-addressed datastore allowing for resuming of workflows, reproducing past results, and inspecting anything about the workflow e.g. in a notebook or RStudio IDE. Additionally, without any changes to their workflows, users can now execute code on AWS Batch and interact with Amazon S3 seamlessly.
The Metaflow 2.1.1 release is a minor patch release.
Handle race condition for the /step endpoint of the metadata service.
A foreach step in AWS Step Functions launches multiple AWS Batch tasks, each of which tries to register the step metadata if it doesn't already exist. This can result in a race condition and cause the task to fail. This patch properly handles the 409 response from the service.
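The fix amounts to treating HTTP 409 ("conflict": the metadata already exists because a sibling task registered it first) as success rather than failure. A minimal sketch of that idea, assuming a hypothetical `post` callable that returns an HTTP status code (this is not Metaflow's actual client code):

```python
def register_step(post, path, payload):
    """Register step metadata, treating HTTP 409 as success.

    A 409 means a concurrent task already registered the same step,
    which is fine: the metadata exists either way.
    `post` is a hypothetical callable returning an HTTP status code.
    """
    status = post(path, payload)
    if status in (200, 201, 409):
        return True
    raise RuntimeError("step registration failed with status %s" % status)
```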
The Metaflow 2.1.0 release is a minor release and introduces Metaflow's integration with AWS Step Functions.
Add capability to schedule Metaflow flows with AWS Step Functions.
Fix log indenting in Metaflow.
Throw exception properly if fetching code package from Amazon S3 on AWS Batch fails.
Remove millisecond information from timestamps returned by Metaflow client.
Handle CloudWatchLogs resource creation delay gracefully.
Netflix uses an internal DAG scheduler to orchestrate most machine learning and ETL pipelines in production. Metaflow users at Netflix can seamlessly deploy and schedule their flows to this scheduler. Now, with this release, we are introducing a similar integration with AWS Step Functions where Metaflow users can easily deploy & schedule their flows by simply executing
python myflow.py step-functions create
which will create an AWS Step Functions state machine for them. With this feature, Metaflow users can now enjoy all the features of Metaflow along with a highly available, scalable, maintenance-free production scheduler without any changes in their existing code.
With this integration, Metaflow users can inspect their flows deployed on AWS Step Functions as before and debug and reproduce results from AWS Step Functions on their local laptop or within a notebook.
Metaflow was inadvertently removing leading whitespace from user-visible logs on the console. Now Metaflow presents user-visible logs with the correct formatting.
Due to malformed permissions, AWS Batch might not be able to fetch the code package from Amazon S3 for user code execution. In such scenarios, it wasn't apparent to the user where the code package was being pulled from, making it difficult to triage any permission issues. Now, the Amazon S3 file location is part of the exception stack trace.
Metaflow uses time to store the finished_at information for the Run object returned by the Metaflow client. time unfortunately does not support the %f directive, making it difficult to parse these fields with time. Since Metaflow doesn't expose timings at millisecond grain, this release drops the millisecond information.
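With the milliseconds dropped, such timestamps can be handled by the standard time module alone, which has no %f directive. A small sketch, assuming a hypothetical ISO-like second-grain format (the exact format string is an assumption for illustration, not Metaflow's documented format):

```python
import time

# Hypothetical second-grain timestamp, in the style of a finished_at
# field with no fractional-second (%f) component.
ts = "2020-07-29T18:02:59Z"

# time.strptime can parse this directly; with a %f fraction present,
# the plain time module could not round-trip the value.
parsed = time.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
print(parsed.tm_hour, parsed.tm_min)  # -> 18 2
```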
When launching jobs on AWS Batch, the CloudWatchLogStream might not be immediately created (and may never be created if say we fail to pull the docker image for any reason whatsoever). Metaflow will now simply retry again next time.
The Metaflow 2.0.5 release is a minor patch release.
Fix logging of prefixes in datatools.S3._read_many_files.
Increase retry count for AWS Batch logs streaming.
Pin pylint version to < 2.5.0 due to compatibility issues.
Avoid a cryptic error message when datatools.S3._read_many_files is unsuccessful by converting prefixes from a generator to a list.
Modify the retry behavior for log fetching on AWS Batch by adding jitters to exponential backoffs as well as reset the retry counter for every successful request.
Additionally, fail the Metaflow task when we fail to stream the task logs back to the user's terminal even if AWS Batch task succeeds.
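Exponential backoff with jitter spreads retries out over time so that many concurrent tasks don't hammer the logs API in lockstep. A generic sketch of the "full jitter" variant (this is an illustration of the technique, not Metaflow's exact implementation; all names are hypothetical):

```python
import random

def backoff_delays(base=1.0, cap=60.0, attempts=5):
    """Exponential backoff with full jitter.

    The i-th delay is drawn uniformly from [0, min(cap, base * 2**i)],
    so retry storms from many concurrent clients are decorrelated.
    """
    return [random.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]
```

Resetting the retry counter after each successful request, as this release does, simply means starting this delay sequence over from attempt 0.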
pylint version 2.5.0 would mark Metaflow's self.next() syntax as an error. As a result, python helloworld.py run would fail at the pylint check step unless run with --no-pylint. This version upper bound is meant to automatically downgrade pylint during metaflow installation if pylint==2.5.0 has been installed.
The Metaflow 2.0.4 release is a minor patch release.
Set proper thresholds for retrying the DescribeJobs API for AWS Batch
Preempt AWS Batch job log collection when the job fails to get into a RUNNING state
You can now use the current singleton to access the retry_count of your task. The first attempt of the task will have retry_count as 0 and subsequent retries will increment the retry_count. As an example:
@retry
@step
def my_step(self):
    from metaflow import current
    print("retry_count: %s" % current.retry_count)
    self.next(self.a)
The AWS Logs API for get_log_events has a global hard limit of 10 requests per second. While we have retry logic in place to respect this limit, some ThrottleExceptions would end up in the job logs, causing confusion to the end user. This release addresses this issue (also documented in #184).
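One generic way to stay under such a per-second limit on the client side is to space successive calls at least 1/rate seconds apart. A minimal sketch of that idea (illustrative only; Metaflow's actual handling relies on retries with backoff rather than this exact class, and all names here are hypothetical):

```python
import time

class RateLimiter:
    """Naive client-side limiter: space calls at least 1/rate seconds apart.

    Sketch only; production code should still handle server-side
    throttling errors with retries, since other clients share the limit.
    """
    def __init__(self, rate):
        self.min_interval = 1.0 / rate
        self.last = 0.0

    def wait(self):
        # Sleep just long enough that calls are at least min_interval apart.
        delay = self.min_interval - (time.monotonic() - self.last)
        if delay > 0:
            time.sleep(delay)
        self.last = time.monotonic()
```

Calling `wait()` before each get_log_events request would keep a single client under a 10-requests-per-second ceiling with `RateLimiter(10)`.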
The AWS Batch API for DescribeJobs can throw ThrottleExceptions when managing a flow with a very wide foreach step. This release adds retry behavior with backoffs to add proper resiliency (addresses #138).
In certain user environments, to properly isolate conda environments, we have to explicitly override PYTHONNOUSERSITE rather than simply relying on python -s (addresses #178).
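The effect of the environment variable can be observed directly with the standard library: when PYTHONNOUSERSITE is set, a child interpreter disables its user site-packages directory, which is what keeps user-installed packages from leaking into an isolated conda environment. A small self-contained demonstration:

```python
import os
import subprocess
import sys

# Launch a child interpreter with PYTHONNOUSERSITE set. The site module
# then reports that the user site-packages directory is disabled.
env = dict(os.environ, PYTHONNOUSERSITE="1")
out = subprocess.run(
    [sys.executable, "-c", "import site; print(site.ENABLE_USER_SITE)"],
    env=env, capture_output=True, text=True,
)
print(out.stdout.strip())  # -> False
```

Exporting the variable in the environment (rather than passing -s on the command line) ensures the setting also applies to any Python subprocesses the step spawns.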
Fixes a bug where, if the AWS Batch job crashes before entering the RUNNING state (often due to incorrect IAM permissions), the previous log collection behavior would fail to print the correct error message, making it harder to debug the issue (addresses #185).
The Metaflow 2.0.3 release is a minor patch release.
Ability to specify S3 endpoint
Executing on AWS Batch
You can now use the current singleton (documented here) to access the names of the parameters passed into your flow. As an example:
for var in current.parameter_names:
    print("Parameter %s has value %s" % (var, getattr(self, var)))
This addresses #137.
A few issues were addressed to improve the usability of Metaflow. In particular, show now properly respects indentation, making the description of steps and flows more readable. This addresses #92. Superfluous print messages were also suppressed when executing on AWS Batch with the local metadata provider (#152).
A smaller, newer and standalone Conda installer is now used resulting in faster and more reliable Conda bootstrapping (#123).
We now check for the command line option --datastore-root prior to using the environment variable METAFLOW_DATASTORE_SYSROOT_S3 when determining the S3 root (#134). This release also fixes an issue where using the local metadata provider with AWS Batch resulted in an incorrect directory structure in the .metaflow directory (#141).
Add metaflow configure [import|export] for importing/exporting Metaflow configurations.
Fix a docker registry parsing bug in AWS Batch.
Fix various typos in Metaflow tutorials.
First Open Source Release.
Read the blog post announcing the release