As described in Metaflow on AWS, Metaflow comes with built-in integrations with various AWS services. Seamless integration with the cloud is a key benefit of Metaflow.
We know that setting up all the required components in AWS is not trivial. To make it easier to evaluate Metaflow on AWS, we provide a hosted sandbox environment at no cost where you can test Metaflow with your own code and data.
Only a limited number of sandboxes are available. When you sign up, you are added to a waitlist for a private sandbox. It may take anywhere from days to weeks to get access to the sandbox. We will notify you by email once your sandbox is ready. Please contact us if you have any questions about signing up or using the sandbox.
Here are some ideas that you can try with the sandbox:
Season 2 of the tutorials focuses on scaling out and is a good way to get started. Note that the Season 1 tutorials work with the sandbox too, when executed
You have up to 64 CPU cores at your disposal using the @batch decorator. Test some number crunching! You can run everything in the cloud simply by using --with batch, or you can mix local and remote steps by adding @batch to select steps.
Test how you can resume tasks locally which were originally run remotely using
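As a rough illustration of the --with batch option mentioned above, the following sketch assembles the command line that runs every step of a flow on AWS Batch. The helper batch_run_command and the file name my_flow.py are hypothetical, and the batch:cpu=...,memory=... attribute form is an assumption about the CLI syntax that you should check against the Metaflow documentation.

```python
def batch_run_command(flow_file, cpu=None, memory=None):
    """Return the argument list for running a whole flow with --with batch.

    flow_file is a hypothetical path to your flow script. cpu and memory
    are optional per-task resource requests (memory in MB).
    """
    attrs = []
    if cpu is not None:
        attrs.append(f"cpu={cpu}")
    if memory is not None:
        attrs.append(f"memory={memory}")
    # Assumption: decorator attributes follow the decorator name after a
    # colon, separated by commas.
    spec = "batch" + (":" + ",".join(attrs) if attrs else "")
    return ["python", flow_file, "run", "--with", spec]


if __name__ == "__main__":
    # Print the plain form and a form with explicit resource requests.
    print(" ".join(batch_run_command("my_flow.py")))
    print(" ".join(batch_run_command("my_flow.py", cpu=8, memory=30000)))
```

The returned list can be passed to subprocess.run if you prefer to launch the flow from a script rather than typing the command by hand.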
Sandbox is a limited test environment:
It is solely intended for testing and evaluating Metaflow for data science use cases. It is not a production environment. It is also not a general-purpose computation platform.
While you can test your code with your own datasets, make sure you don’t use any data that contains confidential information, personal information, or any sensitive information.
By default, your access to the sandbox will expire in 7 days, after which all data in the sandbox will be permanently deleted. You may contact us by email if you need more time for evaluation.
You can use up to 8 concurrent instances with cpu=8 (8 cores) and memory=30000 (30GB of RAM) using the
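The limits above are consistent with the 64 CPU cores mentioned earlier: 8 concurrent instances with 8 cores each. A minimal sketch of that arithmetic, with a check of whether a per-task request stays within the quoted limits, might look like the following. The names TaskRequest and within_sandbox_limits are illustrative only and are not part of Metaflow.

```python
from dataclasses import dataclass

# Limits quoted in the text above. The constant names are illustrative.
MAX_CONCURRENT_INSTANCES = 8
MAX_CPU_PER_INSTANCE = 8
MAX_MEMORY_MB_PER_INSTANCE = 30000


@dataclass
class TaskRequest:
    """Hypothetical description of the resources one task asks for."""
    cpu: int
    memory: int  # in MB, matching memory=30000 in the text


def within_sandbox_limits(request):
    """Return True if a single task fits on one sandbox instance."""
    return (
        request.cpu <= MAX_CPU_PER_INSTANCE
        and request.memory <= MAX_MEMORY_MB_PER_INSTANCE
    )


# Total cores available when all instances run at once: 8 * 8 = 64.
total_cores = MAX_CONCURRENT_INSTANCES * MAX_CPU_PER_INSTANCE

if __name__ == "__main__":
    print(total_cores)
    print(within_sandbox_limits(TaskRequest(cpu=8, memory=30000)))
    print(within_sandbox_limits(TaskRequest(cpu=16, memory=30000)))
```

This only checks the per-instance figures. How the sandbox treats work beyond 8 concurrent instances is not stated here, so the sketch does not model it.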
You can sign up for a sandbox at metaflow.org/sandbox.
Here is a short screencast that walks you through the process (no audio):
You will be added to a waitlist. You can log in to metaflow.org/sandbox to see the status of your request. Expect the status to remain at "Waiting for the next available sandbox" for many days.
You will receive an email at the address specified in your GitHub profile once your sandbox is ready for use. Note that by default the sandbox will remain active only for three days. You can contact us if you need more time for evaluation.
Once the sandbox is active, you will see a long configuration token in the "Sandbox active" box. Clicking "Click to copy" will copy the text to the clipboard.
In your terminal, run metaflow configure sandbox, which configures Metaflow to use your personal sandbox. Paste the copied string into the terminal when prompted and press Enter.
Run metaflow status to confirm that "metadata provider" is a long URL pointing at amazonaws.com. Metaflow is now integrated with AWS!
In the screencast, a test artifact called self.models is added to demonstrate how artifacts are stored in S3.
Run your Metaflow workflow locally as usual. All Metaflow runs will now be registered with the remote Metadata service by default. All artifacts are also written to S3 by default. You may notice that execution latency is slightly higher as a result.
The notebook includes the metaflow package by default. However, the notebook is not tied to a specific user, so you will need to set the namespace explicitly to match your username.
Since your local run was registered with the Metadata service and artifacts were automatically copied to S3, you can access the locally generated artifact, models, in a remote notebook! This demonstrates how Metaflow enables multiple people to share results via S3 and the Client API.
All good things come to an end. After your sandbox expires, all computation is stopped automatically and data is deleted permanently. Reset your Metaflow back to the local mode with metaflow configure reset.
Hopefully the sandbox convinced you that you want to keep using Metaflow with AWS. If so, a good next step is to set up Metaflow in your own AWS account, which you can use without any limitations.