Metaflow Sandbox
As described in Metaflow on AWS, Metaflow comes with built-in integrations to various AWS services. Seamless integration with the cloud is a key benefit of Metaflow.
We know that setting up all the required components in AWS is not trivial. To make it easier to evaluate Metaflow on AWS, we provide a free hosted sandbox environment where you can test Metaflow with your own code and data.
Only a limited number of sandboxes are available. When you sign up, you are added to a waitlist for a private sandbox. It may take anywhere from days to weeks to get access to the sandbox. We will notify you by email once your sandbox is ready. Please contact us if you have any questions about signing up or using the sandbox.
Choose Your Own Sandbox Adventure
Here are some ideas that you can try with the sandbox:
- Season 2 of the tutorials focuses on scaling out, which is a good way to get started. Note that the Season 1 tutorials work with the Sandbox too, when executed using the batch decorator.
- You have up to 64 CPU cores at your disposal using the batch decorator. Test some number crunching! You can run everything in the cloud simply by running your flow with --with batch on the command line, or you can mix local and remote steps by adding decorator("batch", ...) to select steps.
- Test your favorite ML libraries in the cloud using the batch decorator. For instance, try a basic hyperparameter search using a custom parameter grid and foreach, as sketched after this list.
- Evaluate Metaflow's experiment tracking and versioning using local runs and the Client API in a local notebook. In contrast to the local mode, all runs are registered globally in the Metaflow Service regardless of the directory where you run them.
- Test how you can resume tasks locally which were originally run remotely using the batch decorator.
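To make the batch and foreach ideas above concrete, here is a minimal sketch of a flow that fans out over a small parameter grid and runs each split on AWS Batch. The flow name, the grid values, and the train logic are illustrative assumptions, not anything provided by the Sandbox:

```r
library(metaflow)

# Define a hypothetical parameter grid to fan out over.
start <- function(self) {
  self$params <- c(0.01, 0.1, 1.0)
}

# Each foreach split sees its own grid value as self$input.
# The decorator below sends this step to AWS Batch.
train <- function(self) {
  self$score <- self$input * 2  # placeholder for real training
}

# Collect one score per split from the joined inputs.
join <- function(self, inputs) {
  self$scores <- unlist(lapply(inputs, function(input) input$score))
}

end <- function(self) {
  print(self$scores)
}

metaflow("GridSearchFlow") %>%
  step(step = "start",
       r_function = start,
       next_step = "train",
       foreach = "params") %>%
  step(step = "train",
       decorator("batch", cpu = 8, memory = 30000),
       r_function = train,
       next_step = "join") %>%
  step(step = "join",
       r_function = join,
       join = TRUE,
       next_step = "end") %>%
  step(step = "end",
       r_function = end) %>%
  run()
```

Saved as, say, gridsearch.R, this runs the train splits on AWS Batch and everything else locally; launching it with Rscript gridsearch.R run --with batch should push every step to the cloud instead.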
Sandbox Rules
The Sandbox is a limited test environment:
- It is solely intended for testing and evaluating Metaflow for data science use cases. It is not a production environment. It is also not a general-purpose computation platform.
- While you can test your code with your own datasets, make sure you don't use any data that contains confidential, personal, or otherwise sensitive information.
- By default, your access to the sandbox will expire in 7 days, after which all data in the sandbox will be permanently deleted. You may contact us by email if you need more time for evaluation.
- There is no internet connectivity in the Sandbox. We have pre-installed the most common R libraries.
- You can use up to 8 concurrent instances with cpu=8 (8 cores) and memory=30000 (30GB of RAM) using the batch decorator, as sketched below.
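As a reference point, a single step requesting that ceiling might look like the following minimal sketch (the flow name and the core-count check are illustrative, not part of the Sandbox):

```r
library(metaflow)

# Request the Sandbox ceiling for this step: 8 cores and 30GB of RAM.
start <- function(self) {
  print(parallel::detectCores())  # expect 8 cores on the Batch instance
}

end <- function(self) {
  print("done")
}

metaflow("LimitsFlow") %>%
  step(step = "start",
       decorator("batch", cpu = 8, memory = 30000),
       r_function = start,
       next_step = "end") %>%
  step(step = "end", r_function = end) %>%
  run()
```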
It is important that you read and agree to the Metaflow Sandbox terms of use and privacy policy before signing up.
Sign up for a Sandbox
You can sign up for a sandbox at metaflow.org/sandbox.
- After agreeing to the Terms of Use and Privacy Policy, you will need to sign up with your GitHub account. This is required so that we can verify your identity to prevent abuse.
- You will be added to a waitlist. You can log in to metaflow.org/sandbox to see the status of your request. You can expect the status to remain at "Waiting for the next available sandbox" for many days.
- You will receive an email at the address specified in your GitHub profile once your sandbox is ready for use. Note that by default the sandbox will remain active for only seven days. You can contact us if you need more time for evaluation.
- Once the sandbox is active, you will see a long configuration token in the "Sandbox active" box. Clicking "Click to copy" will copy the text to the clipboard.
- In your terminal, run metaflow configure sandbox, which configures Metaflow to use your personal sandbox. Paste the copied string into the terminal when prompted and press enter.
- Run metaflow status to confirm that the "metadata provider" is a long URL pointing at amazonaws.com. Metaflow is now integrated with AWS!
- In the screencast, a test artifact called models is added to demonstrate how artifacts are stored in S3.
- Run your Metaflow workflow locally as usual. All Metaflow runs will now be registered with the remote Metadata service by default. All artifacts are also written to S3 by default. You may notice that execution latency is slightly higher due to this.
- The Sandbox also includes a private Sagemaker notebook instance. Log in to it by clicking "My Sagemaker notebook" at metaflow.org/sandbox.
- The notebook includes the metaflow package by default. However, the notebook is not tied to a specific user, so you will need to set the namespace explicitly to match your username.
- Since your local run was registered with the Metadata service and artifacts were automatically copied to S3, you can access the locally generated artifact, models, in a remote notebook! This demonstrates how Metaflow enables multiple people to share results via S3 and the Client API; see the sketch below.
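To illustrate the last two steps, here is a hedged sketch of reading the locally produced models artifact from the Sagemaker notebook. The flow name MyFlow and the username are placeholders, and the client calls below (set_namespace, flow_client, run_client) are assumptions about the Metaflow R Client API whose exact accessors may differ; consult the Client documentation for the authoritative names:

```r
library(metaflow)

# The notebook is shared, so scope the client to your own runs first.
# "user:yourname" is a placeholder for the username shown in the Sandbox.
set_namespace("user:yourname")

# Look up the flow you ran locally; "MyFlow" is a placeholder name.
# The flow_client/run_client accessors are assumptions and may differ.
flow <- flow_client$new("MyFlow")
run <- run_client$new(flow, flow$latest_successful_run)

# Artifacts were copied to S3 automatically, so the locally generated
# 'models' artifact is readable here in the remote notebook.
print(run$artifact("models"))
```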
After Sandbox Expires
All good things come to an end. After your sandbox expires, all computation is stopped automatically and data is deleted permanently. Reset Metaflow back to local mode with metaflow configure reset.
Hopefully the sandbox convinced you that you want to keep using Metaflow with AWS. If so, a good next step is to set up Metaflow in your own AWS account, which you can use without any limitations.