As described in Metaflow on AWS, Metaflow comes with built-in integrations with various AWS services. This seamless integration with the cloud is a key benefit of Metaflow.
We know that setting up all the required components in AWS is not trivial. To make it easier to evaluate Metaflow on AWS, we provide a hosted sandbox environment at no cost where you can test Metaflow with your own code and data.
Only a limited number of sandboxes are available. When you sign up, you are added to a waitlist for a private sandbox. It may take anywhere from a few days to a few weeks to get access. We will notify you by email once your sandbox is ready. Please contact us if you have any questions about signing up or using the sandbox.
These sandboxes support scaling up and out in the cloud through AWS Batch and production deployments via AWS Step Functions. If you are interested in a Kubernetes-native sandbox, please reach out to us directly.
Choose Your Own Sandbox Adventure
Here are some ideas that you can try with the sandbox:
- Season 2 of the tutorials focuses on scaling out, which is a good way to get started. Note that the Season 1 tutorials work in the sandbox too when executed with `--with batch`.
- You have up to 64 CPU cores at your disposal using the `@batch` decorator. Test some number crunching! You can run everything in the cloud simply by using `--with batch`, or you can mix local and remote steps by adding `@batch` to select steps.
- Test your code with your own data in the cloud using `IncludeFile`. You can also upload data to your private S3 bucket using Metaflow's high-performance S3 client.
- Test your favorite ML libraries in the cloud using the `@conda` decorator. For instance, try a basic hyperparameter search using a custom parameter grid and `foreach`.
- Schedule your flows on AWS Step Functions and execute them using a time-based trigger.
- Evaluate Metaflow's experiment tracking and versioning using local runs and the Client API in a local notebook. In contrast to the local mode, all runs are registered globally in the Metaflow Service regardless of the directory where you run them.
- Test how you can `resume` locally tasks that were originally run remotely.
The sandbox is a limited test environment:
- It is solely intended for testing and evaluating Metaflow for data science use cases. It is not a production environment. It is also not a general-purpose computation platform.
- While you can test your code with your own datasets, make sure you don't use any data that contains confidential, personal, or otherwise sensitive information.
- By default, your access to the sandbox will expire in 7 days, after which all data in the sandbox will be permanently deleted. You may contact us by email if you need more time for evaluation.
- There is no internet connectivity in the sandbox. However, you can still use 3rd-party libraries through Metaflow's `@conda` decorator, and you can include your own datasets using `IncludeFile`.
- You can use up to 8 concurrent instances with `cpu=8` (8 cores) and `memory=30000` (30GB of RAM) using the `@batch` decorator.
Sign up for a Sandbox
You can sign up for a sandbox at metaflow.org/sandbox.
- You will be added to a waitlist. You can log in at metaflow.org/sandbox to check the status of your request. You can expect the status to remain at "Waiting for the next available sandbox" for many days.
- Once your sandbox is ready for use, you will receive an email at the address specified in your GitHub profile. Note that by default the sandbox will remain active only for seven days. You can contact us if you need more time for evaluation.
- Once the sandbox is active, you will see a long configuration token in the "Sandbox active" box. Clicking "Click to copy" will copy the text to the clipboard.
- In your terminal, run `metaflow configure sandbox`, which configures Metaflow to use your personal sandbox. Paste the copied string into the terminal when prompted and press enter.
- Run `metaflow status` to confirm that the "metadata provider" is a long URL pointing at `amazonaws.com`. Metaflow is now integrated with AWS!
- In the screencast, a test artifact called `self.models` is added to demonstrate how artifacts are stored in S3.
- Run your Metaflow workflow locally as usual. All Metaflow runs will now be registered with the remote metadata service by default, and all artifacts are also written to S3 by default. You may notice that execution latency is slightly higher due to this.
- The sandbox also includes a private Amazon SageMaker notebook instance. Log in to it by clicking "My Sagemaker notebook" at metaflow.org/sandbox.
- The notebook includes the `metaflow` package by default. However, the notebook is not tied to a specific user, so you will need to set the namespace explicitly to match your username.
- Since your local run was registered with the metadata service and its artifacts were automatically copied to S3, you can access the locally generated artifact, `models`, in a remote notebook! This demonstrates how Metaflow enables multiple people to share results via S3 and the Client API.
After Sandbox Expires
All good things come to an end. After your sandbox expires, all computation is stopped automatically and all data is deleted permanently. Reset Metaflow back to local mode with `metaflow configure reset`.
Hopefully the sandbox convinced you to keep using Metaflow with AWS. If so, a good next step is to set up Metaflow in your own AWS account, which you can use without any limitations.