Metaflow Sandbox

As described in Metaflow on AWS, Metaflow comes with built-in integrations to various services on AWS. The seamless integration with the cloud is a key benefit of Metaflow.

We know that setting up all the required components in AWS is not trivial. To make it easier to evaluate Metaflow on AWS, we provide a hosted sandbox environment at no cost where you can test Metaflow with your own code and data.

Only a limited number of sandboxes are available. When you sign up, you are added to a waitlist for a private sandbox. It may take anywhere from a few days to a few weeks to get access. We will notify you by email once your sandbox is ready. Please contact us if you have any questions about signing up or using the sandbox.

Choose Your Own Sandbox Adventure

Here are some ideas that you can try with the sandbox:

Sandbox Rules

The sandbox is a limited test environment:

  • It is solely intended for testing and evaluating Metaflow for data science use cases. It is not a production environment. It is also not a general-purpose computation platform.

  • While you can test your code with your own datasets, make sure you don’t use any data that contains confidential, personal, or otherwise sensitive information.

  • By default, your access to the sandbox will expire in 7 days, after which all data in the sandbox will be permanently deleted. You may contact us by email if you need more time for evaluation.

  • There is no internet connectivity in the sandbox. We have pre-installed the most common R libraries in the sandbox.

  • You can use up to 8 concurrent instances, each with cpu=8 (8 cores) and memory=30000 (30GB of RAM), using the batch decorator.

It is important that you read and agree to the Metaflow Sandbox terms of use and privacy policy before signing up.

Sign up for a Sandbox

You can sign up for a sandbox at metaflow.org/sandbox.

Here is a short screencast that walks you through the process (no audio). The screencast shows the sign-up process for the Metaflow Python package; the process is the same for R.

  1. After agreeing to the Terms of Use and Privacy Policy, you will need to sign up with your GitHub account. This is required so we can verify your identity to prevent abuse.

  2. You will be added to a waitlist. You can log in to metaflow.org/sandbox to see the status of your request. You can expect the status to remain at "Waiting for the next available sandbox" for many days.

  3. You will receive an email at the address specified in your GitHub profile after your sandbox is ready for use. Note that by default the sandbox will remain active only for seven days. You can contact us if you need more time for evaluation.

  4. Once the sandbox is active, you will see a long configuration token in the "Sandbox active" box. Clicking "Click to copy" will copy the text to the clipboard.

  5. In your terminal, run metaflow configure sandbox, which configures Metaflow to use your personal sandbox. Paste the copied string into the terminal when prompted and press Enter.

  6. Run metaflow status to confirm that the "metadata provider" is a long URL pointing at amazonaws.com. Metaflow is now integrated with AWS!

  7. In the screencast, a test artifact called models is added to demonstrate how artifacts are stored in S3.

  8. Run your Metaflow workflow locally as usual. All Metaflow runs will now be registered with the remote Metadata service by default. All artifacts are also written to S3 by default. You may notice that execution latency is slightly higher due to this.

  9. The Sandbox also includes a private SageMaker notebook instance. Log in to it by clicking "My SageMaker notebook" at metaflow.org/sandbox.

  10. The notebook includes the metaflow package by default. However, the notebook is not tied to a specific user, so you will need to set the namespace explicitly to match your username.

  11. Since your local run was registered with the Metadata service and artifacts were automatically copied to S3, you can access the locally generated artifact, models, in a remote notebook! This demonstrates how Metaflow enables multiple people to share results via S3 and the Client API.

After Sandbox Expires

All good things come to an end. After your sandbox expires, all computation is stopped automatically and all data is deleted permanently. Reset Metaflow back to local mode with metaflow configure reset.

Hopefully the sandbox convinced you that you want to keep using Metaflow with AWS. If so, a good next step is to set up Metaflow in your own AWS account, which you can use without any limitations.