Episode 6: Statistics Redux
Computing in the Cloud.
This example revisits Episode 02-statistics: Is this Data Science? With Metaflow, you don't need to make any code changes to scale up your flow by running on remote compute. In this example, we re-run the stats.py workflow, adding the --with batch command line argument. This instructs Metaflow to run all your steps in the cloud without changing any code. You can control the behavior with additional arguments, like --max-workers. For this example, --max-workers is used to limit the number of parallel genre-specific statistics computations. You can then access the data artifacts (even the local CSV file) from anywhere because the data is stored in the cloud-based datastore.
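To make the effect of a worker cap concrete, here is a rough local analogy in plain Python. It is not how Metaflow schedules steps; it only illustrates the idea of several per-genre computations where at most four run at the same time. The helper names run_capped and make_genre_task, the genre list, and the sleep that stands in for real work are all hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_capped(tasks, max_workers):
    """Run zero-argument callables with at most max_workers active at once.

    Returns (results, peak), where peak is the highest number of tasks
    observed running at the same time.
    """
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def tracked(task):
        # Count the task as running and record the highest count seen so far.
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        try:
            return task()
        finally:
            with lock:
                state["running"] -= 1

    # The pool size is the cap: extra tasks wait until a worker is free.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(tracked, tasks))
    return results, state["peak"]


def make_genre_task(genre):
    # Hypothetical stand-in for one genre-specific statistics computation.
    def task():
        time.sleep(0.05)
        return genre
    return task


genres = ["Action", "Comedy", "Drama", "Horror",
          "Romance", "Sci-Fi", "Thriller", "Western"]
results, peak = run_capped([make_genre_task(g) for g in genres], max_workers=4)
print(results)
print("peak concurrency:", peak)
```

With eight tasks and a cap of four, the peak never exceeds four, while the results still come back in the order the tasks were submitted.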
This tutorial uses pandas, which may not be available in your environment. Use the conda package manager with the conda-forge channel added to run this tutorial in any environment.
You can find the tutorial code on GitHub.
Showcasing:
- --with batch command line option
- --max-workers command line option
- Accessing data locally or remotely
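As a minimal sketch of accessing data locally or remotely, the helper below reads an artifact from the latest successful run through Metaflow's Client API (Flow, latest_successful_run, and run.data). The helper name latest_artifact and its flow_factory parameter are hypothetical, and the flow and artifact names in the usage comment ('MovieStatsFlow', 'genre_stats') are assumptions about stats.py that this text does not state, so check them against the tutorial code.

```python
def latest_artifact(flow_name, artifact_name, flow_factory=None):
    """Return a named artifact from the latest successful run of a flow.

    flow_factory is a hypothetical hook for substituting a stand-in object;
    by default the real metaflow.Flow class is used.
    """
    if flow_factory is None:
        # Imported lazily so the helper can be loaded without Metaflow installed.
        from metaflow import Flow
        flow_factory = Flow

    run = flow_factory(flow_name).latest_successful_run
    if run is None:
        raise LookupError("no successful run found for flow %r" % flow_name)

    # run.data exposes the artifacts stored by the run.
    return getattr(run.data, artifact_name)


# Usage, assuming the flow is named 'MovieStatsFlow' and stores 'genre_stats':
#
#     stats = latest_artifact("MovieStatsFlow", "genre_stats")
#     print(type(stats))
```

Because the artifacts live in the cloud-based datastore, the same call should work from a laptop or a remote notebook, provided both are configured against the same Metaflow deployment.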
Before playing this episode:
python -m pip install pandas
python -m pip install notebook
python -m pip install matplotlib
- This tutorial requires access to compute and storage resources in the cloud, which can be configured by
- This tutorial requires the conda package manager to be installed with the conda-forge channel added.
- Download Miniconda at https://docs.conda.io/en/latest/miniconda.html
conda config --add channels conda-forge
To play this episode:
cd metaflow-tutorials
python 02-statistics/stats.py --environment conda run --with batch --max-workers 4 --with conda:python=3.7,libraries="{pandas:0.24.2}"
jupyter-notebook 06-statistics-redux/stats.ipynb
- Open stats.ipynb in your remote SageMaker notebook
Note for Python 2.7 users: when opening stats.ipynb in a SageMaker notebook, you will need to change the Python kernel by clicking Kernel -> Change Kernel -> conda_python2 from the pull-down menu. This ensures the pandas DataFrame deserializes correctly.
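If you want to launch the run from a script, for example to vary the worker cap, the command from this episode can be assembled as an argument list. This is only a sketch: build_run_command and its parameters are hypothetical names, and the defaults simply mirror the values shown in the command above. Note that shlex.join requires Python 3.8 or later.

```python
import shlex


def build_run_command(max_workers=4, python_version="3.7", pandas_version="0.24.2"):
    """Assemble the episode's run command as an argv list (hypothetical helper)."""
    # The shell strips the double quotes around {pandas:...}, so the
    # argument itself contains none.
    conda_spec = "conda:python=%s,libraries={pandas:%s}" % (
        python_version,
        pandas_version,
    )
    return [
        "python", "02-statistics/stats.py",
        "--environment", "conda",          # appears before "run" in the episode's command
        "run",
        "--with", "batch",                 # run the steps on remote compute
        "--max-workers", str(max_workers), # cap on parallel tasks
        "--with", conda_spec,
    ]


argv = build_run_command()
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(argv))
```

To actually start the run, the list could be passed to subprocess.run(argv, check=True) from inside the metaflow-tutorials directory.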