Installing Drivers and Frameworks
Paradoxically, the hardest part of using a hardware accelerator is often getting all the necessary software installed, such as CUDA drivers and platform-specific ML/AI frameworks.
Metaflow allows you to specify software dependencies as a part of the flow.
You can either use a Docker image with the necessary dependencies included, or layer them on top of
a generic image on the fly using the @conda or @pypi decorators. We cover both approaches below.
Using a GPU-ready Docker image
You can use the image argument of the @batch and @kubernetes decorators to choose a suitable
image on the fly, like the official PyTorch image we use below:
```python
from metaflow import FlowSpec, step, kubernetes

class GPUImageFlow(FlowSpec):

    @kubernetes(
        gpu=1,
        image='pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime'
    )
    @step
    def start(self):
        import torch  # pylint: disable=import-error
        if torch.cuda.is_available():
            print('Cuda found 🙌')
            for d in range(torch.cuda.device_count()):
                print(f"GPU device {d}:", torch.cuda.get_device_name(d))
        else:
            print('No CUDA 😭')
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    GPUImageFlow()
```
If you want to avoid specifying an image in the code, you can configure a default image in your Metaflow
configuration file through the
METAFLOW_KUBERNETES_CONTAINER_IMAGE and METAFLOW_BATCH_CONTAINER_IMAGE settings.
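For instance, a minimal sketch of such a configuration, assuming your config lives at the default location, ~/.metaflowconfig/config.json, and substituting image tags of your choice:

```json
{
    "METAFLOW_KUBERNETES_CONTAINER_IMAGE": "pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime",
    "METAFLOW_BATCH_CONTAINER_IMAGE": "pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime"
}
```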
Many GPU-ready images are available online, e.g. at:
- Nvidia's NVCR catalogs.
- PyTorch DockerHub Registry.
- TensorFlow DockerHub Registry.
- AWS' registry of Docker images for deep learning.
You can also build a Docker image of your own, using a GPU-ready image as a base image.
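As an example, a minimal sketch of such a Dockerfile, layering a few illustrative extra libraries on top of the PyTorch image used above:

```dockerfile
# Start from a GPU-ready base image that already bundles CUDA and PyTorch
FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime

# Layer your own dependencies on top (packages shown here are just examples)
RUN pip install --no-cache-dir pandas scikit-learn
```

Push the resulting image to a registry that your compute platform can access, and refer to it with the image argument as above.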
Installing libraries with @conda and @pypi
The @conda and @pypi decorators allow you to install
packages on the fly on top of a default image. This makes it easy to test different libraries
quickly without having to build custom images.
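As a sketch of the @pypi route (the flow name is a placeholder; on Linux, PyTorch's pip wheels typically bundle the CUDA runtime, so no extra channel setup is needed):

```python
from metaflow import FlowSpec, step, pypi, resources

class GPUPypiFlow(FlowSpec):

    @resources(gpu=1)
    @pypi(python='3.9', packages={'torch': '2.0.1'})
    @step
    def start(self):
        import torch  # pylint: disable=import-error
        # On a GPU instance with a matching driver, this should print True
        print('CUDA available:', torch.cuda.is_available())
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    GPUPypiFlow()
```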
For @conda, the CUDA drivers are hosted on NVIDIA's official Conda channel. Run this command once to add the channel to your environment:
```bash
conda config --add channels nvidia
```
After this, you can install PyTorch and other CUDA-enabled libraries with @conda and
@conda_base as usual. Try this:
```python
from metaflow import FlowSpec, step, resources, conda_base

@conda_base(
    libraries={
        "pytorch::pytorch": "2.0.1",
        "pytorch::pytorch-cuda": "11.8"
    },
    python="3.9"
)
class GPUCondaFlow(FlowSpec):

    @resources(gpu=1)
    @step
    def start(self):
        import torch  # pylint: disable=import-error
        if torch.cuda.is_available():
            print('Cuda found 🙌')
            for d in range(torch.cuda.device_count()):
                print(f"GPU device {d}:", torch.cuda.get_device_name(d))
        else:
            print('No CUDA 😭')
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    GPUCondaFlow()
```
Run the flow as

```bash
python gpuconda.py run --with batch
```

or --with kubernetes. When you run the flow for the first time, it will create an
execution environment and cache it, which will take a few minutes. Subsequent runs will
start faster.
If you run workflows from a machine with a different operating system
than where remote tasks run, for example, launching Metaflow runs that have remote
@kubernetes tasks from a Mac, the available dependencies and versions may not be
the same for each operating system. In this case, you can go to
the conda-forge website to find
which package versions are available on each platform.
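One way to check this from the command line, assuming a reasonably recent conda, is to compare what conda-forge publishes for each platform:

```bash
# Builds available for the remote Linux tasks
conda search -c conda-forge --subdir linux-64 pytorch

# Builds available for a local Apple Silicon Mac
conda search -c conda-forge --subdir osx-arm64 pytorch
```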