Installing Drivers and Frameworks
Paradoxically, the hardest part of using a hardware accelerator is often getting all the necessary software installed, such as CUDA drivers and platform-specific ML/AI frameworks.
Metaflow allows you to specify software dependencies as a part of the flow.
You can either use a Docker image with the necessary dependencies included, or layer them on top of a generic image on the fly using the @conda or @pypi decorators. We cover both approaches below.
Using a GPU-ready Docker image
You can use the image argument in the @batch and @kubernetes decorators to choose a suitable image on the fly, such as the official pytorch image we use below:
from metaflow import FlowSpec, step, kubernetes

class GPUImageFlow(FlowSpec):

    @kubernetes(
        gpu=1,
        image='pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime'
    )
    @step
    def start(self):
        import torch  # pylint: disable=import-error
        if torch.cuda.is_available():
            print('Cuda found 🙌')
            for d in range(torch.cuda.device_count()):
                print(f"GPU device {d}:", torch.cuda.get_device_name(d))
        else:
            print('No CUDA 😭')
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    GPUImageFlow()
If you want to avoid specifying an image in the code, you can configure a default image in your Metaflow configuration file through the METAFLOW_KUBERNETES_CONTAINER_IMAGE and METAFLOW_BATCH_CONTAINER_IMAGE settings.
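For example, a minimal sketch of such a configuration, assuming the default config file location ~/.metaflowconfig/config.json and reusing the PyTorch image from above:

{
    "METAFLOW_KUBERNETES_CONTAINER_IMAGE": "pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime",
    "METAFLOW_BATCH_CONTAINER_IMAGE": "pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime"
}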
Many GPU-ready images are available online, e.g. at:
- Nvidia's NVCR catalogs.
- PyTorch DockerHub Registry.
- TensorFlow DockerHub Registry.
- AWS' registry of Docker images for deep learning.
You can also build a Docker image of your own, using a GPU-ready image as a base image.
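For instance, a minimal Dockerfile sketch that starts from the PyTorch image used above and layers hypothetical extra packages on top (replace them with whatever your flow needs):

# Start from a GPU-ready base image.
FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
# Hypothetical extra dependencies; install what your flow actually needs.
RUN pip install --no-cache-dir pandas scikit-learn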
Installing libraries with @conda and @pypi
The @conda and @pypi decorators allow you to install packages on the fly on top of a default image. This makes it easy to test different libraries quickly without having to build custom images.
CUDA drivers are hosted on NVIDIA's official Conda channel. Run this command once to include the channel in your environment:
conda config --add channels nvidia
After this, you can install PyTorch and other CUDA-enabled libraries with @conda and @conda_base as usual. Try this:
from metaflow import FlowSpec, step, resources, conda_base

@conda_base(
    libraries={
        "pytorch::pytorch": "2.0.1",
        "pytorch::pytorch-cuda": "11.8"
    },
    python="3.9"
)
class GPUCondaFlow(FlowSpec):

    @resources(gpu=1)
    @step
    def start(self):
        import torch  # pylint: disable=import-error
        if torch.cuda.is_available():
            print('Cuda found 🙌')
            for d in range(torch.cuda.device_count()):
                print(f"GPU device {d}:", torch.cuda.get_device_name(d))
        else:
            print('No CUDA 😭')
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    GPUCondaFlow()
Run the flow as python gpuconda.py run --with batch or --with kubernetes. When you run the flow for the first time, it will create an execution environment and cache it, which will take a few minutes. Subsequent runs will start faster.
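The @pypi and @pypi_base decorators work along the same lines. Below is a minimal sketch, assuming a hypothetical flow name and that the torch wheel on PyPI bundles the CUDA runtime it needs (the NVIDIA driver itself still comes from the node):

from metaflow import FlowSpec, step, resources, pypi_base

# GPUPypiFlow is a hypothetical name; the torch version matches the examples above.
@pypi_base(packages={"torch": "2.0.1"}, python="3.9")
class GPUPypiFlow(FlowSpec):

    @resources(gpu=1)
    @step
    def start(self):
        import torch  # pylint: disable=import-error
        print("CUDA available:", torch.cuda.is_available())
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    GPUPypiFlow()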
If you run workflows from a machine with a different operating system than where remote tasks run, for example launching Metaflow runs that have remote @kubernetes tasks from a Mac, the available dependencies and versions may not be the same for each operating system. In this case, you can go to the conda-forge website and find which package versions are available for each platform.
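Alternatively, you can query availability from the command line with conda itself. A sketch, using the same PyTorch package as an example:

conda search -c pytorch 'pytorch==2.0.1' --platform linux-64

Running the same command with --platform osx-arm64 (or whatever your local platform is) shows whether a matching version exists for your workstation as well.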