
@batch

The @batch decorator sends a step for execution on the AWS Batch compute layer. For more information, see Executing Tasks Remotely.

Note that while @batch doesn't allow mounting arbitrary disk volumes on the fly, you can create in-memory filesystems easily with tmpfs options. For more details, see using metaflow.S3 for in-memory processing.

@batch(...)


from metaflow import batch

Specifies that this step should execute on AWS Batch.

Parameters 

cpu: int, default 1

Number of CPUs required for this step. If @resources is also present, the maximum value from all decorators is used.

gpu: int, default 0

Number of GPUs required for this step. If @resources is also present, the maximum value from all decorators is used.

memory: int, default 4096

Memory size (in MB) required for this step. If @resources is also present, the maximum value from all decorators is used.
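The "maximum value wins" rule for cpu, gpu, and memory can be illustrated with a hypothetical helper. It is not part of Metaflow's API. It is a simplification that only compares values passed explicitly to `@resources` and ignores that decorator's own defaults.

```python
from typing import Dict, Optional


def effective_resources(
    batch_attrs: Dict[str, int], resources_attrs: Optional[Dict[str, int]] = None
) -> Dict[str, int]:
    """Hypothetical helper: merge @batch and @resources by taking the max."""
    # Start from the documented @batch defaults: cpu=1, gpu=0, memory=4096 (MB).
    merged = {"cpu": 1, "gpu": 0, "memory": 4096}
    merged.update(batch_attrs)
    # For each resource also requested via @resources, keep the larger value.
    for key, value in (resources_attrs or {}).items():
        if key in merged:
            merged[key] = max(merged[key], value)
    return merged


# Equivalent of @batch(memory=8192) combined with @resources(cpu=4, memory=2048).
print(effective_resources({"memory": 8192}, {"cpu": 4, "memory": 2048}))
# {'cpu': 4, 'gpu': 0, 'memory': 8192}
```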

image: str, optional, default None

Docker image to use when launching on AWS Batch. If not specified and METAFLOW_BATCH_CONTAINER_IMAGE is set, that image is used; otherwise, a default Docker image matching the current version of Python is used.
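The image precedence can be sketched with a hypothetical helper (not a Metaflow function). The `config` mapping stands in for wherever METAFLOW_BATCH_CONTAINER_IMAGE is defined. The `python:<major>.<minor>` naming of the fallback image is an assumption made for illustration.

```python
import sys
from typing import Mapping, Optional


def resolve_batch_image(image: Optional[str], config: Mapping[str, str]) -> str:
    """Hypothetical helper mirroring the documented precedence for `image`."""
    # 1. An image passed explicitly to @batch(image=...) wins.
    if image:
        return image
    # 2. Otherwise fall back to METAFLOW_BATCH_CONTAINER_IMAGE, if configured.
    configured = config.get("METAFLOW_BATCH_CONTAINER_IMAGE")
    if configured:
        return configured
    # 3. Otherwise use an image tied to the running Python version.
    #    ASSUMPTION: the "python:<major>.<minor>" tag is illustrative only.
    return "python:%d.%d" % sys.version_info[:2]
```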

queue: str, default METAFLOW_BATCH_JOB_QUEUE

AWS Batch Job Queue to submit the job to.

iam_role: str, default METAFLOW_ECS_S3_ACCESS_IAM_ROLE

AWS IAM role that the AWS Batch container uses to access AWS cloud resources.

execution_role: str, default METAFLOW_ECS_FARGATE_EXECUTION_ROLE

AWS IAM role that AWS Batch can use to trigger AWS Fargate tasks (see https://docs.aws.amazon.com/batch/latest/userguide/execution-IAM-role.html).

shared_memory: int, optional, default None

The value for the size (in MiB) of the /dev/shm volume for this step. This parameter maps to the --shm-size option in Docker.

max_swap: int, optional, default None

The total amount of swap memory (in MiB) a container can use for this step. This parameter is translated to the --memory-swap option in Docker, whose value is the sum of the container memory and the max_swap value.

swappiness: int, optional, default None

This allows you to tune memory swappiness behavior for this step. A swappiness value of 0 causes swapping not to happen unless absolutely necessary. A swappiness value of 100 causes pages to be swapped very aggressively. Accepted values are whole numbers between 0 and 100.
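The arithmetic behind max_swap and the accepted range for swappiness can be checked with two hypothetical helpers. Neither is part of Metaflow. They only restate the rules documented above.

```python
from typing import Optional


def docker_memory_swap(memory: int, max_swap: Optional[int]) -> Optional[int]:
    """Hypothetical helper: the --memory-swap total implied by max_swap."""
    # Without max_swap, no --memory-swap total is derived.
    if max_swap is None:
        return None
    # --memory-swap is the *total* of container memory plus swap.
    return memory + max_swap


def is_valid_swappiness(swappiness: int) -> bool:
    """Hypothetical check: swappiness must be a whole number in [0, 100]."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return (
        isinstance(swappiness, int)
        and not isinstance(swappiness, bool)
        and 0 <= swappiness <= 100
    )


# memory=4096 with max_swap=2048 gives a --memory-swap total of 6144.
print(docker_memory_swap(4096, 2048))  # 6144
```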

use_tmpfs: bool, default False

This enables an explicit tmpfs mount for this step. Note that tmpfs is not available on Fargate compute environments.

tmpfs_tempdir: bool, default True

If True, sets METAFLOW_TEMPDIR to tmpfs_path when a tmpfs mount is enabled for this step.

tmpfs_size: int, optional, default None

The value for the size (in MiB) of the tmpfs mount for this step. This parameter maps to the --tmpfs option in Docker. Defaults to 50% of the memory allocated for this step.

tmpfs_path: str, optional, default None

Path to tmpfs mount for this step. Defaults to /metaflow_temp.
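The way the four tmpfs parameters combine can be sketched with a hypothetical helper (not Metaflow's implementation). It applies the documented defaults: half of the step's memory for the size, /metaflow_temp for the path, and METAFLOW_TEMPDIR pointing at the mount when tmpfs_tempdir is True. As a simplifying assumption, this sketch enables the mount only when use_tmpfs=True.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TmpfsMount:
    size_mib: int
    path: str
    env: Dict[str, str] = field(default_factory=dict)


def tmpfs_mount(
    memory: int,
    use_tmpfs: bool = False,
    tmpfs_tempdir: bool = True,
    tmpfs_size: Optional[int] = None,
    tmpfs_path: Optional[str] = None,
) -> Optional[TmpfsMount]:
    """Hypothetical helper applying the documented tmpfs defaults."""
    # ASSUMPTION: in this sketch only use_tmpfs=True enables the mount.
    if not use_tmpfs:
        return None
    # Default size: 50% of the memory allocated for the step.
    size = tmpfs_size if tmpfs_size is not None else memory // 2
    # Default mount point: /metaflow_temp.
    path = tmpfs_path or "/metaflow_temp"
    # tmpfs_tempdir=True points METAFLOW_TEMPDIR at the mount.
    env = {"METAFLOW_TEMPDIR": path} if tmpfs_tempdir else {}
    return TmpfsMount(size_mib=size, path=path, env=env)


print(tmpfs_mount(memory=8192, use_tmpfs=True))
# TmpfsMount(size_mib=4096, path='/metaflow_temp', env={'METAFLOW_TEMPDIR': '/metaflow_temp'})
```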

inferentia: int, default 0

Number of Inferentia chips required for this step.

trainium: int, default None

Alias for inferentia. Use only one of the two.
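Because trainium is an alias for inferentia, a caller should pass only one of them. A hypothetical helper (not part of Metaflow) can make that rule explicit. Treating a non-zero value for both as an error is an assumption based on the "use only one" guidance above.

```python
from typing import Optional


def inferentia_chip_count(inferentia: int = 0, trainium: Optional[int] = None) -> int:
    """Hypothetical helper resolving the inferentia/trainium alias."""
    # trainium is an alias, so setting both is treated as a conflict here.
    if trainium is not None and inferentia:
        raise ValueError("Specify only one of inferentia or trainium")
    # Prefer the alias when given; otherwise use inferentia (default 0).
    return trainium if trainium is not None else inferentia
```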

efa: int, default 0

Number of Elastic Fabric Adapter (EFA) network devices to attach to the container.

ephemeral_storage: int, default None

The total amount of ephemeral storage, in GiB, to set for the task; the value must be between 21 and 200 GiB. This is only relevant for Fargate compute environments.
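The two Fargate-specific constraints on this page (tmpfs is unavailable on Fargate, and ephemeral_storage applies only to Fargate, within 21-200 GiB) can be checked up front with a hypothetical helper. It is not a Metaflow function, and the message wording is illustrative.

```python
from typing import List, Optional


def check_fargate_settings(
    on_fargate: bool, use_tmpfs: bool = False, ephemeral_storage: Optional[int] = None
) -> List[str]:
    """Hypothetical pre-flight check based on the documented constraints."""
    notes: List[str] = []
    # tmpfs mounts are not available on Fargate compute environments.
    if on_fargate and use_tmpfs:
        notes.append("use_tmpfs is not supported on Fargate")
    if ephemeral_storage is not None:
        if not on_fargate:
            # ephemeral_storage is only relevant for Fargate.
            notes.append("ephemeral_storage is ignored outside Fargate")
        elif not 21 <= ephemeral_storage <= 200:
            # Documented range for Fargate: 21-200 GiB.
            notes.append("ephemeral_storage must be between 21 and 200 GiB")
    return notes
```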

log_driver: str, optional, default None

The log driver to use for the Amazon ECS container.

log_options: List[str], optional, default None

List of strings containing options for the chosen log driver. The configurable values depend on the chosen log driver. Validation of these options is not supported yet. Example: ["awslogs-group:aws/batch/job"]
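Each log_options entry follows a "key:value" shape, as in the example above. A hypothetical helper (not part of Metaflow, which does not validate these options) shows how such a list maps onto key/value options. It splits on the first colon only, so values may themselves contain colons.

```python
from typing import Dict, List, Optional


def parse_log_options(log_options: Optional[List[str]]) -> Dict[str, str]:
    """Hypothetical helper: turn "key:value" strings into an options mapping."""
    options: Dict[str, str] = {}
    for item in log_options or []:
        # Split on the first ':' only; the remainder is the value.
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError("expected 'key:value', got %r" % item)
        options[key] = value
    return options


print(parse_log_options(["awslogs-group:aws/batch/job"]))
# {'awslogs-group': 'aws/batch/job'}
```

In a flow, these strings would be passed together with a driver, for example `@batch(log_driver="awslogs", log_options=["awslogs-group:aws/batch/job"])`.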