The example creates a huge matrix, `big_matrix`. The matrix requires about 80000^2 * 8 bytes ≈ 48GB of memory. Running it fails with a `MemoryError` on most laptops, as we are unable to allocate 48GB of memory.
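A quick sanity check of that arithmetic, assuming 8 bytes per element (64-bit floats):

```python
# Memory footprint of an 80000 x 80000 matrix of 8-byte (float64) elements.
n = 80_000
bytes_per_element = 8

total_bytes = n * n * bytes_per_element
gib = total_bytes / 2**30  # convert to binary gigabytes (GiB)

print(f"{total_bytes:,} bytes = {gib:.1f} GiB")
```

This prints roughly 47.7 GiB, i.e. the 48GB quoted above.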
The `@resources` decorator suggests resource requirements for a step. The `memory` argument specifies the amount of RAM in megabytes and `cpu` the number of CPU cores requested. The decorator does not produce the resources magically, which is why the run above failed.
The `--with batch` option instructs Metaflow to run all tasks as separate AWS Batch jobs instead of using a local process for each task. It has the same effect as adding the `@batch` decorator to all steps in the code.
The `@resources` decorator acts as a prescription for the size of the box that AWS Batch should run the job on; please make sure that this resource requirement can be met. See here for what can happen if it is not.
In addition to `memory`, you can specify `gpu=N` to request N GPUs for the instance.
A close relative of `@resources` is the `@batch` decorator. It takes exactly the same keyword arguments as `@resources`, but instead of being a mere suggestion, it forces the step to be run on AWS Batch.
An advantage of `@batch` is that you can selectively run some steps locally and some on AWS Batch. In the example above, try replacing `@resources` with `@batch` and run it as follows:
Now the `start` step gets executed on a large AWS Batch instance, but the `end` step, which does not need special resources, is executed locally without the additional latency of launching an AWS Batch job. Executing a `foreach` step launches parallel AWS Batch jobs with the resources specified for the step.
Similarly, when run `--with batch`, branches are mapped to separate AWS Batch jobs that are executed in parallel. All this makes sense for basic use cases, but what if you want to utilize multiple cores on a single AWS Batch instance?
Metaflow provides `parallel_map` as an answer. This function is almost equivalent to `Pool().map` in Python's built-in `multiprocessing` library. The main differences are the following:

- `parallel_map` supports lambdas and any other Python callables.
- `parallel_map` does not suffer from the bugs present in `multiprocessing`.
- `parallel_map` can handle larger amounts of data.

Also, `parallel_map` may be appropriate for simple operations that would be too cumbersome to implement as separate steps.
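The lambda limitation of `Pool().map` comes from serialization: `multiprocessing` must pickle the callable to ship it to worker processes, and lambdas cannot be pickled. A quick standard-library check, independent of Metaflow:

```python
import pickle

# multiprocessing's Pool.map must pickle the callable to send it to
# worker processes; pickle refuses lambdas, so Pool().map rejects them.
try:
    pickle.dumps(lambda x: x * 2)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False

print(lambda_picklable)  # False
```

`parallel_map` does not have this restriction, so ad-hoc lambdas work fine inside a step.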
The example below implements `sum()` by partitioning the matrix by row:
Note that we use `cpu=8` to request eight CPU cores from AWS Batch, so our `parallel_map` can benefit from optimal parallelism.
This particular way of computing the `sum` is not faster than the original simple implementation due to the overhead of launching separate processes in `parallel_map`. A less trivial operation might see a much larger performance boost.
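To illustrate the partition-by-row pattern outside Metaflow, here is a sketch using the standard library's `ThreadPool`, whose `.map` interface `parallel_map` resembles. The chunking scheme and names are illustrative, not Metaflow's own implementation:

```python
# Sketch: sum a matrix by partitioning its rows into chunks and
# summing the chunks in parallel, then combining the partial sums.
from multiprocessing.pool import ThreadPool

def chunked_sum(matrix, num_chunks=8):
    """Sum a list-of-rows matrix by partitioning it by row."""
    size = max(1, len(matrix) // num_chunks)
    # Split the rows into contiguous chunks.
    chunks = [matrix[i:i + size] for i in range(0, len(matrix), size)]
    with ThreadPool(num_chunks) as pool:
        # Each worker sums its own chunk; threads have no pickling limits,
        # so a lambda works here just as it would with parallel_map.
        partial_sums = pool.map(lambda rows: sum(sum(r) for r in rows), chunks)
    return sum(partial_sums)

matrix = [[1] * 100 for _ in range(100)]
print(chunked_sum(matrix))  # 10000
```

The combine step (`sum(partial_sums)`) is cheap, so nearly all the work parallelizes across the chunks.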
What value of `@timeout` should I set?
For more information about `@timeout`, please read this.
How much `@resources` can I request?
`memory`: 4000 (4GB)
When requesting `@resources`, keep in mind the configuration of your AWS Batch compute environment. Your job will be stuck in the `RUNNABLE` state if AWS is unable to provision the requested resources. Additionally, as a good measure, don't request more resources than your workflow actually needs. On the other hand, never optimize resources prematurely.
You can route jobs to a specific AWS Batch job queue with the `queue` argument. By default, all tasks execute on a vanilla Python Docker image corresponding to the version of the Python interpreter used to launch the flow; this can be overridden with a custom Docker image.
My job is stuck in the `RUNNABLE` state. What do I do?
You can inspect and terminate stuck jobs with the `batch list` and `batch kill` commands. These commands are disabled in the Metaflow AWS Sandbox.
Note that `kill` kills only the tasks related to the flow you called it on.
When run `--with batch`, this code would launch 1000 parallel AWS Batch jobs, which may turn out to be quite expensive.
Both the `run` and `resume` commands have a flag, `--max-num-splits`, which fails the task if it attempts to launch more than 100 splits by default. Use the flag to increase the limit if you actually need more tasks.
Another flag, `--max-workers`, limits the number of tasks run in parallel. Even if a foreach launched 100 splits, `--max-workers` would allow only 16 of them (by default) to run in parallel at any point in time. If you want more parallelism, increase the value of `--max-workers`.
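Putting the two flags together, a run that genuinely needs a wider foreach might look like this (the flow file name is a placeholder):

```bash
# Allow up to 200 foreach splits and run 32 of them concurrently.
# "myflow.py" stands in for your own flow file.
python myflow.py run --max-num-splits 200 --max-workers 32
```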