compute_parallelization_schedule#

compute_parallelization_schedule(shape1, shape2, max_cores, max_ram, matching_method, split_axes=None, backend=None, split_only_outer=False, shape1_padding=None, analyzer_method=None, max_splits=256, float_nbytes=4, complex_nbytes=8, integer_nbytes=4)[source]#

Computes a parallelization schedule for a given computation.

This function estimates the amount of memory that would be used by a computation and breaks down the computation into smaller parts that can be executed in parallel without exceeding the specified limits on the number of cores and memory.

Parameters:
shape1 : NDArray

The shape of the first input tensor.

shape2 : NDArray

The shape of the second input tensor.

max_cores : int

The maximum number of cores that can be used.

max_ram : int

The maximum amount of memory that can be used.

matching_method : str

The metric used for scoring the computations.

split_axes : tuple, optional

Axes that can be used for splitting. By default, all axes are considered.

backend : str, optional

Backend used for computations.

split_only_outer : bool, optional

Whether only outer splits should be considered.

shape1_padding : NDArray, optional

Padding applied to shape1 for each split. None by default.

analyzer_method : str

The method used for score analysis.

max_splits : int, optional

The maximum number of parts that the computation can be split into, by default 256.

float_nbytes : int

Number of bytes per float element, e.g. 4 for float32.

complex_nbytes : int

Number of bytes per complex element, e.g. 8 for complex64.

integer_nbytes : int

Number of bytes per integer element, e.g. 4 for int32.
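The element sizes above correspond to NumPy dtype item sizes, so one way to derive them for whatever precision your pipeline uses is via `np.dtype(...).itemsize` (a small sketch, assuming NumPy is available):

```python
import numpy as np

# Item sizes in bytes for common dtypes; values like these would be
# passed as float_nbytes, complex_nbytes, and integer_nbytes.
float_nbytes = np.dtype("float32").itemsize    # 4
complex_nbytes = np.dtype("complex64").itemsize  # 8
integer_nbytes = np.dtype("int32").itemsize    # 4
```

Using `itemsize` rather than hard-coded constants keeps the byte counts consistent if the computation is later switched to, e.g., float64/complex128.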

Returns:
dict

The optimal splits for each axis of the first input tensor.

int

The number of outer jobs.

int

The number of inner jobs per outer job.

Notes

This function assumes that no residual memory remains after each split, which does not always hold true, e.g. when using tme.analyzer.MaxScoreOverRotations.
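The core idea, splitting the first tensor along its axes until the estimated per-split memory fits within max_ram and the job counts respect max_cores, can be sketched as follows. This is a simplified illustration under an assumed memory model (two float copies of the split plus the second tensor per job), not the library's actual heuristic:

```python
import math

def toy_schedule(shape1, shape2, max_cores, max_ram,
                 float_nbytes=4, max_splits=256):
    """Toy split search: double the number of splits along the largest
    axis of shape1 until the estimated per-split memory fits max_ram.
    Illustrative only; the real scheduler scores candidate splits."""

    def mem_per_split(nsplits):
        # Assumed model: split of shape1, full shape2, and one
        # output buffer the size of the split, all in float.
        elems1 = math.prod(shape1) / nsplits
        elems2 = math.prod(shape2)
        return (2 * elems1 + elems2) * float_nbytes

    axis = max(range(len(shape1)), key=lambda i: shape1[i])
    nsplits = 1
    while mem_per_split(nsplits) > max_ram and nsplits < max_splits:
        nsplits *= 2

    splits = {axis: nsplits}                     # splits per axis
    outer_jobs = min(nsplits, max_cores)         # parallel outer jobs
    inner_jobs = max(1, max_cores // outer_jobs) # cores left per job
    return splits, outer_jobs, inner_jobs
```

For instance, `toy_schedule((256, 256, 256), (32, 32, 32), max_cores=4, max_ram=10**8)` halves the first axis once, since a single job would need roughly 134 MB under this model, and distributes the remaining cores as inner jobs.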