compute_parallelization_schedule#

compute_parallelization_schedule(shape1, shape2, max_cores, max_ram, matching_method, split_axes=None, backend=None, split_only_outer=False, shape1_padding=None, analyzer_method=None, max_splits=256, float_nbytes=4, complex_nbytes=8, integer_nbytes=4)[source]#

Computes a parallelization schedule for a given computation.

This function estimates the amount of memory that would be used by a computation and breaks down the computation into smaller parts that can be executed in parallel without exceeding the specified limits on the number of cores and memory.

Parameters:
shape1 : NDArray

The shape of the first input tensor.

shape2 : NDArray

The shape of the second input tensor.

max_cores : int

The maximum number of cores that can be used.

max_ram : int

The maximum amount of memory that can be used.

matching_method : str

The metric used for scoring the computations.

split_axes : tuple, optional

Axes that can be used for splitting. By default, all axes are considered.

backend : str, optional

Backend used for computations.

split_only_outer : bool, optional

Whether only outer splits should be considered.

shape1_padding : NDArray, optional

Padding applied to shape1 for each split. None by default.

analyzer_method : str

The method used for score analysis.

max_splits : int, optional

The maximum number of parts that the computation can be split into, by default 256.

float_nbytes : int

Number of bytes per float element, e.g. 4 for float32.

complex_nbytes : int

Number of bytes per complex element, e.g. 8 for complex64.

integer_nbytes : int

Number of bytes per integer element, e.g. 4 for int32.
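The element sizes above correspond to NumPy dtype item sizes, so one way to derive them for whatever precision your pipeline uses is via `np.dtype(...).itemsize` (a small sketch, assuming NumPy is available):

```python
import numpy as np

# Item sizes in bytes for common dtypes; values like these would be
# passed as float_nbytes, complex_nbytes, and integer_nbytes.
float_nbytes = np.dtype("float32").itemsize    # 4
complex_nbytes = np.dtype("complex64").itemsize  # 8
integer_nbytes = np.dtype("int32").itemsize    # 4
```

Using `itemsize` rather than hard-coded constants keeps the byte counts consistent if the computation is later switched to, e.g., float64/complex128.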

Returns:
dict

The optimal splits for each axis of the first input tensor.

int

The number of outer jobs.

int

The number of inner jobs per outer job.

Notes

This function assumes that no residual memory remains after each split, which does not always hold true, e.g. when using tme.analyzer.MaxScoreOverRotations.
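The core idea, splitting the first tensor along its axes until the estimated per-split memory fits within max_ram and the job counts respect max_cores, can be sketched as follows. This is a simplified illustration under an assumed memory model (two float copies of the split plus the second tensor per job), not the library's actual heuristic:

```python
import math

def toy_schedule(shape1, shape2, max_cores, max_ram,
                 float_nbytes=4, max_splits=256):
    """Toy split search: double the number of splits along the largest
    axis of shape1 until the estimated per-split memory fits max_ram.
    Illustrative only; the real scheduler scores candidate splits."""

    def mem_per_split(nsplits):
        # Assumed model: split of shape1, full shape2, and one
        # output buffer the size of the split, all in float.
        elems1 = math.prod(shape1) / nsplits
        elems2 = math.prod(shape2)
        return (2 * elems1 + elems2) * float_nbytes

    axis = max(range(len(shape1)), key=lambda i: shape1[i])
    nsplits = 1
    while mem_per_split(nsplits) > max_ram and nsplits < max_splits:
        nsplits *= 2

    splits = {axis: nsplits}                     # splits per axis
    outer_jobs = min(nsplits, max_cores)         # parallel outer jobs
    inner_jobs = max(1, max_cores // outer_jobs) # cores left per job
    return splits, outer_jobs, inner_jobs
```

For instance, `toy_schedule((256, 256, 256), (32, 32, 32), max_cores=4, max_ram=10**8)` halves the first axis once, since a single job would need roughly 134 MB under this model, and distributes the remaining cores as inner jobs.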