compute_parallelization_schedule
- compute_parallelization_schedule(shape1, shape2, max_cores, max_ram, matching_method, split_axes=None, backend=None, split_only_outer=False, shape1_padding=None, analyzer_method=None, max_splits=256, float_nbytes=4, complex_nbytes=8, integer_nbytes=4)
Computes a parallelization schedule for a given computation.
This function estimates the amount of memory that would be used by a computation and breaks down the computation into smaller parts that can be executed in parallel without exceeding the specified limits on the number of cores and memory.
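The general idea can be sketched in a few lines. The following is a simplified illustration and not the function's actual implementation: the helper names `estimate_nbytes` and `minimal_splits` are hypothetical, only a single array split along its first axis is considered, and the real estimate presumably also depends on the second tensor, the padding, the matching method and the analyzer.

```python
from math import ceil, prod


def estimate_nbytes(shape, itemsize=4):
    """Bytes needed for one dense array of the given shape (hypothetical helper)."""
    return prod(shape) * itemsize


def minimal_splits(shape, max_ram, itemsize=4, max_splits=256):
    """Smallest number of chunks along axis 0 whose largest chunk fits in max_ram.

    Simplifying assumption: memory use is proportional to the chunk size.
    """
    for n_parts in range(1, max_splits + 1):
        # Size of the largest chunk when axis 0 is cut into n_parts pieces.
        chunk_shape = (ceil(shape[0] / n_parts),) + tuple(shape[1:])
        if estimate_nbytes(chunk_shape, itemsize) <= max_ram:
            return n_parts
    raise ValueError("No split count within max_splits fits into max_ram.")


# A 100 x 100 x 100 float32 array with a budget of 1,000,000 bytes.
n_parts = minimal_splits((100, 100, 100), max_ram=1_000_000)
```

In this toy model the memory budget alone determines the split count. The actual function additionally has to respect `max_cores` and the restriction to `split_axes`.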
- Parameters:
- shape1 : NDArray
The shape of the first input tensor.
- shape1_padding : NDArray, optional
Padding for shape1 applied to each split. None by default.
- shape2 : NDArray
The shape of the second input tensor.
- max_cores : int
The maximum number of cores that can be used.
- max_ram : int
The maximum amount of memory that can be used.
- matching_method : str
The metric used for scoring the computations.
- split_axes : tuple, optional
Axes that can be used for splitting. By default, all axes are considered.
- backend : str, optional
Backend used for computations.
- split_only_outer : bool, optional
Whether only outer splits should be considered.
- analyzer_method : str, optional
The method used for score analysis.
- max_splits : int, optional
The maximum number of parts that the computation can be split into, by default 256.
- float_nbytes : int, optional
Number of bytes of the float type used, e.g. 4 for float32. By default 4.
- complex_nbytes : int, optional
Number of bytes of the complex type used, e.g. 8 for complex64. By default 8.
- integer_nbytes : int, optional
Number of bytes of the integer type used, e.g. 4 for int32. By default 4.
- Returns:
- dict
The optimal splits for each axis of the first input tensor.
- int
The number of outer jobs.
- int
The number of inner jobs per outer job.
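As an illustration of how the returned splits might be consumed, the sketch below turns a splits dictionary into index slices. It assumes that the dictionary maps an axis index to the number of parts along that axis; this layout and the helper name `split_slices` are assumptions made for the example, not something confirmed by this page.

```python
from itertools import product
from math import ceil


def split_slices(shape, splits):
    """Return one tuple of slices per chunk (hypothetical helper).

    `splits` is assumed to map axis index -> number of parts; axes that are
    missing from the mapping are left unsplit.
    """
    per_axis = []
    for axis, length in enumerate(shape):
        n_parts = splits.get(axis, 1)
        step = ceil(length / n_parts)
        # Chunks have `step` elements, except possibly a shorter last chunk.
        # For some lengths this yields fewer than n_parts chunks.
        per_axis.append(
            [slice(start, min(start + step, length)) for start in range(0, length, step)]
        )
    # Every combination of per-axis slices addresses one chunk.
    return list(product(*per_axis))


chunks = split_slices((10, 6), {0: 3, 1: 2})
```

Each tuple in `chunks` can be used to index an array of the given shape directly, so the chunks can be handed to separate jobs.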
Notes
This function assumes that no residual memory remains after each split, which does not always hold true, e.g. when using tme.analyzer.MaxScoreOverRotations.
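To see why this assumption matters, consider a deliberately simple model in which splits run one after another and every finished split leaves a fixed number of bytes allocated. The model and the names `peak_memory` and `fits_budget` are hypothetical and only meant to illustrate the caveat; they do not describe how any particular analyzer behaves.

```python
def peak_memory(per_split_bytes, residual_bytes, n_sequential_splits):
    """Peak usage when each finished split leaves residual_bytes allocated."""
    # The last split runs while the residuals of all earlier splits are still held.
    return per_split_bytes + residual_bytes * (n_sequential_splits - 1)


def fits_budget(per_split_bytes, residual_bytes, n_sequential_splits, max_ram):
    """Check whether the peak usage of the toy model stays within max_ram."""
    return peak_memory(per_split_bytes, residual_bytes, n_sequential_splits) <= max_ram


# Same schedule, with and without residual memory (all values in bytes).
without_residual = fits_budget(1000, 0, 4, max_ram=1200)
with_residual = fits_budget(1000, 100, 4, max_ram=1200)
```

Under this model, a schedule that fits the budget when nothing is left behind can exceed it once residual memory accumulates, so leaving some headroom in `max_ram` may be advisable in such cases.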