dask 2021.10.0

fuse(dsk, keys=None, dependencies=None, ave_width=<default>, max_width=<default>, max_height=<default>, max_depth_new_edges=<default>, rename_keys=<default>, fuse_subgraphs=<default>)

This trades parallelism opportunities for faster scheduling by making tasks less granular. It can replace fuse_linear in optimization passes.

This optimization applies to all reductions (tasks that have at most one dependent), so it may be viewed as fusing "multiple input, single output" groups of tasks into a single task. Many parameters are available to fine-tune the behavior; they are described below. ave_width is the natural parameter for trading parallelism against granularity, so it should always be specified. Reasonable values for the other parameters are derived from ave_width when they are not given.

Parameters

dsk: dict :

dask graph

keys: list or set, optional :

Keys that must remain in the returned dask graph

dependencies: dict, optional :

{key: [list-of-keys]}. Each value must be a list so that it provides a count of each key. This optional input often comes from cull.
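To illustrate the {key: [list-of-keys]} shape, here is a minimal pure-Python sketch. The helper list_dependencies is hypothetical and is not part of dask; it only assumes the usual graph convention that a task is a tuple whose first element is callable. It shows why a list is needed: a key referenced twice by one task appears twice.

```python
from operator import add


def list_dependencies(dsk):
    """Hypothetical helper (not a dask function): map each key to the list
    of graph keys its task references, keeping duplicates."""

    def walk(task, found):
        if isinstance(task, tuple) and task and callable(task[0]):
            # A task tuple: inspect every argument after the callable.
            for arg in task[1:]:
                walk(arg, found)
        elif isinstance(task, list):
            # Lists of arguments are searched element by element.
            for item in task:
                walk(item, found)
        else:
            try:
                if task in dsk:
                    found.append(task)
            except TypeError:
                # Unhashable literals (e.g. dicts) cannot be keys.
                pass
        return found

    return {key: walk(task, []) for key, task in dsk.items()}


toy_graph = {"a": 1, "b": (add, "a", "a"), "c": (add, "a", "b")}
toy_dependencies = list_dependencies(toy_graph)
# "b" lists "a" twice, which a set could not express.
print(toy_dependencies)
```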

ave_width: float (default 1) :

Upper limit for width = num_nodes / height, a good measure of parallelizability. dask.config key: optimization.fuse.ave-width
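The width measure can be worked through by hand. The sketch below is only the arithmetic from the description above applied to two toy shapes; the function name average_width is hypothetical, and the real fusion decision may involve further conditions not shown here.

```python
def average_width(num_nodes, height):
    """width = num_nodes / height, as stated in the ave_width description."""
    return num_nodes / height


# A linear chain of 3 tasks: 3 nodes stacked over 3 levels.
chain_width = average_width(num_nodes=3, height=3)

# A binary reduction: 4 leaves, 2 partial results, 1 final result
# gives 7 nodes over 3 levels.
tree_width = average_width(num_nodes=7, height=3)

for name, width in [("chain", chain_width), ("tree", tree_width)]:
    # Compare against the documented default ave_width of 1.
    print(name, round(width, 2), "within ave_width=1:", width <= 1)
```

Under this reading, the chain stays within the default limit while the binary reduction would need a larger ave_width to be considered.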

max_width: int (default infinite) :

Don't fuse if total width is greater than this. dask.config key: optimization.fuse.max-width

max_height: int or None (default None) :

Don't fuse more than this many levels. Set to None to dynamically adjust to 1.5 + ave_width * log(ave_width + 1). dask.config key: optimization.fuse.max-height

max_depth_new_edges: int or None (default None) :

Don't fuse if new dependencies are added after this many levels. Set to None to dynamically adjust to ave_width * 1.5. dask.config key: optimization.fuse.max-depth-new-edges
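The two dynamic defaults above are simple functions of ave_width. The following sketch evaluates them for a few values; the function names are hypothetical, and treating log as the natural logarithm is an assumption, since the text does not state the base.

```python
import math


def default_max_height(ave_width):
    # Formula from the max_height description.
    # Assumption: "log" means the natural logarithm.
    return 1.5 + ave_width * math.log(ave_width + 1)


def default_max_depth_new_edges(ave_width):
    # Formula from the max_depth_new_edges description.
    return ave_width * 1.5


for ave_width in (1, 2, 4):
    print(
        ave_width,
        round(default_max_height(ave_width), 3),
        default_max_depth_new_edges(ave_width),
    )
```

Both limits grow with ave_width, so raising ave_width alone loosens the other constraints as well.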

rename_keys: bool or func, optional (default True) :

Whether to rename the fused keys with default_fused_keys_renamer or not. Renaming fused keys can keep the graph more understandable and comprehensible, but it comes at the cost of additional processing. If False, then the top-most key will be used. For advanced usage, a function to create the new name is also accepted. dask.config key: optimization.fuse.rename-keys

fuse_subgraphs: bool or None, optional (default None) :

Whether to fuse multiple tasks into SubgraphCallable objects. Set to None to let the default optimizer of individual dask collections decide. If no collection-specific default exists, None defaults to False. dask.config key: optimization.fuse.subgraphs

Returns

dsk

output graph with keys fused

dependencies

dict mapping dependencies after fusion. Useful side effect to accelerate other downstream optimizations.
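A short usage sketch of the call and its two return values follows. The toy graph and the local inc function are made up for illustration. The exact contents of the fused graph depend on the dask version and configuration, so the sketch only checks that the fused graph computes the same value as the original.

```python
from operator import add

import dask
from dask.optimization import fuse


def inc(x):
    # Illustrative task function, defined locally for this example.
    return x + 1


# Toy graph: a linear chain w <- x <- y <- z on top of z = a + b.
dsk = {
    "w": (inc, "x"),
    "x": (inc, "y"),
    "y": (inc, "z"),
    "z": (add, "a", "b"),
    "a": 1,
    "b": 2,
}

# keys=["w"] asks that "w" remain in the output graph.
# rename_keys=False keeps the top-most key name for any fused chain.
fused, dependencies = fuse(dsk, keys=["w"], ave_width=1, rename_keys=False)

# Evaluate both graphs with the synchronous scheduler and compare.
result_before = dask.get(dsk, "w")
result_after = dask.get(fused, "w")
print(result_before, result_after)
print(sorted(fused))
```

The second return value can be passed on as the dependencies argument of a later optimization step, which avoids recomputing it.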

Fuse tasks that form reductions; more advanced than fuse_linear

Examples

See :

Back References

The following pages refer to this document either explicitly or contain code examples using this.

dask.blockwise.fuse_roots dask.multiprocessing.get

File: /dask/optimization.py#429
type: <class 'function'>
Commit: