fuse(dsk, keys=None, dependencies=None, ave_width=<default>, max_width=<default>, max_height=<default>, max_depth_new_edges=<default>, rename_keys=<default>, fuse_subgraphs=<default>)
This trades parallelism opportunities for faster scheduling by making tasks less granular. It can replace fuse_linear in optimization passes.
This optimization applies to all reductions (tasks that have at most one dependent), so it may be viewed as fusing "multiple input, single output" groups of tasks into a single task. Many parameters are available to fine-tune the behavior; they are described below. ave_width is the natural parameter for weighing parallelism against granularity, so it should always be specified. Reasonable values for the other parameters are derived from ave_width if necessary.
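As a rough illustration of the description above, the following sketch fuses a small linear chain. The graph, the helper inc, and the key names are made up for this example; fuse is assumed to be imported from dask.optimization, and dask.get is used only to check that the fused graph still computes the same value.

```python
from operator import add

import dask
from dask.optimization import fuse


def inc(x):
    """Hypothetical helper task: add one to its input."""
    return x + 1


# Illustrative graph: w <- x <- y <- z <- (a, b)
dsk = {
    "w": (inc, "x"),
    "x": (inc, "y"),
    "y": (inc, "z"),
    "z": (add, "a", "b"),
    "a": 1,
    "b": 2,
}

# ave_width is passed explicitly, as the text recommends.
# rename_keys=False keeps the top-most key ("w") as the fused task's name.
fused, fused_deps = fuse(dsk, keys=["w"], ave_width=1, rename_keys=False)

# The fused graph should hold fewer, coarser tasks and give the same result.
original_result = dask.get(dsk, "w")
fused_result = dask.get(fused, "w")
print(len(dsk), len(fused), original_result, fused_result)
```

The exact shape of the fused graph depends on the parameters and on the dask version, so the sketch compares results rather than relying on a particular layout.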
Parameters:
dsk: dask graph
keys: Keys that must remain in the returned dask graph.
dependencies: {key: [list-of-keys]}. Each value must be a list so that the count of each key is preserved. This optional input often comes from cull.
ave_width: Upper limit for width = num_nodes / height, a good measure of parallelizability. dask.config key: optimization.fuse.ave-width
max_width: Don't fuse if total width is greater than this. dask.config key: optimization.fuse.max-width
max_height: Don't fuse more than this many levels. Set to None to dynamically adjust to 1.5 + ave_width * log(ave_width + 1). dask.config key: optimization.fuse.max-height
max_depth_new_edges: Don't fuse if new dependencies are added after this many levels. Set to None to dynamically adjust to ave_width * 1.5. dask.config key: optimization.fuse.max-depth-new-edges
rename_keys: Whether to rename the fused keys with default_fused_keys_renamer. Renaming fused keys can keep the graph easier to understand, but it comes at the cost of additional processing. If False, the top-most key is used. For advanced usage, a function that creates the new name is also accepted. dask.config key: optimization.fuse.rename-keys
fuse_subgraphs: Whether to fuse multiple tasks into SubgraphCallable objects. Set to None to let the default optimizer of individual dask collections decide. If no collection-specific default exists, None defaults to False. dask.config key: optimization.fuse.subgraphs
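The two dynamic defaults quoted for max_height and max_depth_new_edges can be evaluated by hand to get a feel for how they grow with ave_width. This is a minimal sketch using only the formulas stated above; the function names are hypothetical, and log is assumed to mean the natural logarithm.

```python
import math


def dynamic_max_height(ave_width):
    """1.5 + ave_width * log(ave_width + 1); natural log is an assumption."""
    return 1.5 + ave_width * math.log(ave_width + 1)


def dynamic_max_depth_new_edges(ave_width):
    """ave_width * 1.5, as quoted in the parameter description."""
    return ave_width * 1.5


# Tabulate both limits for a few illustrative ave_width values.
for ave_width in (1, 2, 4, 8):
    print(
        ave_width,
        round(dynamic_max_height(ave_width), 2),
        dynamic_max_depth_new_edges(ave_width),
    )
```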
Returns:
dsk: output graph with keys fused
dependencies: dict mapping dependencies after fusion. Useful side effect to accelerate other downstream optimizations.
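The dependencies input and output can be chained between optimization passes. The sketch below assumes cull from dask.optimization returns a (graph, dependencies) pair whose dependency values are lists, which is what the dependencies parameter asks for; the graph and the helper inc are invented for illustration.

```python
import dask
from dask.optimization import cull, fuse


def inc(x):
    """Hypothetical helper task: add one to its input."""
    return x + 1


# Illustrative graph with one branch ("unused") that the output does not need.
dsk = {
    "a": 1,
    "b": (inc, "a"),
    "c": (inc, "b"),
    "unused": (inc, "a"),
}

# First pass: drop tasks not needed for "c" and collect the dependencies.
culled, culled_deps = cull(dsk, ["c"])

# Second pass: reuse those dependencies instead of recomputing them.
fused, fused_deps = fuse(
    culled, keys=["c"], dependencies=culled_deps, rename_keys=False
)

result = dask.get(fused, "c")
print(sorted(fused), result)
```

The returned fused_deps could be handed to a further optimization in the same way, which is the "useful side effect" the return description mentions.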
Fuse tasks that form reductions; more advanced than fuse_linear
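Each tuning parameter lists a dask.config key, so a value can also be supplied through configuration rather than as an argument. The following sketch assumes dask.config.set can be used as a context manager with dotted keys and that fuse reads the configured value when the argument is left at its default; the graph and the helper inc are illustrative only.

```python
import dask
from dask.optimization import fuse


def inc(x):
    """Hypothetical helper task: add one to its input."""
    return x + 1


# Illustrative linear chain.
dsk = {"a": 1, "b": (inc, "a"), "c": (inc, "b")}

# Temporarily override the ave_width default through configuration.
with dask.config.set({"optimization.fuse.ave-width": 2}):
    width_in_effect = dask.config.get("optimization.fuse.ave-width")
    fused, fused_deps = fuse(dsk, keys=["c"])

# The requested key is still available in the fused graph.
result = dask.get(fused, "c")
print(width_in_effect, result)
```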
The following pages refer to this document, either explicitly or through code examples that use it.
dask.blockwise.fuse_roots
dask.multiprocessing.get