dask 2021.10.0

rechunk(x, chunks='auto', threshold=None, block_size_limit=None, balance=False)

Parameters

x : dask array

Array to be rechunked.

chunks : int, tuple, dict or str, optional

The new block dimensions to create. -1 indicates the full size of the corresponding dimension. Default is "auto" which automatically determines chunk sizes.

threshold : int, optional

The graph growth factor under which we don't bother introducing an intermediate step.

block_size_limit : int, optional

The maximum block size (in bytes) we want to produce. Defaults to the configuration value array.chunk-size.

balance : bool, default False

If True, try to make each chunk the same size.

This means balance=True will remove any small leftover chunks, so using x.rechunk(chunks=len(x) // N, balance=True) will almost certainly result in N chunks.

Convert blocks in dask array x for new chunks.

Examples

>>> import dask.array as da
>>> x = da.ones((1000, 1000), chunks=(100, 100))

Specify uniform chunk sizes with a tuple

>>> y = x.rechunk((1000, 10))

Or chunk only specific dimensions with a dictionary

>>> y = x.rechunk({0: 1000})
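
To make the difference between the tuple and dictionary forms concrete, the following sketch inspects the resulting chunk layout. It assumes dask is installed and reuses the array from the examples above; the names y_tuple and y_dict are illustrative only.

```python
import dask.array as da

x = da.ones((1000, 1000), chunks=(100, 100))

# Tuple form: one chunk size per dimension.
y_tuple = x.rechunk((1000, 10))

# Dictionary form: only axis 0 is rechunked; axis 1 keeps its
# existing 100-element chunks.
y_dict = x.rechunk({0: 1000})

print(y_tuple.numblocks)  # number of blocks along each axis
print(y_dict.chunks)      # explicit chunk sizes along each axis
```

Dimensions that are not named in the dictionary are left as they were, which is convenient when only one axis needs to change.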

Use the value -1 to specify that you want a single chunk along a dimension, or the value "auto" to let dask freely rechunk a dimension to attain blocks of a uniform size.

>>> y = x.rechunk({0: -1, 1: 'auto'}, block_size_limit=1e8)
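
As a rough check of what -1 and block_size_limit do, the sketch below verifies that axis 0 becomes a single chunk and that the largest block stays within the byte limit. It assumes dask is installed; the variable names limit and largest are illustrative, and the exact chunk sizes chosen along axis 1 by "auto" are not asserted here.

```python
import dask.array as da

x = da.ones((1000, 1000), chunks=(100, 100))
limit = 1e8  # bytes; same value as in the example above

y = x.rechunk({0: -1, 1: "auto"}, block_size_limit=limit)

# Axis 0 is a single chunk spanning the full dimension.
print(y.chunks[0])

# Size in bytes of the largest block, derived from the chunk sizes
# and the element size of the dtype.
largest = max(y.chunks[0]) * max(y.chunks[1]) * y.dtype.itemsize
print(largest <= limit)
```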

If a chunk size does not divide the dimension evenly, rechunk leaves the remainder in the last chunk.

>>> x.rechunk(chunks=(400, -1)).chunks
((400, 400, 200), (1000,))
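
The arithmetic behind this result can be sketched in plain Python. The helper uneven_chunks below is hypothetical and is not part of dask; it only illustrates how a remainder ends up in the last chunk.

```python
def uneven_chunks(dim_size, chunk_size):
    """Hypothetical helper: split dim_size into pieces of chunk_size,
    placing any remainder in a final, smaller chunk."""
    full, leftover = divmod(dim_size, chunk_size)
    chunks = [chunk_size] * full
    if leftover:
        chunks.append(leftover)
    return tuple(chunks)


print(uneven_chunks(1000, 400))  # (400, 400, 200)
```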

However, if you want more balanced chunks and don't mind Dask choosing a different chunk size for you, you can use the balance=True option.

>>> x.rechunk(chunks=(400, -1), balance=True).chunks
((500, 500), (1000,))
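
One way to compare the two layouts is to measure the spread between the largest and smallest chunk along the rechunked axis. This is a minimal sketch assuming dask is installed; the helper spread and the names plain and balanced are illustrative and not part of dask.

```python
import dask.array as da

x = da.ones((1000, 1000), chunks=(100, 100))

plain = x.rechunk(chunks=(400, -1))
balanced = x.rechunk(chunks=(400, -1), balance=True)


def spread(chunks):
    """Illustrative helper: difference between largest and smallest chunk."""
    return max(chunks) - min(chunks)


# A smaller spread means more evenly sized chunks along axis 0.
print(spread(plain.chunks[0]), spread(balanced.chunks[0]))
```

Both layouts still cover the full dimension; only the way the 1000 elements are divided differs.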

File: /dask/array/rechunk.py#187
type: <class 'function'>
Commit: