dask 2021.10.0

register_chunk_type(type)

Notes

A dask.array.Array can contain any sufficiently "NumPy-like" array in its chunks. These are also referred to as "duck arrays" since they match the most important parts of NumPy's array API and so behave the same way when relying on duck typing.

However, for multiple duck array types to interoperate, they need to defer to each other properly in arithmetic operations and NumPy functions/ufuncs according to a well-defined type casting hierarchy (see NEP 13). In an effort to maintain this hierarchy, Dask defers to all other duck array types except those in its internal registry. By default, this registry contains:

numpy.ndarray
numpy.ma.MaskedArray
cupy.ndarray
sparse.SparseArray
scipy.sparse.spmatrix

This function exists to append any other types to this registry. If a type is not in this registry, and yet is a downcast type (it comes below dask.array.Array in the type casting hierarchy), a TypeError will be raised because all operand types return NotImplemented.
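The deferral deadlock described above can be illustrated with a toy model in plain Python. This is only a sketch of the general mechanism (binary operators returning NotImplemented), not Dask's implementation: the names HANDLED_TYPES, Wrapper, and Flagged are hypothetical stand-ins for the registry, the wrapping array type, and an unregistered downcast type.

```python
# Toy model of mutual deferral; all names here are hypothetical.
HANDLED_TYPES = [int, float]  # stand-in for the registry of known chunk types


class Wrapper:
    """Stand-in for a wrapping type that defers to every unregistered type."""

    def __init__(self, value):
        self.value = value

    def __sub__(self, other):
        if isinstance(other, tuple(HANDLED_TYPES)):
            # Registered type: handle the operation and wrap the result.
            return Wrapper(self.value - other)
        return NotImplemented  # unknown type: defer to the other operand


class Flagged:
    """Stand-in for a downcast type that in turn defers to Wrapper."""

    def __init__(self, value):
        self.value = value

    def __rsub__(self, other):
        if isinstance(other, (int, float)):
            return Flagged(other - self.value)
        return NotImplemented  # not a plain number: defer back


# Before registration, both operands defer, so Python raises TypeError.
try:
    Wrapper(1.0) - Flagged(1.0)
    failed_before = False
except TypeError:
    failed_before = True

# After registration, Wrapper handles the operation and wraps a Flagged result.
HANDLED_TYPES.append(Flagged)
result = Wrapper(1.0) - Flagged(1.0)
```

Once Flagged is in the toy registry, the wrapper stops deferring and carries the Flagged value inside it, which mirrors how a registered chunk type ends up inside a Dask array.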

Parameters

type : type
    Duck array type to be registered as a type Dask can safely wrap as a chunk and to which Dask does not defer in arithmetic operations and NumPy functions/ufuncs.

Examples

Using a mock FlaggedArray class as an example chunk type unknown to Dask, with a minimal duck array API:

This example is valid syntax, but we were not able to check execution

>>> import numpy as np
>>> import numpy.lib.mixins
>>> class FlaggedArray(numpy.lib.mixins.NDArrayOperatorsMixin):
...     def __init__(self, a, flag=False):
...         self.a = a
...         self.flag = flag
...     def __repr__(self):
...         return f"Flag: {self.flag}, Array: " + repr(self.a)
...     def __array__(self):
...         return np.asarray(self.a)
...     def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
...         if method == '__call__':
...             downcast_inputs = []
...             flag = False
...             for input in inputs:
...                 if isinstance(input, self.__class__):
...                     flag = flag or input.flag
...                     downcast_inputs.append(input.a)
...                 elif isinstance(input, np.ndarray):
...                     downcast_inputs.append(input)
...                 else:
...                     return NotImplemented
...             return self.__class__(ufunc(*downcast_inputs, **kwargs), flag)
...         else:
...             return NotImplemented
...     @property
...     def shape(self):
...         return self.a.shape
...     @property
...     def ndim(self):
...         return self.a.ndim
...     @property
...     def dtype(self):
...         return self.a.dtype
...     def __getitem__(self, key):
...         return type(self)(self.a[key], self.flag)
...     def __setitem__(self, key, value):
...         self.a[key] = value

Before registering FlaggedArray, both types will attempt to defer to the other:

>>> import dask.array as da
>>> da.ones(5) - FlaggedArray(np.ones(5), True)
Traceback (most recent call last):
...
TypeError: operand type(s) all returned NotImplemented ...

However, once registered, Dask will be able to handle operations with this new type:

>>> da.register_chunk_type(FlaggedArray)
>>> x = da.ones(5) - FlaggedArray(np.ones(5), True)
>>> x
dask.array<sub, shape=(5,), dtype=float64, chunksize=(5,), chunktype=dask.FlaggedArray>

>>> x.compute()
Flag: True, Array: array([0., 0., 0., 0., 0.])
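Whether a type is currently in the registry can also be checked directly. The sketch below assumes the helper is_valid_chunk_type is importable from dask.array.chunk_types in the installed version; MyChunk is a hypothetical placeholder class used only to exercise the registry, not a working duck array.

```python
import numpy as np
import dask.array as da
from dask.array.chunk_types import is_valid_chunk_type  # assumed available in this version


class MyChunk:
    """Hypothetical placeholder type; it does not implement the array API."""


# numpy.ndarray is one of the types registered by default.
known_by_default = is_valid_chunk_type(np.ndarray)

# A new type is unknown until it is registered.
before = is_valid_chunk_type(MyChunk)
da.register_chunk_type(MyChunk)
after = is_valid_chunk_type(MyChunk)

print(known_by_default, before, after)
```

Registration is process-wide and this sketch does not undo it, so in real code only types that actually implement the duck array API should be registered.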

Back References

The following pages refer to this document either explicitly or contain code examples using this.

dask.array.chunk_types.register_chunk_type

File: /dask/array/chunk_types.py#7
type: <class 'function'>
Commit: