dask 2021.10.0

_unique_internal(ar, indices, counts, return_inverse=False)

Uses numpy.unique to find the unique values in an array chunk. Because the chunk may not represent the whole array, the function also takes the indices and counts that are in 1-to-1 correspondence with ar and reduces them in the same way ar is reduced: counts that correspond to the same value are summed, and the smallest index that corresponds to that value is kept.
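The reduction described above can be sketched in plain NumPy. This is a minimal illustration, not the code in dask/array/routines.py; the helper name reduce_chunk and the sample data are assumptions made for the example.

```python
import numpy as np


def reduce_chunk(ar, indices, counts):
    """Hypothetical helper: collapse duplicate values in a chunk.

    Keeps the smallest index and the summed count for each unique value.
    """
    ar = np.asarray(ar).ravel()
    indices = np.asarray(indices).ravel()
    counts = np.asarray(counts).ravel()

    # inverse[i] is the position of ar[i] among the sorted unique values
    values, inverse = np.unique(ar, return_inverse=True)
    inverse = inverse.ravel()

    # Smallest index seen for each unique value
    min_index = np.full(values.shape, np.iinfo(np.intp).max, dtype=np.intp)
    np.minimum.at(min_index, inverse, indices)

    # Sum of the counts seen for each unique value
    total = np.zeros(values.shape, dtype=np.intp)
    np.add.at(total, inverse, counts)

    return values, min_index, total


# Illustrative inputs: the indices and counts pair up 1-to-1 with ar
ar = np.array([3, 1, 3, 2, 1])
indices = np.array([10, 11, 12, 13, 14])
counts = np.array([1, 2, 1, 1, 4])

values, min_index, total = reduce_chunk(ar, indices, counts)
print(values, min_index, total)
```

The unbuffered ufunc methods np.minimum.at and np.add.at are used here because several input positions can map to the same unique value.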

To handle the inverse mapping from the unique values to the original array, the function simply returns a NumPy array created with arange, with enough values to correspond 1-to-1 to the unique values. More work is needed to create the full inverse mapping for the original array, but this provides enough information to generate it in Dask.
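As a rough illustration of why an arange placeholder is enough, the sketch below recovers a full inverse mapping by looking up each original element among the sorted unique values. Using np.searchsorted for the lookup is an assumption chosen for brevity; it is not necessarily how Dask performs this step.

```python
import numpy as np

original = np.array([3, 1, 3, 2, 1])

# Unique values plus a placeholder that labels them 0..n-1
values = np.unique(original)
placeholder = np.arange(len(values), dtype=np.intp)

# Locate each original element among the sorted unique values,
# then read off the placeholder label at that position
positions = np.searchsorted(values, original)
inverse = placeholder[positions]

# The inverse mapping rebuilds the original array from the unique values
rebuilt = values[inverse]
print(inverse, rebuilt)
```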

Because Dask prefers functions such as blockwise to return a single array, the resulting arrays are packed into one NumPy structured array. Dask can handle this object and split it into the separate results on the Dask side. Those results can then be passed back to this function in concatenated chunks for further reduction, or returned to the user for other forms of analysis.
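A minimal sketch of the packing step, assuming three result arrays of equal length. The field names "values", "indices" and "counts" are chosen for this example and may differ from the names used internally.

```python
import numpy as np

# Per-chunk results (illustrative data)
values = np.array([1, 2, 3])
min_index = np.array([11, 13, 10], dtype=np.intp)
total = np.array([6, 1, 2], dtype=np.intp)

# One structured dtype with a field per result array (field names are assumptions)
dt = np.dtype(
    [("values", values.dtype), ("indices", np.intp), ("counts", np.intp)]
)
packed = np.empty(values.shape, dtype=dt)
packed["values"] = values
packed["indices"] = min_index
packed["counts"] = total

# Results from several chunks concatenate like any other array
both = np.concatenate([packed, packed])

# Splitting apart again is plain field access
unpacked_counts = both["counts"]
print(both.dtype.names, unpacked_counts)
```

Because every field shares the same leading shape, a single array can carry all the results through an API that expects one return value.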

Handled this way, it does not matter where a chunk sits in the larger array or how big it is; the chunk is computed in the same way. Nor does it matter whether the chunk is the result of other chunks having been run through this function multiple times. The end result is just as accurate.
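The claim that repeated reduction gives the same answer can be checked on a small array. The sketch below splits an array into two uneven chunks, reduces each, concatenates the partial results, reduces once more, and compares against numpy.unique on the whole array. The helper reduce_once is hypothetical and stands in for the reduction step only.

```python
import numpy as np


def reduce_once(ar, indices, counts):
    """Hypothetical stand-in: smallest index and summed count per unique value."""
    values, inverse = np.unique(ar, return_inverse=True)
    inverse = inverse.ravel()
    first = np.full(values.shape, np.iinfo(np.intp).max, dtype=np.intp)
    np.minimum.at(first, inverse, indices)
    total = np.zeros(values.shape, dtype=np.intp)
    np.add.at(total, inverse, counts)
    return values, first, total


full = np.array([4, 2, 4, 7, 2, 2, 7, 9])
positions = np.arange(full.size, dtype=np.intp)  # global index of each element
ones = np.ones(full.size, dtype=np.intp)         # every element counts once

# Stage 1: reduce two chunks of different sizes independently
chunks = (slice(0, 3), slice(3, 8))
parts = [reduce_once(full[s], positions[s], ones[s]) for s in chunks]

# Stage 2: concatenate the partial results field by field and reduce again
merged = [np.concatenate(field) for field in zip(*parts)]
values, first, total = reduce_once(*merged)

# Reference answer computed on the whole array at once
expected = np.unique(full, return_index=True, return_counts=True)
print(values, first, total)
```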

Helper/wrapper function for numpy.unique.

File: /dask/array/routines.py#1541
type: <class 'function'>
Commit: