histogram(a, bins=None, range=None, normed=False, weights=None, density=None)
Parameters
a : Input data; the histogram is computed over the flattened array. If the `weights` argument is used, the chunks of `a` are accessed to check chunking compatibility between `a` and `weights`. If `weights` is None, a dask.dataframe.Series object can be passed as input data.
bins : Either an iterable specifying the bins, or the number of bins together with a `range` argument, is required, because computing `min` and `max` over blocked arrays is an expensive operation that must be performed explicitly. If `bins` is an int, it defines the number of equal-width bins in the given range. If `bins` is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge, allowing for non-uniform bin widths.
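The two accepted forms of `bins` can be modeled in plain Python. This is an illustrative sketch only: `make_edges` is a hypothetical helper, not part of dask, and it assumes that an integer bin count always comes with an explicit `(low, high)` range, as described above.

```python
def make_edges(bins, bin_range=None):
    """Hypothetical helper (not a dask API): build bin edges from `bins` and a range."""
    if isinstance(bins, int):
        # An integer bin count needs an explicit range; min/max of a blocked
        # array is expensive, so it is never computed implicitly here.
        if bin_range is None:
            raise ValueError("an integer `bins` requires an explicit range")
        if bins < 1:
            raise ValueError("`bins` must be a positive integer")
        lo, hi = bin_range
        if lo > hi:
            raise ValueError("range[0] must be <= range[1]")
        width = (hi - lo) / bins
        # `bins` equal-width bins need bins + 1 edges; the last edge is pinned
        # to `hi` so floating-point rounding cannot move the rightmost edge.
        return [lo + i * width for i in range(bins)] + [float(hi)]
    # Otherwise `bins` is a sequence of edges, rightmost edge included,
    # and the widths may differ from bin to bin.
    edges = list(bins)
    if len(edges) < 2:
        raise ValueError("need at least two bin edges")
    if any(a > b for a, b in zip(edges, edges[1:])):
        raise ValueError("bin edges must increase monotonically")
    return edges
```

With `bins=10` and a range of `(0, 10000)` this yields the same eleven edges as the first example further down this page.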
range : The lower and upper range of the bins. If not provided, range is simply `(a.min(), a.max())`. Values outside the range are ignored. The first element of the range must be less than or equal to the second. `range` affects the automatic bin computation as well: while bin width is computed to be optimal based on the actual data within `range`, the bin count will fill the entire range, including portions containing no data.
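How out-of-range values are dropped can be sketched in plain Python. The sketch assumes NumPy's binning convention, where every bin is `[left, right)` except the last, which also includes its right edge; `count_in_bins` is a hypothetical name used only for illustration, not dask's implementation.

```python
import bisect


def count_in_bins(values, edges):
    """Hypothetical sketch: count values per bin, ignoring values outside the edges."""
    counts = [0] * (len(edges) - 1)
    lo, hi = edges[0], edges[-1]
    for v in values:
        if v < lo or v > hi:
            continue  # outside the range: ignored entirely
        if v == hi:
            counts[-1] += 1  # the rightmost edge belongs to the last bin
            continue
        # bisect_right gives the index of the first edge strictly greater
        # than v, so the containing bin is one position to the left.
        counts[bisect.bisect_right(edges, v) - 1] += 1
    return counts
```

For example, with edges `[0, 5, 10]` the values `-5` and `12` are ignored, `10` lands in the last bin, and `[-5, 0, 2.5, 5, 7.5, 10, 12]` gives counts `[2, 3]`.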
normed : This is equivalent to the `density` argument, but produces incorrect results for unequal bin widths. It should not be used.
weights : A dask.array.Array of weights, with the same block structure as `a`. Each value in `a` only contributes its associated weight towards the bin count (instead of 1). If `density` is True, the weights are normalized, so that the integral of the density over the range remains 1.
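The effect of `weights` on the bin counts can be sketched in plain Python. Each weight is paired with the value at the same position, which is the pairing that a shared block structure between `a` and `weights` preserves. `weighted_counts` is a hypothetical name, and the binning assumes the same NumPy-style half-open bins as elsewhere on this page.

```python
import bisect


def weighted_counts(values, weights, edges):
    """Hypothetical sketch: add each value's weight (instead of 1) to its bin."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    totals = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        if v < edges[0] or v > edges[-1]:
            continue  # out of range: the value and its weight are dropped
        # The rightmost edge belongs to the last bin; otherwise use bisect.
        i = len(totals) - 1 if v == edges[-1] else bisect.bisect_right(edges, v) - 1
        totals[i] += w
    return totals
```

With all weights equal to 1 this reduces to an ordinary count; with weights `[0.5, 0.5, 2.0, 1.0]` on values `[1, 2, 6, 9]` and edges `[0, 5, 10]` the result is `[1.0, 3.0]`.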
density : If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that the sum of the histogram values will not be equal to 1 unless bins of unity width are chosen; it is not a probability mass function. Overrides the `normed` keyword if given. If `density` is True, `bins` cannot be a single-number delayed value; it must be a concrete number, or a (possibly delayed) array/sequence of the bin edges.
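The density normalization divides each bin's count (or summed weight) by the total and by that bin's width, so the histogram integrates to 1 rather than summing to 1. A minimal plain-Python sketch, assuming positive bin widths and a non-zero total; `density_from_counts` is a hypothetical name used only for illustration.

```python
def density_from_counts(counts, edges):
    """Hypothetical sketch: density[i] = counts[i] / (total * width[i]).

    Assumes every bin width is positive and sum(counts) is non-zero.
    """
    total = sum(counts)
    widths = [right - left for left, right in zip(edges[:-1], edges[1:])]
    return [c / (total * w) for c, w in zip(counts, widths)]
```

With counts `[2, 2]` and unequal edges `[0, 1, 5]`, the densities are `[0.5, 0.125]`: the integral `0.5 * 1 + 0.125 * 4` is 1, while the plain sum `0.625` is not, which is why the result is not a probability mass function.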
Returns
hist : The values of the histogram. See `density` and `weights` for a description of the possible semantics.
The bin edges, an array of length `len(hist) + 1`.
Blocked variant of numpy.histogram.
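Counting is additive over disjoint pieces of the data, so a blocked array can be histogrammed one block at a time and the per-block results summed elementwise. The plain-Python model below illustrates that idea under simplifying assumptions: `chunk_counts` and `blocked_histogram` are hypothetical names, the loop stands in for what dask expresses as a task graph, and the bins follow NumPy's half-open convention.

```python
import bisect


def chunk_counts(chunk, edges):
    """Histogram of a single block, NumPy-style (last bin includes its right edge)."""
    counts = [0] * (len(edges) - 1)
    for v in chunk:
        if edges[0] <= v < edges[-1]:
            counts[bisect.bisect_right(edges, v) - 1] += 1
        elif v == edges[-1]:
            counts[-1] += 1
    return counts


def blocked_histogram(data, chunksize, edges):
    """Hypothetical model: per-block partial histograms, then an elementwise sum."""
    blocks = [data[i:i + chunksize] for i in range(0, len(data), chunksize)]
    # Each partial result depends on one block only, so they could run in parallel.
    partials = [chunk_counts(block, edges) for block in blocks]
    return [sum(column) for column in zip(*partials)]
```

Splitting `list(range(10000))` into blocks of 10 and using edges `0, 1000, ..., 10000` gives 1000 in every bin, matching the first example below, and any block size gives the same answer as histogramming the whole list at once.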
Using number of bins and range:
>>> import dask.array as da
>>> import numpy as np
>>> x = da.from_array(np.arange(10000), chunks=10)
>>> h, bins = da.histogram(x, bins=10, range=[0, 10000])
>>> bins
array([ 0., 1000., 2000., 3000., 4000., 5000., 6000., 7000., 8000., 9000., 10000.])
>>> h.compute()
array([1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000])
Explicitly specifying the bins:
>>> h, bins = da.histogram(x, bins=np.array([0, 5000, 10000]))
>>> bins
array([ 0, 5000, 10000])
>>> h.compute()
array([5000, 5000])