dask 2021.10.0

cov(m, y=None, rowvar=1, bias=0, ddof=None)

This docstring was copied from numpy.cov.

Some inconsistencies with the Dask version may exist.

Covariance indicates the level to which two variables vary together. If we examine N-dimensional samples, $X = [x_1, x_2, ... x_N]^T$, then the covariance matrix element $C_{ij}$ is the covariance of $x_i$ and $x_j$. The element $C_{ii}$ is the variance of $x_i$.

See the notes for an outline of the algorithm.
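
As a minimal sketch of the definition above, the covariance matrix can be built by hand with NumPy and compared with np.cov. The data values and the names X, centered, C and variances are illustrative assumptions, not part of the original docstring.

```python
import numpy as np

# Two variables (rows) observed four times (columns); values are illustrative.
X = np.array([[1.0, 2.0, 4.0, 7.0],
              [3.0, 1.0, 0.0, 2.0]])
n_obs = X.shape[1]

# Center each variable, then average the products of deviations
# using the default N - 1 normalization.
centered = X - X.mean(axis=1, keepdims=True)
C = centered @ centered.T / (n_obs - 1)

# The diagonal elements C_ii are the variances of the individual variables.
variances = X.var(axis=1, ddof=1)

print(np.allclose(C, np.cov(X)))
print(np.allclose(np.diag(C), variances))
```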

Notes

Assume that the observations are in the columns of the observation array m and let f = fweights and a = aweights for brevity. The steps to compute the weighted covariance are as follows:

>>> m = np.arange(10, dtype=np.float64)  # doctest: +SKIP
>>> f = np.arange(10) * 2  # doctest: +SKIP
>>> a = np.arange(10) ** 2.  # doctest: +SKIP
>>> ddof = 1  # doctest: +SKIP
>>> w = f * a  # doctest: +SKIP
>>> v1 = np.sum(w)  # doctest: +SKIP
>>> v2 = np.sum(w * a)  # doctest: +SKIP
>>> m -= np.sum(m * w, axis=None, keepdims=True) / v1  # doctest: +SKIP
>>> cov = np.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)  # doctest: +SKIP

Note that when a == 1, the normalization factor v1 / (v1**2 - ddof * v2) reduces to 1 / (np.sum(f) - ddof), as it should.
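
The steps above can be checked against NumPy's own weighted covariance. This sketch uses numpy.cov rather than the Dask function, since the parameter list marks fweights and aweights as not supported in Dask. The names centered, manual, reference, general and simplified are illustrative.

```python
import numpy as np

m = np.arange(10, dtype=np.float64)
f = np.arange(10) * 2        # integer frequency weights
a = np.arange(10) ** 2.0     # observation weights
ddof = 1

# Follow the outlined steps without modifying m in place.
w = f * a
v1 = np.sum(w)
v2 = np.sum(w * a)
centered = m - np.sum(m * w) / v1
manual = np.dot(centered * w, centered.T) * v1 / (v1**2 - ddof * v2)

# NumPy's weighted covariance for the same inputs.
reference = np.cov(m, fweights=f, aweights=a, ddof=ddof)

# With every aweight equal to 1, the factor reduces to 1 / (sum(f) - ddof).
ones = np.ones_like(a)
w1 = f * ones
general = np.sum(w1) / (np.sum(w1) ** 2 - ddof * np.sum(w1 * ones))
simplified = 1.0 / (np.sum(f) - ddof)

print(np.isclose(manual, reference), np.isclose(general, simplified))
```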

Parameters

m : array_like

A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables. Also see rowvar below.

y : array_like, optional

An additional set of variables and observations. y has the same form as that of m.

rowvar : bool, optional

If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.

bias : bool, optional

Default normalization (False) is by (N - 1), where N is the number of observations given (unbiased estimate). If bias is True, then normalization is by N. These values can be overridden by using the keyword ddof in numpy versions >= 1.5.

ddof : int, optional

If not None, the default value implied by bias is overridden. Note that ddof=1 will return the unbiased estimate, even if both fweights and aweights are specified, and ddof=0 will return the simple average. See the notes for the details. The default value is None.

fweights : array_like, int, optional (Not supported in Dask)

1-D array of integer frequency weights; the number of times each observation vector should be repeated.

aweights : array_like, optional (Not supported in Dask)

1-D array of observation vector weights. These relative weights are typically large for observations considered "important" and smaller for observations considered less "important". If ddof=0 the array of weights can be used to assign probabilities to observation vectors.

dtype : data-type, optional (Not supported in Dask)

Data-type of the result. By default, the return data-type will have at least numpy.float64 precision.
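
The parameter descriptions above can be illustrated with a short NumPy sketch. It uses numpy.cov throughout, because the weight parameters are marked as not supported in Dask. All data values and variable names are illustrative assumptions.

```python
import numpy as np

# Illustrative data: 2 variables (rows), 4 observations (columns).
x = np.array([[0.0, 1.0, 2.0, 4.0],
              [2.0, 1.0, 0.0, 3.0]])
n_obs = x.shape[1]

# rowvar=False treats columns as variables, so the transposed input
# gives the same matrix as the default call.
by_columns = np.cov(x.T, rowvar=False)

# bias=True normalizes by N instead of N - 1; ddof=0 requests the same thing.
unbiased = np.cov(x)
biased = np.cov(x, bias=True)

# An explicit ddof takes precedence over bias.
overridden = np.cov(x, bias=True, ddof=1)

# fweights (NumPy only): integer weights act like repeated observations.
m = np.array([1.0, 2.0, 4.0, 7.0])
f = np.array([1, 2, 3, 4])
weighted = np.cov(m, fweights=f)
repeated = np.cov(np.repeat(m, f))

# aweights with ddof=0 (NumPy only): weights act as probabilities.
p = np.array([0.5, 0.25, 0.125, 0.125])
mean = np.sum(p * m)
expected_var = np.sum(p * (m - mean) ** 2)
prob_weighted = np.cov(m, aweights=p, ddof=0)
```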


Returns

out : ndarray

The covariance matrix of the variables.
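
Because this docstring was copied from NumPy, the return type reads as ndarray. The following sketch, which assumes dask and numpy are installed, suggests that the Dask version returns a lazy Dask array that has to be computed to obtain a NumPy result. The input values and chunk sizes are illustrative.

```python
import numpy as np
import dask.array as da

x = np.array([[0.0, 1.0, 2.0],
              [2.0, 1.0, 0.0]])
dx = da.from_array(x, chunks=(2, 2))

lazy = da.cov(dx)         # builds a task graph; nothing is computed yet
result = lazy.compute()   # materializes the covariance matrix in memory

print(type(lazy).__name__)
print(result)
```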

Estimate a covariance matrix, given data and weights.

See Also

corrcoef

Normalized covariance matrix
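
As a rough illustration of what "normalized" means here, dividing each covariance element by the product of the two corresponding standard deviations reproduces np.corrcoef. The data and the names c, std and normalized are illustrative.

```python
import numpy as np

x = np.array([[0.0, 1.0, 2.0, 4.0],
              [2.0, 1.0, 0.0, 3.0]])

c = np.cov(x)
std = np.sqrt(np.diag(c))             # standard deviation of each variable
normalized = c / np.outer(std, std)   # C_ij / (std_i * std_j)

print(np.allclose(normalized, np.corrcoef(x)))
```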

Examples

Consider two variables, $x_0$ and $x_1$, which correlate perfectly, but in opposite directions:

This example is valid syntax, but we were not able to check execution
>>> x = np.array([[0, 2], [1, 1], [2, 0]]).T  # doctest: +SKIP
>>> x  # doctest: +SKIP
array([[0, 1, 2],
       [2, 1, 0]])

Note how $x_0$ increases while $x_1$ decreases. The covariance matrix shows this clearly:

>>> np.cov(x)  # doctest: +SKIP
array([[ 1., -1.],
       [-1.,  1.]])

Note that element $C_{0,1}$, which shows the correlation between $x_0$ and $x_1$, is negative.

Further, note how x and y are combined:

>>> x = [-2.1, -1,  4.3]  # doctest: +SKIP
>>> y = [3, 1.1, 0.12]  # doctest: +SKIP
>>> X = np.stack((x, y), axis=0)  # doctest: +SKIP
>>> np.cov(X)  # doctest: +SKIP
array([[11.71      , -4.286     ], # may vary
       [-4.286     ,  2.144133]])
>>> np.cov(x, y)  # doctest: +SKIP
array([[11.71      , -4.286     ], # may vary
       [-4.286     ,  2.144133]])
>>> np.cov(x)  # doctest: +SKIP
array(11.71)
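
For a single 1-D input, the result shown above is a zero-dimensional array holding the sample variance. A small check, using np.var with ddof=1 as the comparison (the variable names are illustrative):

```python
import numpy as np

x = [-2.1, -1, 4.3]

scalar_cov = np.cov(x)          # 0-d array for a single variable
sample_var = np.var(x, ddof=1)  # unbiased sample variance

print(np.ndim(scalar_cov), np.isclose(scalar_cov, sample_var))
```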

Back References

The following pages refer to this document, either explicitly or in code examples that use it.

dask.array.routines.corrcoef


File: /dask/array/routines.py#1470
type: <class 'function'>
Commit: