pandas 1.4.2

cov(self, min_periods: 'int | None' = None, ddof: 'int | None' = 1) -> 'DataFrame'

Compute the pairwise covariance among the series of a DataFrame. The returned data frame is the `covariance matrix <https://en.wikipedia.org/wiki/Covariance_matrix>`__ of the columns of the DataFrame.

Both NA and null values are automatically excluded from the calculation. (See the note below about bias from missing values.) A threshold can be set for the minimum number of observations required for each entry of the result. Column pairs with fewer observations than this threshold are returned as NaN.

This method is generally used for the analysis of time series data to understand the relationship between different measures across time.

Notes

Returns the covariance matrix of the DataFrame's time series. The covariance is normalized by N-ddof.
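The normalization by N-ddof can be checked by hand. The following is a minimal sketch, assuming complete data (no missing values); `manual_cov` is a hypothetical helper written for this illustration, not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["dogs", "cats"])


def manual_cov(frame, ddof=1):
    """Hypothetical helper: covariance of complete data, normalized by N - ddof."""
    values = frame.to_numpy(dtype=float)
    centered = values - values.mean(axis=0)  # subtract each column's mean
    n = values.shape[0]                      # N, the number of observations
    return pd.DataFrame(
        centered.T @ centered / (n - ddof),
        index=frame.columns,
        columns=frame.columns,
    )


sample_cov = df.cov()            # default ddof=1, divisor N - 1
population_cov = df.cov(ddof=0)  # divisor N

print(np.allclose(sample_cov, manual_cov(df, ddof=1)))
print(np.allclose(population_cov, manual_cov(df, ddof=0)))
```

With N observations, the two results differ only by the factor (N - 1) / N.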

For DataFrames that have Series that are missing data (assuming that data is `missing at random <https://en.wikipedia.org/wiki/Missing_data#Missing_at_random>`__) the returned covariance matrix will be an unbiased estimate of the variance and covariance between the member Series.

However, for many applications this estimate may not be acceptable because the estimated covariance matrix is not guaranteed to be positive semi-definite. This could lead to estimated correlations having absolute values which are greater than one, and/or a non-invertible covariance matrix. See `Estimation of covariance matrices <https://en.wikipedia.org/w/index.php?title=Estimation_of_covariance_matrices>`__ for more details.
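The loss of positive semi-definiteness can be seen on a small constructed example. The sketch below uses hypothetical data in which each pair of columns overlaps on a different block of rows, so every covariance entry is estimated from a different subset of observations. It is an illustration of the effect, not a typical dataset.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical data: a/b overlap on rows 0-2, b/c on rows 3-5, a/c on rows 6-8.
df = pd.DataFrame({
    "a": [1, 2, 3, nan, nan, nan, 1, 2, 3],
    "b": [1, 2, 3, 1, 2, 3, nan, nan, nan],
    "c": [nan, nan, nan, 1, 2, 3, 3, 2, 1],
})

cov = df.cov()  # each entry uses only the rows where both columns are present

# A positive semi-definite matrix has no negative eigenvalues.
eigenvalues = np.linalg.eigvalsh(cov.to_numpy())

# Correlations implied by the covariance matrix: cov_ij / (std_i * std_j).
std = np.sqrt(np.diag(cov))
implied_corr = cov / np.outer(std, std)

print(cov)
print("smallest eigenvalue:", eigenvalues.min())
print(implied_corr)
```

Here the smallest eigenvalue is negative and the implied correlations exceed one in absolute value, which is the situation the note above warns about.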

Parameters

min_periods : int, optional

Minimum number of observations required per pair of columns to have a valid result.

ddof : int, default 1

Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

versionadded

Returns

DataFrame

The covariance matrix of the series of the DataFrame.

Compute pairwise covariance of columns, excluding NA/null values.

See Also

Series.cov

Compute covariance with another Series.

core.window.Expanding.cov

Expanding sample covariance.

core.window.ExponentialMovingWindow.cov

Exponential weighted sample covariance.

core.window.Rolling.cov

Rolling sample covariance.
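As a rough illustration of how the related methods listed above connect to DataFrame.cov, the sketch below compares a single entry of the covariance matrix with the Series, expanding, and rolling variants. The window size of 3 is an arbitrary choice for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=["dogs", "cats"])

full = df.cov()                       # full covariance matrix
pair = df["dogs"].cov(df["cats"])     # one off-diagonal entry via Series.cov

# Expanding covariance: the last value uses every row seen so far.
expanding = df["dogs"].expanding().cov(df["cats"])

# Rolling covariance over a 3-row window (window size chosen arbitrarily).
rolling = df["dogs"].rolling(window=3).cov(df["cats"])

print(np.isclose(pair, full.loc["dogs", "cats"]))
print(expanding)
print(rolling)
```

The last expanding value agrees with the full-sample entry, and the last rolling value agrees with `df.iloc[-3:].cov()` for the same pair of columns.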

Examples

>>> df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
...                   columns=['dogs', 'cats'])
>>> df.cov()
          dogs      cats
dogs  0.666667 -1.000000
cats -1.000000  1.666667

>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(1000, 5),
...                   columns=['a', 'b', 'c', 'd', 'e'])
>>> df.cov()
          a         b         c         d         e
a  0.998438 -0.020161  0.059277 -0.008943  0.014144
b -0.020161  1.059352 -0.008543 -0.024738  0.009826
c  0.059277 -0.008543  1.010670 -0.001486 -0.000271
d -0.008943 -0.024738 -0.001486  0.921297 -0.013692
e  0.014144  0.009826 -0.000271 -0.013692  0.977795

Minimum number of periods

This method also supports an optional min_periods keyword that specifies the required minimum number of non-NA observations for each column pair in order to have a valid result:

>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(20, 3),
...                   columns=['a', 'b', 'c'])
>>> df.loc[df.index[:5], 'a'] = np.nan
>>> df.loc[df.index[5:10], 'b'] = np.nan
>>> df.cov(min_periods=12)
          a         b         c
a  0.316741       NaN -0.150812
b       NaN  1.248003  0.191417
c -0.150812  0.191417  0.895202
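
To see why a particular entry comes back as NaN, it can help to count the observations available for each column pair. The sketch below rebuilds the same frame and counts the rows where both columns are present; `pair_counts` is a name introduced for this illustration only.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randn(20, 3), columns=["a", "b", "c"])
df.loc[df.index[:5], "a"] = np.nan
df.loc[df.index[5:10], "b"] = np.nan

# 1 where a value is present, 0 where it is missing.
observed = df.notna().astype(int)

# Entry (i, j) counts the rows in which both column i and column j are present.
pair_counts = observed.T @ observed

cov = df.cov(min_periods=12)

print(pair_counts)
print(cov.isna())
```

Only the pair (a, b) has fewer than 12 shared observations, so only that entry is NaN.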

Back References

The following pages refer to this document, either explicitly or through code examples that use it.

pandas.core.window.expanding.Expanding.cov
pandas.core.window.ewm.ExponentialMovingWindow.cov
pandas.core.window.rolling.Rolling.cov
pandas.core.series.Series.cov



File: /pandas/core/frame.py#9570
type: <class 'function'>
Commit: