
dask 2021.10.0

zipf(self, a, size=None, chunks='auto', **kwargs)

This docstring was copied from numpy.random.mtrand.RandomState.zipf.

Some inconsistencies with the Dask version may exist.

Samples are drawn from a Zipf distribution with specified parameter a > 1.

The Zipf distribution (also known as the zeta distribution) is a discrete probability distribution that satisfies Zipf's law: the frequency of an item is inversely proportional to its rank in a frequency table.

Note

New code should use the zipf method of a default_rng() instance instead; please see the NumPy random quick start (random-quick-start).

Notes

The probability mass function of the Zipf distribution is

$$p(k) = \frac{k^{-a}}{\zeta(a)},$$

for integers $k \geq 1$, where $\zeta$ is the Riemann zeta function.
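
As a quick sanity check on the formula above, the following sketch evaluates p(k) with scipy.special.zeta and confirms numerically that the probabilities sum to approximately 1. The helper name zipf_pmf and the truncation point of 10,000 are illustrative assumptions, not part of NumPy or Dask.

```python
import numpy as np
from scipy.special import zeta


def zipf_pmf(k, a):
    """Hypothetical helper: p(k) = k**(-a) / zeta(a) for integers k >= 1."""
    k = np.asarray(k, dtype=float)  # float base avoids integer-power errors
    return k ** (-a) / zeta(a)


a = 4.0
k = np.arange(1, 10_001)
pmf = zipf_pmf(k, a)

# The infinite sum is exactly 1; truncating at k = 10_000 leaves a tiny tail.
total = pmf.sum()
print(f"p(1) = {pmf[0]:.4f}, sum over k <= 10000 = {total:.12f}")
```

Because the terms decay like k**(-a), most of the probability mass sits on the first few integers when a is well above 1.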

It is named for the American linguist George Kingsley Zipf, who noted that the frequency of any word in a sample of a language is inversely proportional to its rank in the frequency table.

Parameters

a : float or array_like of floats: Distribution parameter. Must be greater than 1.
size : int or tuple of ints, optional: Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a is a scalar. Otherwise, np.array(a).size samples are drawn.
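
The three size cases described above can be sketched as follows. This uses NumPy's RandomState, the class the docstring was copied from; the seed and the parameter values are arbitrary assumptions, and the Dask version may behave differently for size=None.

```python
import numpy as np

rs = np.random.RandomState(0)  # seeded only to make the sketch repeatable

# size=None with a scalar a: a single value is returned.
single = rs.zipf(2.0)

# size=(2, 3): 2 * 3 = 6 samples arranged in that shape.
grid = rs.zipf(2.0, size=(2, 3))

# size=None with an array-like a: one sample per entry of a.
per_param = rs.zipf([1.5, 2.0, 3.0])

print(np.ndim(single), grid.shape, per_param.shape)
```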

Returns

out : ndarray or scalar: Drawn samples from the parameterized Zipf distribution.


Examples

Draw samples from the distribution:


>>> import numpy as np  # doctest: +SKIP
>>> a = 4.0  # doctest: +SKIP
>>> n = 20000  # doctest: +SKIP
>>> s = np.random.zipf(a, n)  # doctest: +SKIP

Display the histogram of the samples, along with the expected histogram based on the probability mass function:


>>> import matplotlib.pyplot as plt  # doctest: +SKIP
>>> from scipy.special import zeta  # doctest: +SKIP

bincount provides a fast histogram for small integers.


>>> count = np.bincount(s)  # doctest: +SKIP
>>> k = np.arange(1, s.max() + 1)  # doctest: +SKIP


>>> plt.bar(k, count[1:], alpha=0.5, label='sample count')  # doctest: +SKIP
>>> plt.plot(k, n*(k**-a)/zeta(a), 'k.-', alpha=0.5,  # doctest: +SKIP
...          label='expected count')   # doctest: +SKIP
>>> plt.semilogy()  # doctest: +SKIP
>>> plt.grid(alpha=0.4)  # doctest: +SKIP
>>> plt.legend()  # doctest: +SKIP
>>> plt.title(f'Zipf sample, a={a}, size={n}')  # doctest: +SKIP
>>> plt.show()  # doctest: +SKIP
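
The examples above call np.random.zipf because the docstring is copied from NumPy. A minimal sketch of the Dask counterpart follows, assuming the signature shown at the top of this page (the size and chunks keywords); the chunk size of 5000 is an arbitrary choice for illustration.

```python
import dask.array as da

a = 4.0
n = 20000

# chunks=5000 is an arbitrary choice; the signature's default is chunks='auto'.
x = da.random.zipf(a, size=n, chunks=5000)

print(x.chunks)   # block layout of the lazy array
s = x.compute()   # materialise the samples as a NumPy array
print(s.shape, s.min())
```

The result of compute() is an ordinary NumPy array, so it can be passed to np.bincount and plotted exactly as in the NumPy example.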


File: /dask/array/random.py#445
type: <class 'function'>
Commit:
