
sample(self, n: 'int | None' = None, frac: 'float | None' = None, replace: 'bool' = False, weights: 'Sequence | Series | None' = None, random_state: 'RandomState | None' = None)

You can use random_state for reproducibility.

versionadded

Parameters

n : int, optional: Number of items to return for each group. Cannot be used with frac and must be no larger than the smallest group unless `replace` is True. Defaults to one if frac is None.
frac : float, optional: Fraction of items to return. Cannot be used with n.
replace : bool, default False: Allow or disallow sampling of the same row more than once.
weights : list-like, optional: Default None results in equal probability weighting. If passed a list-like then values must have the same length as the underlying DataFrame or Series object and will be used as sampling probabilities after normalization within each group. Values must be non-negative with at least one positive element within each group.
random_state : int, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional: If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.

versionchanged

np.random.Generator objects now accepted
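A minimal sketch of how `n` and `replace` interact, reusing the toy frame from the Examples. Every group has two rows, so asking for three rows per group should fail without replacement. The exception type (`ValueError`, surfaced from NumPy's `choice`) is an assumption based on current pandas/NumPy behaviour.

```python
import pandas as pd

# Same toy frame as in the Examples: three groups of two rows each.
df = pd.DataFrame(
    {"a": ["red"] * 2 + ["blue"] * 2 + ["black"] * 2, "b": range(6)}
)

# n=3 is larger than the smallest group (2 rows), so a draw
# without replacement cannot be satisfied.
raised = False
try:
    df.groupby("a").sample(n=3, random_state=1)
except ValueError as err:  # assumed exception type
    raised = True
    print("ValueError:", err)

# With replace=True a row may be drawn more than once,
# so each group can return more rows than it contains.
with_repl = df.groupby("a").sample(n=3, replace=True, random_state=1)
print(with_repl["a"].value_counts())
```

Since three draws are taken from only two rows per group, at least one index label must repeat in `with_repl`.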

Returns

Series or DataFrame: A new object of same type as caller containing items randomly sampled within each group from the caller object.
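A short check of the return type, assuming the toy frame from the Examples: sampling a grouped DataFrame gives a DataFrame back, selecting a column before sampling gives a Series, and the original index labels are carried over so each sampled row can be traced back to the caller.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": ["red"] * 2 + ["blue"] * 2 + ["black"] * 2, "b": range(6)}
)

# DataFrameGroupBy.sample returns a DataFrame.
frame_sample = df.groupby("a").sample(n=1, random_state=0)
# SeriesGroupBy.sample returns a Series.
series_sample = df.groupby("a")["b"].sample(n=1, random_state=0)

print(type(frame_sample).__name__, type(series_sample).__name__)

# Sampled rows keep their labels from the original index.
print(frame_sample.index.isin(df.index).all())
```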

Return a random sample of items from each group.

Examples


>>> import pandas as pd
>>> df = pd.DataFrame(
...     {"a": ["red"] * 2 + ["blue"] * 2 + ["black"] * 2, "b": range(6)}
... )
>>> df
       a  b
0    red  0
1    red  1
2   blue  2
3   blue  3
4  black  4
5  black  5

Select one row at random for each distinct value in column a. The random_state argument can be used to guarantee reproducibility:


>>> df.groupby("a").sample(n=1, random_state=1)
       a  b
4  black  4
2   blue  2
1    red  1
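A sketch of the two ways to make the draw repeatable, assuming a pandas version recent enough to accept `np.random.Generator` for `random_state` (per the versionchanged note on that parameter). An integer seed builds a fresh generator on every call, while a Generator object is used as given and its state advances, so it has to be re-created to repeat a draw.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": ["red"] * 2 + ["blue"] * 2 + ["black"] * 2, "b": range(6)}
)
gb = df.groupby("a")

# Integer seed: repeated calls return identical rows.
first = gb.sample(n=1, random_state=1)
second = gb.sample(n=1, random_state=1)

# Generator: pass a freshly seeded one whenever the draw must be repeated,
# because reusing the same object would continue from its advanced state.
gen_a = gb.sample(n=1, random_state=np.random.default_rng(42))
gen_b = gb.sample(n=1, random_state=np.random.default_rng(42))

print(first.equals(second), gen_a.equals(gen_b))
```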

Set frac to sample fixed proportions rather than counts:


>>> df.groupby("a")["b"].sample(frac=0.5, random_state=2)
5    5
2    2
0    0
Name: b, dtype: int64
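Because `frac` is applied within each group, groups of different sizes contribute different numbers of rows. The sketch below uses a hypothetical frame with uneven groups `x` (4 rows) and `y` (2 rows); the names `g`, `v`, `x` and `y` are illustrative only. Fractions that do not divide a group evenly are rounded, and since the exact rounding rule is an implementation detail, the example sticks to exact halves.

```python
import pandas as pd

# Hypothetical frame with uneven groups: "x" has 4 rows, "y" has 2.
df = pd.DataFrame({"g": ["x"] * 4 + ["y"] * 2, "v": range(6)})

half = df.groupby("g").sample(frac=0.5, random_state=0)

# Count how many rows were drawn from each group.
counts = half["g"].value_counts()
print(counts.to_dict())  # {'x': 2, 'y': 1}
```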

Control sample probabilities within groups by setting weights:


>>> df.groupby("a").sample(
...     n=1,
...     weights=[1, 1, 1, 0, 0, 1],
...     random_state=1,
... )
       a  b
5  black  5
2   blue  2
0    red  0
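The weights line up positionally with the rows of the whole frame, but they are normalized within each group. The sketch below reproduces that normalization by hand with `groupby(...).transform("sum")` purely as an illustration (it is not the pandas internal code path), then confirms that rows with zero weight are never drawn, whatever the seed.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": ["red"] * 2 + ["blue"] * 2 + ["black"] * 2, "b": range(6)}
)
weights = pd.Series([1, 1, 1, 0, 0, 1], index=df.index)

# Illustration only: divide each weight by the total weight of its group.
probs = weights / weights.groupby(df["a"]).transform("sum")
print(probs.tolist())  # [0.5, 0.5, 1.0, 0.0, 0.0, 1.0]

# Rows 3 (blue) and 4 (black) have probability 0, so blue always
# yields b=2 and black always yields b=5; red picks 0 or 1.
for seed in range(20):
    picked = df.groupby("a").sample(n=1, weights=weights.tolist(), random_state=seed)
    assert set(picked["b"]) - {0, 1} == {2, 5}
```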




File: /pandas/core/groupby/groupby.py#3687
type: <class 'function'>
