pandas 1.4.2

ParametersReturnsBackRef

This specification will select a column via the key parameter, or if the level and/or axis parameters are given, a level of the index of the target object.

If :None:None:`axis` and/or :None:None:`level` are passed as keywords to both Grouper and groupby , the values passed to Grouper take precedence.

Parameters

key : str, defaults to None

Groupby key, which selects the grouping column of the target.

level : name/number, defaults to None

The level for the target index.

freq : str / frequency object, defaults to None

This will groupby the specified frequency if the target selection (via key or level) is a datetime-like object. For full specification of available frequencies, please see :None:None:`here <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`.

axis : str, int, defaults to 0

Number/name of the axis.

sort : bool, default to False

Whether to sort the resulting labels.

closed : {'left' or 'right'}

Closed end of interval. Only when :None:None:`freq` parameter is passed.

label : {'left' or 'right'}

Interval boundary to use for labeling. Only when :None:None:`freq` parameter is passed.

convention : {'start', 'end', 'e', 's'}

If grouper is PeriodIndex and :None:None:`freq` parameter is passed.

base : int, default 0

Only when :None:None:`freq` parameter is passed. For frequencies that evenly subdivide 1 day, the "origin" of the aggregated intervals. For example, for '5min' frequency, base could range from 0 through 4. Defaults to 0.

deprecated

The new arguments that you should use are 'offset' or 'origin'.

loffset : str, DateOffset, timedelta object

Only when :None:None:`freq` parameter is passed.

deprecated

loffset is only working for .resample(...) and not for Grouper (:None:issue:`28302`). However, loffset is also deprecated for .resample(...) See: :None:class:`DataFrame.resample`

origin : Timestamp or str, default 'start_day'

The timestamp on which to adjust the grouping. The timezone of origin must match the timezone of the index. If string, must be one of the following:

  • 'epoch': origin is 1970-01-01

  • 'start': origin is the first value of the timeseries

  • 'start_day': origin is the first day at midnight of the timeseries

versionadded
  • 'end': origin is the last value of the timeseries

  • 'end_day': origin is the ceiling midnight of the last day

versionadded
offset : Timedelta or str, default is None

An offset timedelta added to the origin.

versionadded
dropna : bool, default True

If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.

versionadded

Returns

A specification for a groupby instruction

A Grouper allows the user to specify a groupby instruction for an object.

Examples

Syntactic sugar for df.groupby('A')

This example is valid syntax, but we were not able to check execution
>>> df = pd.DataFrame(
...  {
...  "Animal": ["Falcon", "Parrot", "Falcon", "Falcon", "Parrot"],
...  "Speed": [100, 5, 200, 300, 15],
...  }
... )
... df Animal Speed 0 Falcon 100 1 Parrot 5 2 Falcon 200 3 Falcon 300 4 Parrot 15
This example is valid syntax, but we were not able to check execution
>>> df.groupby(pd.Grouper(key="Animal")).mean()
        Speed
Animal
Falcon  200.0
Parrot   10.0

Specify a resample operation on the column 'Publish date'

This example is valid syntax, but we were not able to check execution
>>> df = pd.DataFrame(
...  {
...  "Publish date": [
...  pd.Timestamp("2000-01-02"),
...  pd.Timestamp("2000-01-02"),
...  pd.Timestamp("2000-01-09"),
...  pd.Timestamp("2000-01-16")
...  ],
...  "ID": [0, 1, 2, 3],
...  "Price": [10, 20, 30, 40]
...  }
... )
... df Publish date ID Price 0 2000-01-02 0 10 1 2000-01-02 1 20 2 2000-01-09 2 30 3 2000-01-16 3 40
This example is valid syntax, but we were not able to check execution
>>> df.groupby(pd.Grouper(key="Publish date", freq="1W")).mean()
               ID  Price
Publish date
2000-01-02    0.5   15.0
2000-01-09    2.0   30.0
2000-01-16    3.0   40.0

If you want to adjust the start of the bins based on a fixed timestamp:

This example is valid syntax, but we were not able to check execution
>>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'
... rng = pd.date_range(start, end, freq='7min')
... ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
... ts 2000-10-01 23:30:00 0 2000-10-01 23:37:00 3 2000-10-01 23:44:00 6 2000-10-01 23:51:00 9 2000-10-01 23:58:00 12 2000-10-02 00:05:00 15 2000-10-02 00:12:00 18 2000-10-02 00:19:00 21 2000-10-02 00:26:00 24 Freq: 7T, dtype: int64
This example is valid syntax, but we were not able to check execution
>>> ts.groupby(pd.Grouper(freq='17min')).sum()
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17T, dtype: int64
This example is valid syntax, but we were not able to check execution
>>> ts.groupby(pd.Grouper(freq='17min', origin='epoch')).sum()
2000-10-01 23:18:00     0
2000-10-01 23:35:00    18
2000-10-01 23:52:00    27
2000-10-02 00:09:00    39
2000-10-02 00:26:00    24
Freq: 17T, dtype: int64
This example is valid syntax, but we were not able to check execution
>>> ts.groupby(pd.Grouper(freq='17min', origin='2000-01-01')).sum()
2000-10-01 23:24:00     3
2000-10-01 23:41:00    15
2000-10-01 23:58:00    45
2000-10-02 00:15:00    45
Freq: 17T, dtype: int64

If you want to adjust the start of the bins with an :None:None:`offset` Timedelta, the two following lines are equivalent:

This example is valid syntax, but we were not able to check execution
>>> ts.groupby(pd.Grouper(freq='17min', origin='start')).sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17T, dtype: int64
This example is valid syntax, but we were not able to check execution
>>> ts.groupby(pd.Grouper(freq='17min', offset='23h30min')).sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17T, dtype: int64

To replace the use of the deprecated base argument, you can now use :None:None:`offset`, in this example it is equivalent to have :None:None:`base=2`:

This example is valid syntax, but we were not able to check execution
>>> ts.groupby(pd.Grouper(freq='17min', offset='2min')).sum()
2000-10-01 23:16:00     0
2000-10-01 23:33:00     9
2000-10-01 23:50:00    36
2000-10-02 00:07:00    39
2000-10-02 00:24:00    24
Freq: 17T, dtype: int64
See :

Back References

The following pages refer to to this document either explicitly or contain code examples using this.

pandas.core.groupby.grouper.Grouper pandas.core.groupby.groupby.GroupBy.resample

Local connectivity graph

Hover to see nodes names; edges to Self not shown, Caped at 50 nodes.

Using a canvas is more power efficient and can get hundred of nodes ; but does not allow hyperlinks; , arrows or text (beyond on hover)

SVG is more flexible but power hungry; and does not scale well to 50 + nodes.

All aboves nodes referred to, (or are referred from) current nodes; Edges from Self to other have been omitted (or all nodes would be connected to the central node "self" which is not useful). Nodes are colored by the library they belong to, and scaled with the number of references pointing them


File: /pandas/core/groupby/grouper.py#58
type: <class 'type'>
Commit: