resample(self, rule, axis=0, closed: 'str | None' = None, label: 'str | None' = None, convention: 'str' = 'start', kind: 'str | None' = None, loffset=None, base: 'int | None' = None, on=None, level=None, origin: 'str | TimestampConvertibleTypes' = 'start_day', offset: 'TimedeltaConvertibleTypes | None' = None) -> 'Resampler'
Convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex
, PeriodIndex
, or TimedeltaIndex
), or the caller must pass the label of a datetime-like series/index to the on
/ level
keyword parameter.
See the :None:None:`user guide
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#resampling>`
for more.
To learn more about the offset strings, please see :None:None:`this link
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects>`
.
The offset string or object representing target conversion.
Which axis to use for up- or down-sampling. For Series
this will default to 0, i.e. along the rows. Must be DatetimeIndex
, TimedeltaIndex
or PeriodIndex
.
Which side of bin interval is closed. The default is 'left' for all frequency offsets except for 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W' which all have a default of 'right'.
Which bin edge label to label bucket with. The default is 'left' for all frequency offsets except for 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W' which all have a default of 'right'.
For PeriodIndex
only, controls whether to use the start or end of :None:None:`rule`
.
Pass 'timestamp' to convert the resulting index to a :None:None:`DateTimeIndex`
or 'period' to convert it to a PeriodIndex
. By default the input representation is retained.
Adjust the resampled time labels.
You should add the loffset to the :None:None:`df.index`
after the resample. See below.
For frequencies that evenly subdivide 1 day, the "origin" of the aggregated intervals. For example, for '5min' frequency, base could range from 0 through 4. Defaults to 0.
The new arguments that you should use are 'offset' or 'origin'.
For a DataFrame, column to use instead of index for resampling. Column must be datetime-like.
For a MultiIndex, level (name or number) to use for resampling. :None:None:`level`
must be datetime-like.
The timestamp on which to adjust the grouping. The timezone of origin must match the timezone of the index. If string, must be one of the following:
'epoch': origin
is 1970-01-01
'start': origin
is the first value of the timeseries
'start_day': origin
is the first day at midnight of the timeseries
'end': origin
is the last value of the timeseries
'end_day': origin
is the ceiling midnight of the last day
An offset timedelta added to the origin.
~pandas.core.Resampler
object.
Resample time-series data.
DataFrame.resample
Resample a DataFrame.
Series.resample
Resample a Series.
asfreq
Reindex a Series with the given frequency without grouping.
groupby
Group Series by mapping, function, label, or list of labels.
Start by creating a series with 9 one minute timestamps.
This example is valid syntax, but we were not able to check execution>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
... series = pd.Series(range(9), index=index)
... series 2000-01-01 00:00:00 0 2000-01-01 00:01:00 1 2000-01-01 00:02:00 2 2000-01-01 00:03:00 3 2000-01-01 00:04:00 4 2000-01-01 00:05:00 5 2000-01-01 00:06:00 6 2000-01-01 00:07:00 7 2000-01-01 00:08:00 8 Freq: T, dtype: int64
Downsample the series into 3 minute bins and sum the values of the timestamps falling into a bin.
This example is valid syntax, but we were not able to check execution>>> series.resample('3T').sum() 2000-01-01 00:00:00 3 2000-01-01 00:03:00 12 2000-01-01 00:06:00 21 Freq: 3T, dtype: int64
Downsample the series into 3 minute bins as above, but label each bin using the right edge instead of the left. Please note that the value in the bucket used as the label is not included in the bucket, which it labels. For example, in the original series the bucket 2000-01-01 00:03:00
contains the value 3, but the summed value in the resampled bucket with the label 2000-01-01 00:03:00
does not include 3 (if it did, the summed value would be 6, not 3). To include this value close the right side of the bin interval as illustrated in the example below this one.
>>> series.resample('3T', label='right').sum() 2000-01-01 00:03:00 3 2000-01-01 00:06:00 12 2000-01-01 00:09:00 21 Freq: 3T, dtype: int64
Downsample the series into 3 minute bins as above, but close the right side of the bin interval.
This example is valid syntax, but we were not able to check execution>>> series.resample('3T', label='right', closed='right').sum() 2000-01-01 00:00:00 0 2000-01-01 00:03:00 6 2000-01-01 00:06:00 15 2000-01-01 00:09:00 15 Freq: 3T, dtype: int64
Upsample the series into 30 second bins.
This example is valid syntax, but we were not able to check execution>>> series.resample('30S').asfreq()[0:5] # Select first 5 rows 2000-01-01 00:00:00 0.0 2000-01-01 00:00:30 NaN 2000-01-01 00:01:00 1.0 2000-01-01 00:01:30 NaN 2000-01-01 00:02:00 2.0 Freq: 30S, dtype: float64
Upsample the series into 30 second bins and fill the NaN
values using the pad
method.
>>> series.resample('30S').pad()[0:5] 2000-01-01 00:00:00 0 2000-01-01 00:00:30 0 2000-01-01 00:01:00 1 2000-01-01 00:01:30 1 2000-01-01 00:02:00 2 Freq: 30S, dtype: int64
Upsample the series into 30 second bins and fill the NaN
values using the bfill
method.
>>> series.resample('30S').bfill()[0:5] 2000-01-01 00:00:00 0 2000-01-01 00:00:30 1 2000-01-01 00:01:00 1 2000-01-01 00:01:30 2 2000-01-01 00:02:00 2 Freq: 30S, dtype: int64
Pass a custom function via apply
>>> def custom_resampler(arraylike):This example is valid syntax, but we were not able to check execution
... return np.sum(arraylike) + 5 ...
>>> series.resample('3T').apply(custom_resampler) 2000-01-01 00:00:00 8 2000-01-01 00:03:00 17 2000-01-01 00:06:00 26 Freq: 3T, dtype: int64
For a Series with a PeriodIndex, the keyword convention
can be used to control whether to use the start or end of :None:None:`rule`
.
Resample a year by quarter using 'start' convention
. Values are assigned to the first quarter of the period.
>>> s = pd.Series([1, 2], index=pd.period_range('2012-01-01',This example is valid syntax, but we were not able to check execution
... freq='A',
... periods=2))
... s 2012 1 2013 2 Freq: A-DEC, dtype: int64
>>> s.resample('Q', convention='start').asfreq() 2012Q1 1.0 2012Q2 NaN 2012Q3 NaN 2012Q4 NaN 2013Q1 2.0 2013Q2 NaN 2013Q3 NaN 2013Q4 NaN Freq: Q-DEC, dtype: float64
Resample quarters by month using 'end' convention
. Values are assigned to the last month of the period.
>>> q = pd.Series([1, 2, 3, 4], index=pd.period_range('2018-01-01',This example is valid syntax, but we were not able to check execution
... freq='Q',
... periods=4))
... q 2018Q1 1 2018Q2 2 2018Q3 3 2018Q4 4 Freq: Q-DEC, dtype: int64
>>> q.resample('M', convention='end').asfreq() 2018-03 1.0 2018-04 NaN 2018-05 NaN 2018-06 2.0 2018-07 NaN 2018-08 NaN 2018-09 3.0 2018-10 NaN 2018-11 NaN 2018-12 4.0 Freq: M, dtype: float64
For DataFrame objects, the keyword :None:None:`on`
can be used to specify the column instead of the index for resampling.
>>> d = {'price': [10, 11, 9, 13, 14, 18, 17, 19],This example is valid syntax, but we were not able to check execution
... 'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
... df = pd.DataFrame(d)
... df['week_starting'] = pd.date_range('01/01/2018',
... periods=8,
... freq='W')
... df price volume week_starting 0 10 50 2018-01-07 1 11 60 2018-01-14 2 9 40 2018-01-21 3 13 100 2018-01-28 4 14 50 2018-02-04 5 18 100 2018-02-11 6 17 40 2018-02-18 7 19 50 2018-02-25
>>> df.resample('M', on='week_starting').mean() price volume week_starting 2018-01-31 10.75 62.5 2018-02-28 17.00 60.0
For a DataFrame with MultiIndex, the keyword :None:None:`level`
can be used to specify on which level the resampling needs to take place.
>>> days = pd.date_range('1/1/2000', periods=4, freq='D')This example is valid syntax, but we were not able to check execution
... d2 = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
... 'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
... df2 = pd.DataFrame(
... d2,
... index=pd.MultiIndex.from_product(
... [days, ['morning', 'afternoon']]
... )
... )
... df2 price volume 2000-01-01 morning 10 50 afternoon 11 60 2000-01-02 morning 9 40 afternoon 13 100 2000-01-03 morning 14 50 afternoon 18 100 2000-01-04 morning 17 40 afternoon 19 50
>>> df2.resample('D', level=0).sum() price volume 2000-01-01 21 110 2000-01-02 22 140 2000-01-03 32 150 2000-01-04 36 90
If you want to adjust the start of the bins based on a fixed timestamp:
This example is valid syntax, but we were not able to check execution>>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'This example is valid syntax, but we were not able to check execution
... rng = pd.date_range(start, end, freq='7min')
... ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
... ts 2000-10-01 23:30:00 0 2000-10-01 23:37:00 3 2000-10-01 23:44:00 6 2000-10-01 23:51:00 9 2000-10-01 23:58:00 12 2000-10-02 00:05:00 15 2000-10-02 00:12:00 18 2000-10-02 00:19:00 21 2000-10-02 00:26:00 24 Freq: 7T, dtype: int64
>>> ts.resample('17min').sum() 2000-10-01 23:14:00 0 2000-10-01 23:31:00 9 2000-10-01 23:48:00 21 2000-10-02 00:05:00 54 2000-10-02 00:22:00 24 Freq: 17T, dtype: int64This example is valid syntax, but we were not able to check execution
>>> ts.resample('17min', origin='epoch').sum() 2000-10-01 23:18:00 0 2000-10-01 23:35:00 18 2000-10-01 23:52:00 27 2000-10-02 00:09:00 39 2000-10-02 00:26:00 24 Freq: 17T, dtype: int64This example is valid syntax, but we were not able to check execution
>>> ts.resample('17min', origin='2000-01-01').sum() 2000-10-01 23:24:00 3 2000-10-01 23:41:00 15 2000-10-01 23:58:00 45 2000-10-02 00:15:00 45 Freq: 17T, dtype: int64
If you want to adjust the start of the bins with an :None:None:`offset`
Timedelta, the two following lines are equivalent:
>>> ts.resample('17min', origin='start').sum() 2000-10-01 23:30:00 9 2000-10-01 23:47:00 21 2000-10-02 00:04:00 54 2000-10-02 00:21:00 24 Freq: 17T, dtype: int64This example is valid syntax, but we were not able to check execution
>>> ts.resample('17min', offset='23h30min').sum() 2000-10-01 23:30:00 9 2000-10-01 23:47:00 21 2000-10-02 00:04:00 54 2000-10-02 00:21:00 24 Freq: 17T, dtype: int64
If you want to take the largest Timestamp as the end of the bins:
This example is valid syntax, but we were not able to check execution>>> ts.resample('17min', origin='end').sum() 2000-10-01 23:35:00 0 2000-10-01 23:52:00 18 2000-10-02 00:09:00 27 2000-10-02 00:26:00 63 Freq: 17T, dtype: int64
In contrast with the :None:None:`start_day`
, you can use :None:None:`end_day`
to take the ceiling midnight of the largest Timestamp as the end of the bins and drop the bins not containing data:
>>> ts.resample('17min', origin='end_day').sum() 2000-10-01 23:38:00 3 2000-10-01 23:55:00 15 2000-10-02 00:12:00 45 2000-10-02 00:29:00 45 Freq: 17T, dtype: int64
To replace the use of the deprecated base
argument, you can now use :None:None:`offset`
, in this example it is equivalent to have :None:None:`base=2`
:
>>> ts.resample('17min', offset='2min').sum() 2000-10-01 23:16:00 0 2000-10-01 23:33:00 9 2000-10-01 23:50:00 36 2000-10-02 00:07:00 39 2000-10-02 00:24:00 24 Freq: 17T, dtype: int64
To replace the use of the deprecated :None:None:`loffset`
argument:
>>> from pandas.tseries.frequencies import to_offsetSee :
... loffset = '19min'
... ts_out = ts.resample('17min').sum()
... ts_out.index = ts_out.index + to_offset(loffset)
... ts_out 2000-10-01 23:33:00 0 2000-10-01 23:50:00 9 2000-10-02 00:07:00 21 2000-10-02 00:24:00 54 2000-10-02 00:41:00 24 Freq: 17T, dtype: int64
The following pages refer to to this document either explicitly or contain code examples using this.
pandas.core.series.Series.groupby
pandas.core.generic.NDFrame.resample
pandas.core.frame.DataFrame.resample
pandas.core.series.Series.resample
Hover to see nodes names; edges to Self not shown, Caped at 50 nodes.
Using a canvas is more power efficient and can get hundred of nodes ; but does not allow hyperlinks; , arrows or text (beyond on hover)
SVG is more flexible but power hungry; and does not scale well to 50 + nodes.
All aboves nodes referred to, (or are referred from) current nodes; Edges from Self to other have been omitted (or all nodes would be connected to the central node "self" which is not useful). Nodes are colored by the library they belong to, and scaled with the number of references pointing them