apply(self, func, *args, **kwargs)

The function passed to apply must take a DataFrame as its first argument and return a DataFrame, Series or scalar. apply then combines the results into a single DataFrame or Series, which makes it a highly flexible grouping method.

While apply is very flexible, it can be considerably slower than more specific methods such as agg or transform. pandas offers a wide range of methods that are much faster than apply for their specific purposes, so try those before reaching for apply.
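As a rough illustration of the advice above, the sketch below computes the same per-group results once with apply and once with built-in groupby methods. The frame mirrors the one used in the Examples section; the variable names are chosen for illustration only, and no timings are claimed here.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"],
                   "B": [1, 2, 3],
                   "C": [4, 6, 5]})
g = df.groupby("A")

# Per-group range via apply: the lambda is called once per group.
via_apply = g[["B", "C"]].apply(lambda x: x.max() - x.min())

# The same reduction expressed with the built-in max and min methods.
via_builtin = g[["B", "C"]].max() - g[["B", "C"]].min()

# Per-group share of the column total: transform("sum") returns a result
# aligned with the original rows, so it can be divided into df directly.
share = df[["B", "C"]] / g[["B", "C"]].transform("sum")
```

On a three-row frame the difference is negligible; the specialised methods are expected to pay off as the number of groups grows.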

Notes

versionchanged

The resulting dtype will reflect the return value of the passed func; see the examples below.

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.
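One way to stay clear of the mutation problem is to have func build and return a new object instead of assigning into the group it receives. The sketch below assumes a hypothetical helper, add_centered, and an illustrative column name, B_centered; neither comes from pandas itself.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"],
                   "B": [1, 2, 3],
                   "C": [4, 6, 5]})


def add_centered(group):
    # Hypothetical helper: DataFrame.assign returns a new DataFrame,
    # so the group passed in by apply is left unmodified.
    return group.assign(B_centered=group["B"] - group["B"].mean())


result = df.groupby("A")[["B", "C"]].apply(add_centered)
```

The index layout of result can differ between pandas versions depending on how group keys are handled, so only the column values are relied on here.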

Parameters

func : callable: A callable that takes a DataFrame as its first argument and returns a DataFrame, a Series or a scalar. In addition, the callable may take positional and keyword arguments.
args, kwargs : tuple and dict: Optional positional and keyword arguments to pass to func.
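A minimal sketch of how extra arguments reach func, following the signature apply(self, func, *args, **kwargs) shown at the top. The helper spread and its parameters column and scale are hypothetical names used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"],
                   "B": [1, 2, 3],
                   "C": [4, 6, 5]})


def spread(group, column, scale=1):
    # Hypothetical helper: range of one column within the group, scaled.
    return (group[column].max() - group[column].min()) * scale


# "C" is forwarded as a positional argument and scale=10 as a keyword
# argument, after the group DataFrame that apply supplies itself.
result = df.groupby("A")[["B", "C"]].apply(spread, "C", scale=10)
```

Because spread returns a scalar for each group, result is a Series indexed by the group keys.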

Returns

applied : Series or DataFrame

Apply function func group-wise and combine the results together.

Examples

This example is valid syntax, but we were not able to check execution

>>> df = pd.DataFrame({'A': 'a a b'.split(),
...                    'B': [1,2,3],
...                    'C': [4,6,5]})
>>> g = df.groupby('A')

Notice that g has two groups, a and b. Calling apply in various ways, we can get different grouping results:

Example 1: The function passed to apply takes a DataFrame as its argument and returns a DataFrame. apply combines the results for each group into a new DataFrame:

>>> g[['B', 'C']].apply(lambda x: x / x.sum())
          B    C
0  0.333333  0.4
1  0.666667  0.6
2  1.000000  1.0

Example 2: The function passed to apply takes a DataFrame as its argument and returns a Series. apply combines the result for each group together into a new DataFrame.

versionchanged

The resulting dtype will reflect the return value of the passed func.

>>> g[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
     B    C
A
a  1.0  2.0
b  0.0  0.0

Example 3: The function passed to apply takes a DataFrame as its argument and returns a scalar. apply combines the result for each group together into a Series, including setting the index as appropriate:

>>> g.apply(lambda x: x.C.max() - x.B.min())
A
a    5
b    2
dtype: int64

File: /pandas/core/groupby/groupby.py#1370
type: <class 'function'>
Commit:
