apply(self, func, *args, **kwargs)
The function passed to apply must take a dataframe as its first argument and return a DataFrame, Series or scalar. apply will then take care of combining the results back together into a single dataframe or series. apply is therefore a highly flexible grouping method.
While apply is a very flexible method, its downside is that using it can be quite a bit slower than using more specific methods like agg or transform. Pandas offers a wide range of methods that will be much faster than using apply for their specific purposes, so try to use them before reaching for apply.
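As a rough illustration of the trade-off described above, the sketch below computes the same per-group range twice: once through apply and once through built-in reductions. The frame mirrors the one used in the examples on this page; the variable names via_apply and via_builtin are made up for this illustration, and no timing figures are claimed.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})

# Select the value columns explicitly so the grouping column is not passed to func.
g = df.groupby("A")[["B", "C"]]

# Generic route: apply calls the Python function once per group.
via_apply = g.apply(lambda x: x.max() - x.min())

# Specific route: the built-in reductions avoid the per-group Python call.
via_builtin = g.max() - g.min()

print(via_apply)
print(via_builtin)
```

Both routes should yield the same per-group values here; on larger data the built-in route is usually the one to try first.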
The resulting dtype will reflect the return value of the passed func; see the examples below.
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.
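One way to stay clear of the mutation problem is to copy the group before modifying it. The sketch below assumes a hypothetical helper, add_share, and a hypothetical new column, B_share; group_keys=False is passed only to keep the original row index in the result.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})


def add_share(group):
    # Hypothetical helper: work on a copy so the object handed over by
    # apply is left untouched.
    out = group.copy()
    out["B_share"] = out["B"] / out["B"].sum()
    return out


result = df.groupby("A", group_keys=False)[["B", "C"]].apply(add_share)
print(result)
```

The original df keeps its three columns; only the returned frame carries the extra one.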
A callable that takes a dataframe as its first argument, and returns a dataframe, a series or a scalar. In addition the callable may take positional and keyword arguments.
Optional positional and keyword arguments to pass to func.
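A small sketch of how extra arguments reach func, matching the signature apply(self, func, *args, **kwargs) shown at the top. The function spread and its parameters column and scale are hypothetical names chosen for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})


def spread(group, column, scale=1):
    # Hypothetical function: `group` is the per-group DataFrame, `column`
    # arrives positionally and `scale` by keyword, both forwarded by apply.
    return (group[column].max() - group[column].min()) * scale


result = df.groupby("A")[["B", "C"]].apply(spread, "C", scale=10)
print(result)
```

Because spread returns a scalar, the per-group values are combined into a Series indexed by the group keys.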
Apply function func group-wise and combine the results together.
DataFrame.apply
Apply a function to each row or column of a DataFrame.
Series.apply
Apply a function to a Series.
aggregate
Apply aggregate function to the GroupBy object.
pipe
Apply function to the full GroupBy object instead of to each group.
transform
Apply function column-by-column to the GroupBy object.
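To make the differences between the related methods listed above more concrete, here is a minimal sketch that runs aggregate, transform and pipe on the same small frame. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})
g = df.groupby("A")[["B", "C"]]

# aggregate: reduces each group to one row.
agg_result = g.aggregate("sum")

# transform: returns a result aligned with the original rows.
transform_result = g.transform("sum")

# pipe: the function receives the GroupBy object itself, not each group.
pipe_result = g.pipe(lambda gb: gb.max() - gb.min())

print(agg_result)
print(transform_result)
print(pipe_result)
```

The shape of the result is the quickest way to tell them apart: aggregate and the pipe call here give one row per group, while transform gives one row per input row.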
>>> import pandas as pd
>>> df = pd.DataFrame({'A': 'a a b'.split(),
...                    'B': [1, 2, 3],
...                    'C': [4, 6, 5]})
>>> g = df.groupby('A')
Notice that g has two groups, a and b. Calling apply in various ways, we can get different grouping results:
Example 1: The function passed to apply takes a DataFrame as its argument and returns a DataFrame. apply combines the result for each group together into a new DataFrame:
>>> g[['B', 'C']].apply(lambda x: x / x.sum())
          B    C
0  0.333333  0.4
1  0.666667  0.6
2  1.000000  1.0
Example 2: The function passed to apply takes a DataFrame as its argument and returns a Series. apply combines the result for each group together into a new DataFrame.
The resulting dtype will reflect the return value of the passed func.
>>> g[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
     B    C
A
a  1.0  2.0
b  0.0  0.0
Example 3: The function passed to apply takes a DataFrame as its argument and returns a scalar. apply combines the result for each group together into a Series, including setting the index as appropriate:
>>> g.apply(lambda x: x.C.max() - x.B.min())
A
a    5
b    2
dtype: int64