Document

pandas 1.4.2

Notes Parameters Returns BackRef

stack(self, level: 'Level' = -1, dropna: 'bool' = True)

Return a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame. The new inner-most levels are created by pivoting the columns of the current dataframe:

if the columns have a single level, the output is a Series;

if the columns have multiple levels, the new index level(s) is (are) taken from the prescribed level(s) and the output is a DataFrame.

Notes

The function is named by analogy with a collection of books being reorganized from being side by side on a horizontal position (the columns of the dataframe) to being stacked vertically on top of each other (in the index of the dataframe).

Reference the user guide <reshaping.stacking> for more examples.

Parameters

level : int, str, list, default -1: Level(s) to stack from the column axis onto the index axis, defined as one index or label, or a list of indices or labels.
dropna : bool, default True: Whether to drop rows in the resulting Frame/Series with missing values. Stacking a column level onto the index axis can create combinations of index and column values that are missing from the original dataframe. See Examples section.

Returns

DataFrame or Series: Stacked dataframe or series.

Stack the prescribed level(s) from columns to index.

See Also

DataFrame.pivot: Reshape dataframe from long format to wide format.

DataFrame.pivot_table: Create a spreadsheet-style pivot table as a DataFrame.

DataFrame.unstack: Unstack prescribed level(s) from index axis onto column axis.

Examples

Single level columns

This example is valid syntax, but we were not able to check execution

>>> df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]],
...                                     index=['cat', 'dog'],
...                                     columns=['weight', 'height'])

Stacking a dataframe with a single level column axis returns a Series:

This example is valid syntax, but we were not able to check execution

>>> df_single_level_cols
     weight height
cat       0      1
dog       2      3

This example is valid syntax, but we were not able to check execution

>>> df_single_level_cols.stack()
cat  weight    0
     height    1
dog  weight    2
     height    3
dtype: int64

Multi level columns: simple case

This example is valid syntax, but we were not able to check execution

>>> multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
...                                        ('weight', 'pounds')])
... df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]],
...                                     index=['cat', 'dog'],
...                                     columns=multicol1)

Stacking a dataframe with a multi-level column axis:

This example is valid syntax, but we were not able to check execution

>>> df_multi_level_cols1
     weight
         kg    pounds
cat       1        2
dog       2        4

This example is valid syntax, but we were not able to check execution

>>> df_multi_level_cols1.stack()
            weight
cat kg           1
    pounds       2
dog kg           2
    pounds       4

Missing values

This example is valid syntax, but we were not able to check execution

>>> multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),
...                                        ('height', 'm')])
... df_multi_level_cols2 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
...                                     index=['cat', 'dog'],
...                                     columns=multicol2)

It is common to have missing values when stacking a dataframe with multi-level columns, as the stacked dataframe typically has more values than the original dataframe. Missing values are filled with NaNs:

This example is valid syntax, but we were not able to check execution

>>> df_multi_level_cols2
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0

This example is valid syntax, but we were not able to check execution

>>> df_multi_level_cols2.stack()
        height  weight
cat kg     NaN     1.0
    m      2.0     NaN
dog kg     NaN     3.0
    m      4.0     NaN

Prescribing the level(s) to be stacked

The first parameter controls which level or levels are stacked:

This example is valid syntax, but we were not able to check execution

>>> df_multi_level_cols2.stack(0)
             kg    m
cat height  NaN  2.0
    weight  1.0  NaN
dog height  NaN  4.0
    weight  3.0  NaN

This example is valid syntax, but we were not able to check execution

>>> df_multi_level_cols2.stack([0, 1])
cat  height  m     2.0
     weight  kg    1.0
dog  height  m     4.0
     weight  kg    3.0
dtype: float64

Dropping missing values

This example is valid syntax, but we were not able to check execution

>>> df_multi_level_cols3 = pd.DataFrame([[None, 1.0], [2.0, 3.0]],
...                                     index=['cat', 'dog'],
...                                     columns=multicol2)

Note that rows where all values are missing are dropped by default but this behaviour can be controlled via the dropna keyword parameter:

This example is valid syntax, but we were not able to check execution

>>> df_multi_level_cols3
    weight height
        kg      m
cat    NaN    1.0
dog    2.0    3.0

This example is valid syntax, but we were not able to check execution

>>> df_multi_level_cols3.stack(dropna=False)
        height  weight
cat kg     NaN     NaN
    m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN

This example is valid syntax, but we were not able to check execution

>>> df_multi_level_cols3.stack(dropna=True)
        height  weight
cat m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN

See :

Back References

The following pages refer to to this document either explicitly or contain code examples using this.

pandas.core.frame.DataFrame.unstack

Local connectivity graph

Hover to see nodes names; edges to Self not shown, Caped at 50 nodes.

Using a canvas is more power efficient and can get hundred of nodes ; but does not allow hyperlinks; , arrows or text (beyond on hover)

SVG is more flexible but power hungry; and does not scale well to 50 + nodes.

All aboves nodes referred to, (or are referred from) current nodes; Edges from Self to other have been omitted (or all nodes would be connected to the central node "self" which is not useful). Nodes are colored by the library they belong to, and scaled with the number of references pointing them

File: /pandas/core/frame.py#8052
type: <class 'function'>
Commit: