crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins: 'bool' = False, margins_name: 'str' = 'All', dropna: 'bool' = True, normalize=False) -> 'DataFrame'
Any Series passed will have their name attributes used unless row or column names for the cross-tabulation are specified.
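A minimal sketch of the naming rule, using small hypothetical Series named `letter` and `kind`. It shows that the Series names become the axis names, and that explicit `rownames`/`colnames` take precedence over them:

```python
import pandas as pd

# Hypothetical named Series; crosstab reads their .name attributes.
letters = pd.Series(["x", "y", "x", "x"], name="letter")
kinds = pd.Series(["a", "a", "b", "a"], name="kind")

# Without rownames/colnames, the Series names label the axes.
by_name = pd.crosstab(letters, kinds)
print(by_name.index.name, by_name.columns.name)  # letter kind

# Explicit rownames/colnames override the Series names.
renamed = pd.crosstab(letters, kinds, rownames=["row"], colnames=["col"])
print(renamed.index.name, renamed.columns.name)  # row col
```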
Any input passed containing Categorical data will have all of its categories included in the cross-tabulation, even if the actual data does not contain any instances of a particular category.
If the inputs have no overlapping index labels, an empty DataFrame is returned.
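A minimal sketch of the no-overlap case, assuming two hypothetical Series whose index labels are disjoint. crosstab aligns its inputs on their index, so no rows survive the alignment:

```python
import pandas as pd

# Disjoint index labels (1-3 vs. 4-6): no label pairs an entry of s1
# with an entry of s2.
s1 = pd.Series([1, 2, 3], index=[1, 2, 3])
s2 = pd.Series([4, 5, 6], index=[4, 5, 6])

result = pd.crosstab(s1, s2)
print(result.empty)  # True
```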
See the user guide <reshaping.crosstabulations> for more examples.
index : Values to group by in the rows.
columns : Values to group by in the columns.
values : Array of values to aggregate according to the factors. Requires `aggfunc` to be specified.
rownames : If passed, must match the number of row arrays passed.
colnames : If passed, must match the number of column arrays passed.
aggfunc : If specified, requires `values` to be specified as well.
margins : Add row/column margins (subtotals).
margins_name : Name of the row/column that will contain the totals when margins is True.
dropna : Do not include columns whose entries are all NaN.
normalize : Normalize by dividing all values by the sum of values.
- If passed 'all' or `True`, will normalize over all values.
- If passed 'index', will normalize over each row.
- If passed 'columns', will normalize over each column.
- If margins is `True`, will also normalize margin values.
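A short sketch exercising the parameters above on hypothetical sales data (the `region`, `product`, and `sales` Series are illustrative only). It aggregates `values` with `aggfunc`, adds labelled margins, normalizes per row, and checks that passing only one of `values`/`aggfunc` is rejected. The helper `raises_value_error` is hypothetical and written just for this example:

```python
import pandas as pd

# Hypothetical sample data for illustration.
region = pd.Series(["east", "east", "west", "west", "west"], name="region")
product = pd.Series(["pen", "ink", "pen", "pen", "ink"], name="product")
sales = pd.Series([10, 5, 7, 3, 4], name="sales")

# values + aggfunc: sum sales per (region, product) instead of counting rows.
totals = pd.crosstab(region, product, values=sales, aggfunc="sum")

# margins adds a subtotal row and column; margins_name sets their label.
counts = pd.crosstab(region, product, margins=True, margins_name="Total")

# normalize="index": each row of counts is divided by that row's total.
row_share = pd.crosstab(region, product, normalize="index")


def raises_value_error(**kwargs):
    """Hypothetical helper: True if crosstab rejects the given kwargs."""
    try:
        pd.crosstab(region, product, **kwargs)
    except ValueError:
        return True
    return False


# Passing aggfunc without values (or values without aggfunc) raises ValueError.
print(raises_value_error(aggfunc="sum"), raises_value_error(values=sales))
```

Here `totals` holds summed sales, `counts` gains a `Total` row and column, and every row of `row_share` sums to 1.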
Returns a DataFrame containing the cross tabulation of the data.
Compute a simple cross tabulation of two (or more) factors. By default computes a frequency table of the factors unless an array of values and an aggregation function are passed.
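To illustrate the default frequency-table behaviour, this sketch (with hypothetical factors `a` and `b`) compares `crosstab` against a hand-built count: group on both factors, count rows, then pivot the second factor into columns. This is an equivalent reformulation for illustration, not a description of how crosstab is implemented internally:

```python
import pandas as pd

# Hypothetical factors.
a = pd.Series(["foo", "foo", "bar", "foo", "bar"], name="a")
b = pd.Series(["one", "two", "one", "one", "one"], name="b")

freq = pd.crosstab(a, b)

# Manual count: rows per (a, b) pair, second factor spread into columns,
# missing pairs filled with 0.
manual = (
    pd.DataFrame({"a": a, "b": b})
    .groupby(["a", "b"])
    .size()
    .unstack(fill_value=0)
)

same = (
    list(freq.index) == list(manual.index)
    and list(freq.columns) == list(manual.columns)
    and (freq.to_numpy() == manual.to_numpy()).all()
)
print(same)
```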
DataFrame.pivot : Reshape data based on column values.
pivot_table : Create a pivot table as a DataFrame.
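As a rough comparison with `pivot_table`, the sketch below (hypothetical `df` with `region`, `product`, and `sales` columns) computes the same mean table twice: once with `crosstab` from separate Series and once with `DataFrame.pivot_table` from column names. Combinations with no data, such as west/ink, come out as NaN in both:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "product": ["pen", "ink", "pen", "pen"],
    "sales": [10, 5, 7, 3],
})

# crosstab takes the factors and values as separate array-likes...
via_crosstab = pd.crosstab(
    df["region"], df["product"], values=df["sales"], aggfunc="mean"
)
# ...while pivot_table refers to columns of an existing DataFrame.
via_pivot = df.pivot_table(
    values="sales", index="region", columns="product", aggfunc="mean"
)

same_labels = via_crosstab.index.equals(via_pivot.index) and via_crosstab.columns.equals(
    via_pivot.columns
)
# equal_nan=True treats the NaN for the empty west/ink cell as matching.
same_values = np.array_equal(
    via_crosstab.to_numpy(), via_pivot.to_numpy(), equal_nan=True
)
print(same_labels, same_values)
```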
>>> import numpy as np
>>> import pandas as pd
>>> a = np.array(["foo", "foo", "foo", "foo", "bar", "bar",
...               "bar", "bar", "foo", "foo", "foo"], dtype=object)
>>> b = np.array(["one", "one", "one", "two", "one", "one",
...               "one", "two", "two", "two", "one"], dtype=object)
>>> c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny",
...               "shiny", "dull", "shiny", "shiny", "shiny"],
...              dtype=object)
>>> pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
b   one        two
c   dull shiny dull shiny
a
bar    1     2    1     0
foo    2     2    1     2
Here 'c' and 'f' are not represented in the data and will not be shown in the output because dropna is True by default. Set dropna=False to preserve categories with no data.
>>> foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
>>> bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
>>> pd.crosstab(foo, bar)
col_0  d  e
row_0
a      1  0
b      0  1
>>> pd.crosstab(foo, bar, dropna=False)
col_0  d  e  f
row_0
a      1  0  0
b      0  1  0
c      0  0  0