boxplot(data, column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, figsize=None, layout=None, return_type=None, **kwargs)
Make a box-and-whisker plot from DataFrame columns, optionally grouped by some other columns.
A box plot is a method for graphically depicting groups of numerical data through their quartiles. The box extends from the Q1 to Q3 quartile values of the data, with a line at the median (Q2). The whiskers extend from the edges of the box to show the range of the data. By default, they extend no more than `1.5 * IQR` (IQR = Q3 - Q1) from the edges of the box, ending at the farthest data point within that interval. Outliers are plotted as separate dots.
For further details see Wikipedia's entry for boxplot.
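The whisker rule described above can be sketched with plain NumPy. This is a minimal illustration, not the code pandas or matplotlib runs internally; the helper name `whisker_bounds` and the sample data are hypothetical, and quartiles use NumPy's default linear interpolation.

```python
import numpy as np

def whisker_bounds(values, whis=1.5):
    """Return the (low, high) whisker ends for a 1-D sample (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    # The whiskers may reach at most whis * IQR beyond the box edges...
    lo_limit = q1 - whis * iqr
    hi_limit = q3 + whis * iqr
    # ...and end at the farthest data point that lies inside that interval.
    inside = values[(values >= lo_limit) & (values <= hi_limit)]
    return inside.min(), inside.max()

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
low, high = whisker_bounds(data)
# Points beyond the whisker ends are the ones drawn as separate dots.
outliers = [v for v in data if v < low or v > high]
print(low, high, outliers)
```

Here the value 30 falls outside the upper limit, so the upper whisker stops at 9 and 30 is reported as an outlier.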
The return type depends on the `return_type` parameter:
Column name or list of names, or vector. Can be any valid input to pandas.DataFrame.groupby.
Column in the DataFrame to pass to pandas.DataFrame.groupby. One box plot is drawn per value of the columns in `by`.
The matplotlib axes to be used by boxplot.
Tick label font size in points or as a string (e.g., `large`).
The rotation angle of labels (in degrees) with respect to the screen coordinate system.
Setting this to True will show the grid.
The size of the figure to create in matplotlib.
For example, (3, 5) will display the subplots using 3 rows and 5 columns, starting from the top-left.
The kind of object to return. The default is 'axes'.
'axes' returns the matplotlib axes the boxplot is drawn on.
'dict' returns a dictionary whose values are the matplotlib Lines of the boxplot.
'both' returns a namedtuple with the axes and dict.
When grouping with `by`, a Series mapping columns to `return_type` is returned.
If `return_type` is `None`, a NumPy array of axes with the same shape as `layout` is returned.
All other plotting keyword arguments to be passed to matplotlib.pyplot.boxplot.
See Notes.
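As a hedged illustration of how the parameters above combine, the sketch below draws onto an existing matplotlib axes and forwards two extra keyword arguments (`showmeans` and `whis`) to matplotlib. The DataFrame contents, the seed, and the `Agg` backend choice are assumptions made so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data: 20 rows of standard-normal noise in three columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((20, 3)),
                  columns=["Col1", "Col2", "Col3"])

# Create the axes ourselves and hand it to boxplot through `ax`.
fig, ax = plt.subplots(figsize=(6, 4))
returned = df.boxplot(
    ax=ax,           # draw on the axes created above
    grid=False,      # hide the grid
    rot=45,          # rotate the tick labels by 45 degrees
    fontsize=12,     # tick label size in points
    showmeans=True,  # forwarded to matplotlib's boxplot
    whis=2.0,        # forwarded: whiskers reach up to 2.0 * IQR
)
fig.savefig("boxplot_sketch.png")
```

With the default `return_type` and no grouping, the call returns the same axes object that was passed in, which makes further matplotlib customization straightforward.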
Make a box plot from DataFrame columns.
Series.plot.hist
Make a histogram.
matplotlib.pyplot.boxplot
Matplotlib equivalent plot.
Boxplots can be created for every column in the DataFrame by calling df.boxplot(), or for a subset by indicating the columns to be used:
.. plot::
    :context: close-figs

    >>> import numpy as np
    >>> import pandas as pd
    >>> np.random.seed(1234)
    >>> df = pd.DataFrame(np.random.randn(10, 4),
    ...                   columns=['Col1', 'Col2', 'Col3', 'Col4'])
    >>> boxplot = df.boxplot(column=['Col1', 'Col2', 'Col3'])  # doctest: +SKIP
Boxplots of the distributions of variables, grouped by the values of a third variable, can be created using the `by` option. For instance:
.. plot::
    :context: close-figs

    >>> df = pd.DataFrame(np.random.randn(10, 2),
    ...                   columns=['Col1', 'Col2'])
    >>> df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A',
    ...                      'B', 'B', 'B', 'B', 'B'])
    >>> boxplot = df.boxplot(by='X')
A list of strings (e.g. ['X', 'Y']) can be passed to boxplot in order to group the data by each combination of those variables along the x-axis:
.. plot::
    :context: close-figs

    >>> df = pd.DataFrame(np.random.randn(10, 3),
    ...                   columns=['Col1', 'Col2', 'Col3'])
    >>> df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A',
    ...                      'B', 'B', 'B', 'B', 'B'])
    >>> df['Y'] = pd.Series(['A', 'B', 'A', 'B', 'A',
    ...                      'B', 'A', 'B', 'A', 'B'])
    >>> boxplot = df.boxplot(column=['Col1', 'Col2'], by=['X', 'Y'])
The layout of the boxplot can be adjusted by passing a tuple to `layout`:
.. plot::
    :context: close-figs

    >>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
    ...                      layout=(2, 1))
Additional formatting can be applied to the boxplot, such as suppressing the grid (`grid=False`), rotating the x-axis labels (e.g. `rot=45`), or changing the font size (e.g. `fontsize=15`):
.. plot::
    :context: close-figs

    >>> boxplot = df.boxplot(grid=False, rot=45, fontsize=15)  # doctest: +SKIP
The parameter `return_type` can be used to select the type of element returned by boxplot. When `return_type='axes'` is selected, the matplotlib axes on which the boxplot is drawn are returned:
>>> boxplot = df.boxplot(column=['Col1', 'Col2'], return_type='axes')
>>> type(boxplot)
<class 'matplotlib.axes._subplots.AxesSubplot'>
When grouping with `by`, a Series mapping columns to `return_type` is returned:
>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      return_type='axes')
>>> type(boxplot)
<class 'pandas.core.series.Series'>
If `return_type` is `None`, a NumPy array of axes with the same shape as `layout` is returned:
>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      return_type=None)
>>> type(boxplot)
<class 'numpy.ndarray'>
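The 'dict' and 'both' return types are not shown in the examples above. The following is a minimal sketch of how they might be used to restyle parts of the plot after drawing. The DataFrame, the seed, and the styling choices are illustrative assumptions; the dictionary keys (such as 'medians' and 'boxes') are the ones matplotlib's boxplot produces.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data: two columns of standard-normal noise.
rng = np.random.default_rng(1234)
df = pd.DataFrame(rng.standard_normal((10, 2)), columns=["Col1", "Col2"])

# return_type='dict' gives the matplotlib Line2D objects, keyed by role.
plt.figure()
lines = df.boxplot(column=["Col1", "Col2"], return_type="dict")
for median in lines["medians"]:  # one median line per box
    median.set_color("red")
    median.set_linewidth(2)

# return_type='both' gives a namedtuple holding the axes and the same
# kind of dict; it can be unpacked like an ordinary tuple.
plt.figure()
ax, lines_again = df.boxplot(column=["Col1", "Col2"], return_type="both")
ax.set_ylabel("value")
```

Holding on to the dict is useful when only some artists need restyling, since the axes alone does not label which lines are medians, whiskers, or caps.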