The easiest way to get started with plotting using matplotlib is often by using the MATLAB API that is supported by the package:
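A minimal sketch of such a call through the `pyplot` interface might look as follows. The sample data (six `x` values between 0 and 3 and their squares) is an assumption made for illustration, not the original listing.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data (an assumption): six evenly spaced x values and their squares
x = np.linspace(0, 3, 6)
y = x ** 2

# plot() draws on the current axes and returns a list of Line2D objects
lines = plt.plot(x, y)
plt.title("A simple line plot")
plt.xlabel("x")
plt.ylabel("y")
```

In a script, calling `plt.show()` afterwards opens the figure window; `plt.savefig()` writes it to a file instead.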
The output for the preceding command is as follows:
The preceding example could then be written as follows:
The output for the preceding command is as follows:
The default line format when we plot data in matplotlib is a solid blue line, which is abbreviated as `b-`. To change this setting, we only need to add a format code, which combines a letter for the color with symbols for the line style, to the `plot` function. Let us consider a plot of several lines with different format styles:
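One possible sketch of such a plot is shown here. The four curves and the variable names are assumptions chosen for illustration; only the format codes matter.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 3, 6)

plt.figure()
# Each format code combines a color letter with a line style and/or a marker
solid_blue, = plt.plot(x, x, "b-")           # blue solid line
red_dashed, = plt.plot(x, x ** 2, "r--")     # red dashed line
green_dotted, = plt.plot(x, x ** 3, "g:")    # green dotted line
black_circles, = plt.plot(x, np.sqrt(x), "ko")  # black circles, no connecting line
```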
The output for the preceding command is as follows:
The following table lists some common properties of `Line2D` plotting:
Property | Value type | Description |
---|---|---|
`color` | Any matplotlib color | This sets the color of the line in the figure |
`dashes` | On/off sequence | This sets the dash pattern as a sequence of on/off ink lengths in points |
`data` | Two arrays (x data, y data) | This sets the data used for visualization |
`linestyle` | One of `'-'`, `'--'`, `'-.'`, `':'`, and others | This sets the line style in the figure |
`linewidth` | Float value in points | This sets the width of the line in the figure |
`marker` | Any marker symbol | This sets the style at data points in the figure |
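As a brief sketch, line properties such as color, line style, width, and marker can be given as keyword arguments when the line is created, or changed later on the returned `Line2D` object. The data and the chosen values below are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 3, 6)

plt.figure()
# Properties can be passed as keyword arguments when the line is created ...
line, = plt.plot(x, x ** 2, color="green", linestyle="--",
                 linewidth=2.5, marker="o")

# ... or changed afterwards, with setp() or with the set_* methods
plt.setp(line, color="red", linewidth=1.0)
line.set_linestyle(":")
```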
By default, all plotting commands apply to the current figure and axes. In some situations, we want to visualize data in multiple figures and axes to compare different plots or to use the space on a page more efficiently. Two steps are required before we can plot the data: first, we select the figure we want to plot on; second, we specify the position of our subplot in the figure:
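The two steps might be sketched as follows, assuming a layout of two rows and one column; the plotted values are placeholders.

```python
import matplotlib.pyplot as plt

# Step 1: create the figure to draw on (it becomes the current figure)
fig = plt.figure()

# Step 2: choose the subplot position as (rows, columns, index);
# the index counts from 1, left to right and top to bottom
ax_top = plt.subplot(2, 1, 1)
plt.plot([1, 2, 3], [1, 4, 9])

ax_bottom = plt.subplot(2, 1, 2)
plt.plot([1, 2, 3], [9, 4, 1])
```

After each `plt.subplot()` call, the selected axes become the current axes, so the following `plt.plot()` call draws into them.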
The output for the preceding command is as follows:
There is a convenience method, `plt.subplots()`, for creating a figure that contains a given number of subplots. As in the previous example, we can use the `plt.subplots(2,2)` command to create a 2x2 figure that consists of four subplots.
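A short sketch of the 2x2 case follows; the four functions plotted are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 3, 30)

# subplots() returns the figure and a 2x2 NumPy array of Axes objects
fig, axes = plt.subplots(2, 2)

# Each subplot is addressed by its (row, column) position in the array
axes[0, 0].plot(x, x)
axes[0, 1].plot(x, x ** 2)
axes[1, 0].plot(x, np.sqrt(x))
axes[1, 1].plot(x, np.sin(x))
```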
We have looked at how to create simple line plots so far. The matplotlib library supports many more plot types that are useful for data visualization. However, our goal is to provide the basic knowledge that will help you to understand and use the library for visualizing data in the most common situations. Therefore, we will only focus on four kinds of plot types: scatter plots, bar plots, contour plots, and histograms.
A scatter plot is used to visualize the relationship between variables measured in the same dataset. It is easy to plot a simple scatter plot using the `plt.scatter()` function, which requires numeric columns for both the `x` and `y` axes:
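A minimal sketch with synthetic data could look like this. The data is an assumption: `y` depends linearly on `x` plus random noise.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data (an assumption): y is a noisy linear function of x
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)

plt.figure()
# scatter() draws one marker per (x, y) pair
points = plt.scatter(x, y)
plt.xlabel("x")
plt.ylabel("y")
```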
A bar plot is used to present grouped data with rectangular bars, which can be either vertical or horizontal, with the lengths of the bars corresponding to their values. We use the `plt.bar()` command to draw vertical bars, and the `plt.barh()` command to draw horizontal bars:
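Both commands can be sketched side by side as follows; the group names and values are made up for illustration.

```python
import matplotlib.pyplot as plt

# Made-up grouped data (an assumption for illustration)
groups = ["A", "B", "C", "D"]
values = [3, 7, 5, 2]
positions = list(range(len(groups)))

plt.figure()

# Vertical bars: the bar heights correspond to the values
plt.subplot(1, 2, 1)
vertical = plt.bar(positions, values)
plt.xticks(positions, groups)

# Horizontal bars: the bar widths correspond to the values
plt.subplot(1, 2, 2)
horizontal = plt.barh(positions, values)
plt.yticks(positions, groups)
```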
We use contour plots to present the relationship between three numeric variables in two dimensions. Two variables are drawn along the `x` and `y` axes, and the third variable, `z`, is used for contour levels that are plotted as curves in different colors:
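As a sketch, a contour plot needs `z` values on a grid of `x` and `y` points. The function used for `z` here is an arbitrary assumption.

```python
import numpy as np
import matplotlib.pyplot as plt

# Build a grid of (x, y) points and evaluate z on it
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
Z = np.exp(-(X ** 2 + Y ** 2) / 2)  # arbitrary example surface

plt.figure()
# Ask for roughly 8 automatically chosen contour levels, colored by z value
contours = plt.contour(X, Y, Z, 8)
plt.clabel(contours, inline=True, fontsize=8)  # write the z value on each curve
plt.colorbar(contours)
```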
Let's take a look at the contour plot in the following image:
A legend is an important element that identifies the plot elements in a figure. The easiest way to show a legend inside a figure is to use the `label` argument of the `plot` function, and to show the labels by calling the `plt.legend()` method:
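A minimal sketch of this pattern follows; the two curves and their label texts are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 1, 20)

plt.figure()
# The label argument names each line; legend() collects the names into a box
plt.plot(x, x ** 2, "r-", label="quadratic")
plt.plot(x, x ** 3, "b--", label="cubic")
legend = plt.legend()
```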
The output for the preceding command is as follows:
The `loc` argument in the legend command sets the position of the label box. The valid location options are `best`, `upper right`, `upper left`, `lower left`, `lower right`, `right`, `center left`, `center right`, `lower center`, `upper center`, and `center`. The default position and the handling of an invalid option depend on the matplotlib version: older releases default to `upper right` and fall back to `best` when the option is not in the above list, whereas recent releases default to `best` and reject an unknown option with an error.
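For instance, the legend can be placed explicitly with one of the valid location strings. This is a small sketch with assumed data and labels.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 50)

plt.figure()
plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), label="cos(x)")

# Place the legend box explicitly instead of relying on the default
legend = plt.legend(loc="lower left")
```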
The output for the preceding command is as follows:
The other element in a figure that we want to introduce is the annotation, which can consist of text, arrows, or other shapes that explain parts of the figure in detail or emphasize special data points. There are different methods for showing annotations, such as `text`, `arrow`, and `annotate`.
- The `text` method draws text at the given coordinates `(x, y)` on the plot, optionally with custom properties. Common arguments of the function are `x`, `y`, the label text, and font-related properties that can be passed in via `fontdict`, such as `family`, `fontsize`, and `style`.
- The `annotate` method can draw both text and arrows arranged appropriately. Arguments of this function are `s` (the label text, passed as the first positional argument), `xy` (the position of the element to annotate), `xytext` (the position of the label `s`), `xycoords` (the string that indicates what type of coordinate `xy` is), and `arrowprops` (the dictionary of line properties for the arrow that connects the annotation).
Here is a simple example to illustrate the `annotate` and `text` functions:
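The following is one possible sketch, not the original listing. The parabola, the label texts, and the coordinates are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2, 2, 100)
y = x ** 2

plt.figure()
plt.plot(x, y)

# annotate(): the label text comes first, xy is the annotated point,
# xytext is where the label is placed, arrowprops styles the arrow
note = plt.annotate("minimum", xy=(0, 0), xycoords="data",
                    xytext=(0.5, 2.0),
                    arrowprops=dict(arrowstyle="->"))

# text(): plain text at data coordinates (x, y), styled through fontdict
label = plt.text(-1.5, 3.5, "y = x^2",
                 fontdict={"family": "serif", "fontsize": 12, "style": "italic"})
```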
We have covered most of the important components in a plot figure using matplotlib. In this section, we will introduce another powerful plotting method for directly creating standard visualization from Pandas data objects that are often used to manipulate data.
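As a minimal sketch, a Pandas `Series` can be plotted directly with its `plot` method. The random-walk data and the date index are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative Series (an assumption): a random walk indexed by calendar days
rng = np.random.default_rng(0)
index = pd.date_range("2015-01-01", periods=100)
walk = pd.Series(rng.normal(size=100).cumsum(), index=index)

plt.figure()
# Series.plot() draws on matplotlib axes and returns the Axes object
ax = walk.plot(title="Random walk")
```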
The output for the preceding command is as follows:
Another example will visualize the data of a DataFrame object consisting of multiple columns:
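A sketch of such a visualization, with each column drawn in its own subplot, might look as follows. The column names and the random data are assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame (an assumption): three random-walk columns
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(50, 3)).cumsum(axis=0),
                  columns=["a", "b", "c"])

# subplots=True gives every column its own axes;
# the result is an array with one Axes object per column
axes = df.plot(subplots=True, figsize=(6, 6))
```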
The output for the preceding command is as follows:
The plot method of the DataFrame has a number of options that allow us to handle the plotting of the columns. For example, in the above DataFrame visualization, we chose to plot the columns in separate subplots. The following table lists more options:
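Argument | Value | Description |
---|---|---|
`subplots` | `True`/`False` | Plots each data column in a separate subplot |
`logx`, `logy` | `True`/`False` | Uses a log scale on the `x` or `y` axis |
`secondary_y` | `True`/`False` | Plots data on a secondary `y` axis |
`sharex`, `sharey` | `True`/`False` | Shares the same `x` or `y` axis between subplots |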
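Two of these options can be sketched briefly. `logy` and `secondary_y` are keyword arguments of the Pandas `DataFrame.plot` method; the small DataFrame below is an assumption for illustration.

```python
import pandas as pd

# Two columns on very different scales (made-up values)
df = pd.DataFrame({"small": [1, 2, 3, 4],
                   "large": [10, 100, 1000, 10000]})

# logy=True switches the y axis to a logarithmic scale
ax_log = df.plot(logy=True)

# secondary_y draws the named column against a second y axis on the right
ax_sec = df.plot(secondary_y=["large"])
```

The secondary axes can then be reached through `ax_sec.right_ax`, for example to set its label.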
Besides matplotlib, there are other powerful data visualization toolkits based on Python. While we cannot dive deeper into these libraries, we would like to at least briefly introduce them in this section.
Bokeh is a project by Peter Wang, Hugo Shi, and others at Continuum Analytics. It aims to provide elegant and engaging visualizations in the style of `D3.js`. The library can quickly and easily create interactive plots, dashboards, and data applications. Here are a few differences between matplotlib and Bokeh:
- Bokeh achieves cross-platform ubiquity through IPython's new model of in-browser client-side rendering
- Bokeh uses a syntax familiar to R and ggplot users, while matplotlib is more familiar to MATLAB users
- Bokeh has a coherent vision to build a ggplot-inspired in-browser interactive visualization tool, while matplotlib has a coherent vision of focusing on 2D cross-platform graphics
The basic steps for creating plots with Bokeh are as follows:
- Prepare some data in a list, series, or DataFrame
- Tell Bokeh where you want to generate the output
- Call `figure()` to create a plot with some overall options, similar to the matplotlib options discussed earlier
- Add renderers for your data, with visual customizations such as colors, legends, and width
- Ask Bokeh to `show()` or `save()` the results
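The steps above can be sketched as follows, assuming a reasonably recent Bokeh release (the `legend_label` argument is not available in very old versions). The data, the title, and the temporary output path are made up for illustration.

```python
import os
import tempfile
from bokeh.plotting import figure, output_file, save

# 1. Prepare some data (made-up values)
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# 2. Tell Bokeh where to write the output (a temporary HTML file here)
html_path = os.path.join(tempfile.mkdtemp(), "lines.html")
output_file(html_path)

# 3. Create a plot with some overall options
p = figure(title="Simple line example", x_axis_label="x", y_axis_label="y")

# 4. Add a renderer with visual customizations
p.line(x, y, legend_label="Temp.", line_width=2, line_color="navy")

# 5. Save the result to the HTML file
save(p)
```

Calling `show(p)` instead of `save(p)` would additionally open the generated page in a browser.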
MayaVi is a library for interactive scientific data visualization and 3D plotting, built on top of the award-winning Visualization Toolkit (VTK), which it accesses through TVTK, a traits-based wrapper for the open-source visualization library. It offers the following:
- The possibility to interact with the data and object in the visualization through dialogs.
- An interface in Python for scripting. MayaVi can work with NumPy and SciPy for 3D plotting out of the box and can be used within IPython notebooks, which is similar to matplotlib.
- An abstraction over VTK that offers a simpler programming model.
Let's view an illustration made entirely using MayaVi based on VTK examples and their provided data:
- Name two real or fictional datasets and explain which kind of plot would best fit the data: line plots, bar charts, scatter plots, contour plots, or histograms. Name one or two applications where each plot type is common (for example, histograms are often used in image editing applications).
- We only focused on the most common plot types of matplotlib. After a bit of research, can you name a few more plot types that are available in matplotlib?
- Take one Pandas data structure from Chapter 3, Data Analysis with Pandas and plot the data in a suitable way. Then, save it as a PNG image to the disk.