A picture is worth a thousand words.
We all know that images are a powerful form of communication. We often use them to understand a situation better or to condense pieces of information into a graphical representation.
Just to give a couple of examples on how helpful they can be, let's consider the scientific and performance analysis fields. In order to clearly identify the bottlenecks, it is very important to be able to visualize data when analyzing performance information. Similarly, taking a quick glance at a graph drawn for a scientific experiment can give a scientist a better understanding of the results, something which is harder to achieve by looking only at the raw data.
Python is an interpreted language with a strong core functions basis and a powerful modular aspect which allows us to expand the language with external modules that offer new functionalities.
Modules reflect the Unix philosophy:
Do one thing, do it well.
So the result is that we have an extensible language with tools to accomplish a single task in the best possible way. Modules are often organized in packages. A package is a structured collection of modules that have the same purpose. One example of a package is Matplotlib.
Matplotlib is a Python package for 2D plotting that generates production-quality graphs. It supports interactive and non-interactive plotting, and can save images in several output formats (PNG, PS, and others). It can use multiple window toolkits (GTK+, wxWidgets, Qt, and so on) and it provides a wide variety of plot types (lines, bars, pie charts, histograms, and many more). In addition to this, it is highly customizable, flexible, and easy to use.
The dual nature of Matplotlib allows it to be used in both interactive and non-interactive scripts. It can be used in scripts without a graphical display, embedded in graphical applications, or on web pages. It can also be used interactively with the Python interpreter or IPython.
In this chapter, we will introduce Matplotlib, learn what it is, and what it can do. Later on, we will see what tools and Python modules are needed to have the best experience with Matplotlib and how to get them installed on our system, be it Linux, Windows, or Mac OS X.
The topics we are going to cover are:
Introduction to Matplotlib
Output formats and backends
Dependencies
How to install Matplotlib
The idea behind Matplotlib can be summed up in the following motto as quoted by John Hunter, the creator and project leader of Matplotlib:
Matplotlib tries to make easy things easy and hard things possible.
We can generate high quality, publication-ready graphs with minimal effort (sometimes we can achieve this with just one line of code or so), and for elaborate graphs, we have at hand a powerful library to support our needs.
Matplotlib was born in the scientific area of computing, where gnuplot and MATLAB were (and still are) used a lot.
With the entrance of Python into scientific toolboxes, an example of a workflow to process some data might be similar to this: "Write a Python script to parse data, then pass the data to a gnuplot script to plot it". Now with Matplotlib, we can write a single script to parse and plot data, with a lot more flexibility (that gnuplot doesn't have) and consistently using the same programming language.
We have to think of plotting not just as the final step in working with our data, but as an important way of getting visual feedback during the process. Here, the interactive capabilities of Matplotlib will come and rescue us.
Matplotlib was modeled on MATLAB, because graphing was something that MATLAB did very well. The high degree of compatibility between them made many people move from MATLAB to Matplotlib, as they felt like home while working with Matplotlib.
But what are the points that built the success of Matplotlib? Let's look at some of them:
It uses Python: Python is a very interesting language for scientific purposes (it's interpreted, high-level, easy to learn, easily extensible, and has a powerful standard library) and is now used by major institutions such as NASA, JPL, Google, DreamWorks, Disney, and many more.
It's open source, so no license to pay: This makes it very appealing for professors and students, who often have a low budget.
It's a real programming language: The MATLAB language (while being Turing-complete) lacks many of the features of a general-purpose language like Python.
It's much more complete: Python has a lot of external modules that will help us perform all the functions we need to. So it's the perfect tool to acquire data, elaborate the data, and then plot the data.
It's very customizable and extensible: Matplotlib can fit every use case because it has a lot of graph types, features, and configuration options.
It's integrated with LaTeX markup: This is really useful when writing scientific papers.
It's cross-platform and portable: Matplotlib can run on Linux, Windows, Mac OS X, and Sun Solaris (and Python can run on almost every architecture available).
In short, Python became very common in the scientific field, and this success is reflected even on this book, where we'll find some mathematical formulas. But don't be concerned about that, we will use nothing more complex than high school level equations.