A scatter plot matrix can be formed for a collection of variables where each of the variables will be plotted against each other. The following code generates a DataFrame df
, which consists of four columns with normally distributed random values and column names named from a
to d
:
>>> df = pd.DataFrame(np.random.randn(1000, 4), columns=['a', 'b', 'c', 'd']) >>> spm = pd.tools.plotting.scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal='hist')
After the preceding code is executed we'll get the following output:
The scatter_matrix()
function helps in plotting the preceding figure. It takes in the data frame object and the required parameters that are defined to customize the plot. You would have observed that the diagonal graph is defined as a histogram, which means that in the section of the plot matrix where the variable is against itself, a histogram is plotted.
Instead of the histogram, we can also use the kernel density estimation for the diagonal...