Anscombe's quartet is a classic example that illustrates why visualizing data is important. The quartet consists of four datasets with nearly identical summary statistics. Each dataset has a series of x values and dependent y values. We will tabulate these statistics in an IPython notebook. However, if you plot the datasets, they look surprisingly different from one another.
For this recipe, you need to perform the following steps:
Start with the following imports:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl
from dautil import report
from dautil import plotting
import numpy as np
from tabulate import tabulate
Define the following function to compute the mean, variance, and correlation of x and y within a dataset, as well as the slope and intercept of a linear fit for each of the datasets:
df = sns.load_dataset("anscombe")
agg = df.groupby('dataset')\
    .agg([np.mean, np.var])\
    ...
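To make the statistics named in this step concrete, here is a minimal standalone sketch that uses only the Python standard library. It is not the recipe's own code, which is truncated above. The names `QUARTET`, `summarize`, and `summary` are hypothetical and introduced only for illustration, and the data values are the published Anscombe (1973) figures typed in by hand rather than loaded through seaborn.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet, embedded by hand so that no download is needed.
# Datasets I to III share the same x values.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                  7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                   6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                    6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return summary statistics and a least-squares line for one dataset."""
    mean_x, mean_y = mean(x), mean(y)
    # Sample variances (n - 1 denominator).
    var_x, var_y = variance(x), variance(y)
    # Sample covariance, using the same n - 1 denominator.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
    # Ordinary least-squares fit of y = slope * x + intercept.
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return {
        "mean_x": mean_x,
        "var_x": var_x,
        "mean_y": mean_y,
        "var_y": var_y,
        "corr": cov / sqrt(var_x * var_y),  # Pearson correlation
        "slope": slope,
        "intercept": intercept,
    }


summary = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}
for name, stats in summary.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the four printed rows are almost indistinguishable, which is exactly why a table alone cannot reveal how different the datasets are.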