So far, we've been working with a very simple dataset. Next, we'll generate a much more complicated one and model it by applying the techniques from the last chapter to build a solid model using TDD.
Unlike last time, let's build the data generation code first and use it to understand our model-building process more deeply. Here is the data generator that we'll use for the remainder of this chapter:
```python
import pandas
import statsmodels.formula.api as smf
import numpy as np


def generate_data():
    observation_count = 1000
    intercept = -1.6
    beta1 = -0.03
    beta2 = 0.1
    beta3 = -0.15
    variable_a = np.random.uniform(0, 100, size=observation_count)
    variable_b = np.random.uniform(50, 75, size=observation_count)
    variable_c = np.random.uniform(3, 10, size=observation_count)
    variable_d = np.random.uniform(3, 10, size=observation_count)
    variable_e = np.random.uniform(11, 87, size=observation_count...
```
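The listing above is cut off before it shows how the coefficients combine with the variables, so the following is only a hedged sketch of why writing the generator first pays off. When we know the true coefficients, a test can check that our fitting code recovers them. The functions `generate_known_data` and `fit_ols` are hypothetical. The linear relationship with Gaussian noise, and the choice of which variables `beta1` and `beta2` multiply, are assumptions made for illustration and are not necessarily the model this chapter builds. The fit uses plain NumPy least squares so the example has no extra dependencies. With statsmodels, the formula-API equivalent would be `smf.ols("y ~ variable_a + variable_b", data=df).fit().params`.

```python
import numpy as np
import pandas as pd


def generate_known_data(observation_count=1000, seed=0):
    """Hypothetical generator: y is linear in two variables plus noise.

    The coefficient values are borrowed from the listing above purely for
    illustration; the linear form and the variable pairing are assumptions.
    """
    rng = np.random.default_rng(seed)  # seeded so the check is repeatable
    intercept, beta1, beta2 = -1.6, -0.03, 0.1
    variable_a = rng.uniform(0, 100, size=observation_count)
    variable_b = rng.uniform(50, 75, size=observation_count)
    noise = rng.normal(0, 0.5, size=observation_count)
    y = intercept + beta1 * variable_a + beta2 * variable_b + noise
    return pd.DataFrame({"y": y, "variable_a": variable_a, "variable_b": variable_b})


def fit_ols(df):
    """Ordinary least squares via NumPy; returns [intercept, beta_a, beta_b]."""
    # Design matrix: a column of ones for the intercept, then each predictor.
    X = np.column_stack([np.ones(len(df)), df["variable_a"], df["variable_b"]])
    coefficients, *_ = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)
    return coefficients


coefficients = fit_ols(generate_known_data())
print(coefficients)  # compare against the true values -1.6, -0.03, 0.1
```

Because the data come from known parameters, a test can assert that each estimate lands within a tolerance of its true value. The intercept needs a looser tolerance than the slopes here, since the predictors are far from zero and the intercept is therefore estimated less precisely.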