Numeric features can be transformed, regardless of the target variable. This is often a prerequisite for the better performance of certain classifiers, particularly distance-based ones. We usually avoid transforming the target (besides specific cases, such as when modeling a percentage or a distribution with a long tail), since doing so turns any pre-existing linear relationship between the target and the other features into a non-linear one.
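The point about target transformations can be illustrated with a small sketch (the data here is synthetic, invented purely for demonstration): a feature perfectly linearly related to the target has a Pearson correlation of 1.0 with it, but after a log transform of the target the relationship becomes concave, and the linear correlation drops:

```python
import numpy as np

# Synthetic illustration: feature x is perfectly linearly related to target y
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=200)
y = 3.0 * x + 5.0  # exact linear relationship

# Pearson correlation is 1.0 for the untouched linear pair...
corr_linear = np.corrcoef(x, y)[0, 1]

# ...but log-transforming the target bends the relationship,
# so the linear correlation falls below 1.0
corr_logged = np.corrcoef(x, np.log(y))[0, 1]

print(corr_linear, corr_logged)
```

This is why we prefer to transform the features and leave the target alone whenever a linear model is in play.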
We will continue working with the Boston Housing dataset:
In: import numpy as np
    from sklearn.datasets import load_boston
    boston = load_boston()
    labels = boston.feature_names
    X = boston.data
    y = boston.target
    print(boston.feature_names)

Out: ['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS'
     'RAD' 'TAX' 'PTRATIO' 'B' 'LSTAT']
As before, we fit the model using LinearRegression from Scikit-learn, this time measuring its R-squared value using the r2_score function from the metrics module:
In: from sklearn import linear_model
    from sklearn.metrics import r2_score
    linear_regression = linear_model.LinearRegression(fit_intercept=True)
    linear_regression.fit(X, y)
    print("R-squared: %0.3f" % r2_score(y, linear_regression.predict(X)))
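To see why a feature transformation can raise the R-squared of a linear model, here is a minimal self-contained sketch on synthetic data (invented for illustration, since load_boston has been removed from recent Scikit-learn releases): the target depends on the square of the feature, so fitting on the transformed feature fits far better than fitting on the raw one.

```python
import numpy as np
from sklearn import linear_model
from sklearn.metrics import r2_score

# Synthetic stand-in: the target depends on the square of the feature,
# plus a little Gaussian noise
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=(300, 1))
y = (x ** 2).ravel() + rng.normal(0.0, 2.0, size=300)

lr = linear_model.LinearRegression(fit_intercept=True)

# R-squared of a straight line on the raw feature
r2_raw = r2_score(y, lr.fit(x, y).predict(x))

# R-squared after transforming the feature (the target is left untouched)
r2_sq = r2_score(y, lr.fit(x ** 2, y).predict(x ** 2))

print(r2_raw, r2_sq)
```

The same LinearRegression estimator is used in both fits; only the representation of the feature changes, which is exactly the kind of transformation the chapter is about.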