Index
A
- abscissa axis
- about / Evaluating the fitted values
- accuracy / Assessing the classifier's performance
- active learning
- AdaBoostRegressor class
- about / Boosting
- Akaike Information Criterion (AIC)
- about / The coefficient of determination
- Anaconda
- URL / Scientific distributions
- Area under the curve (AUC) / The Madelon dataset
B
- bagging
- about / Bagging and boosting, Bagging
- BaggingRegressor class
- about / Bagging
- bag of words (BoW) / Feature hasher
- batches
- batch learning
- about / Batch learning
- Bayesian Information Criterion (BIC)
- about / The coefficient of determination
- Bayesian regression
- about / Bayesian regression
- pros / Bayesian regression wrap up
- cons / Bayesian regression wrap up
- comparison, with logistic regression / Comparison with logistic regression
- SVR / SVR
- bias
- about / Extending to linear regression
- binary classification
- binning
- about / Summarizations by binning
- boosting
- about / Bagging and boosting, Boosting
- Bootstrap Aggregating
- about / Bagging
- bootstrapping
- about / Bootstrapping
- Boston housing dataset / Preparing to discover simple linear regression
C
- California housing dataset / Preparing to discover simple linear regression
- car dataset / A ranking problem
- causation
- about / Correlation is not causation
- direct causation / Correlation is not causation
- reciprocal effects / Correlation is not causation
- spurious causation / Correlation is not causation
- indirect causation / Correlation is not causation
- conditional effect / Correlation is not causation
- random effect / Correlation is not causation
- Classification and Regression Tree (CART)
- about / Regression trees (CART)
- pros / Regression tree wrap up
- cons / Regression tree wrap up
- classification function
- classification problem
- defining / Defining a classification problem
- binary classification / Formalization of the problem: binary classification
- classifier's performance, assessing / Assessing the classifier's performance
- coefficients
- collinearity
- about / The correlation matrix
- confidence interval / Meaning and significance of coefficients
- confusion matrix / Assessing the classifier's performance
- correlation
- about / A measure of linear relationship
- correlation matrix
- about / The correlation matrix
- cost function
- minimizing / Minimizing the cost function
- squared errors, using / Explaining the reason for using squared errors
- Pseudoinverse, using / Pseudoinverse and other optimization methods
- Gradient Descent / Gradient descent at work
- covariance / A measure of linear relationship
- cross-validation
- about / Cross-validation
- cubic transformation
- about / Polynomial regression
- versus linear transformation, testing / Testing linear versus cubic transformation
D
- data science
- about / Regression analysis and data science
- promises / Exploring the promise of data science
- challenges / The challenge
- linear models / The linear models
- with Python / Python for data science
- dataset
- datasets
- downloading / Downloading the datasets
- time series problem dataset / Time series problem dataset
- regression problem dataset / Regression problem dataset
- multiclass classification problem dataset / Multiclass classification problem dataset
- ranking problem dataset / Ranking problem dataset
- decision boundary / Let's see some code
- dichotomies
- about / Qualitative feature encoding
- Dictionary vectorization encoding / DictVectorizer and one-hot encoding
- DictVectorizer
- Dow Jones Index Stocks dataset / Time series problem dataset
- Dummy encoding / Dummy coding with Pandas
- Durbin-Watson test / Evaluating the fitted values
E
- elastic net
- about / Elastic net
- ensemble / Bagging and boosting
- pros / Ensemble wrap up
- cons / Ensemble wrap up
- extrapolation
F
- F-statistic / The coefficient of determination
- f1 score / Assessing the classifier's performance
- feature creation phase
- about / Checking on out-of-sample data
- feature hasher
- about / Feature hasher
- feature scaling
- about / Feature scaling
- feature selection
- about / Greedy selection of features
- Madelon dataset / The Madelon dataset
- univariate feature selection / Univariate selection of features
- recursive feature selection / Recursive feature selection
- Forward Selection algorithm
- about / Least Angle Regression
- Forward Stagewise Regression algorithm
- about / Least Angle Regression
- future utils / Reflecting on predictive variables
G
- GBM, with LAD
- about / GBM with LAD wrap up
- generalized linear model (GLM)
- about / The family of linear models
- get-pip.py script
- URL / Installing packages
- Google Books
- URL / The linear models
- Gradient Boosting Regressor
- with LAD / Gradient Boosting Regressor with LAD
- Gradient Descent
- about / Minimizing the cost function
- using / Gradient descent at work, Revisiting gradient descent
- gradient descent
- about / Revisiting gradient descent
- feature scaling / Feature scaling
- coefficients, estimating / Unstandardizing coefficients
- Greedy feature selection / Greedy selection of features
- grid search
- for optimal parameters / Grid search for optimal parameters
H
- hashing trick
- about / Feature hasher
- hinge / SGD classification with hinge loss
I
- imbalanced and multiclass classification problem / An imbalanced and multiclass classification problem
- interaction models
- about / Interaction models
- interactions, discovering / Discovering interactions
- intercept
- interpolation
- interquartile range (IQR)
- about / Outliers
- IPython
- about / Introducing Jupyter or IPython
- IPython Notebook Viewer
J
- Julia
- Jupyter
K
- Kddcup99 dataset / Multiclass classification problem dataset
- kernels, IPython
- KFold cross-validation / Cross-validation
- Kurtosis / Evaluating the fitted values
L
- label ranking average precision (LRAP) / A ranking problem
- LARS algorithm
- about / Least Angle Regression
- lasso (L1 regularization)
- about / Lasso (L1 regularization)
- learning rate
- about / Gradient descent at work
- Least Absolute Deviations (LAD)
- Least Angle Regression (LARS)
- about / Least Angle Regression
- visual showcase / Visual showcase of LARS
- code example / A code example
- pros / LARS wrap up
- cons / LARS wrap up
- linear model family
- about / The family of linear models
- linear models
- about / The linear models, Linear models and supervised learning
- Python packages / Python packages and functions for linear models
- linear regression
- about / Extending to linear regression
- with StatsModels / Regressing with Statsmodels
- coefficient of determination / The coefficient of determination
- coefficients / Meaning and significance of coefficients
- fitted values, evaluating / Evaluating the fitted values
- causation / Correlation is not causation
- predicting, with regression model / Predicting with a regression model
- with Scikit-learn / Regressing with Scikit-learn
- linear transformation
- versus cubic transformation, testing / Testing linear versus cubic transformation
- logistic function
- logistic regression
- logit function
M
- Madelon dataset
- URL / The Madelon dataset
- Markdown language
- Mean Absolute Error (MAE)
- about / A regression problem
- median
- about / Missing data imputation
- mini-batch learning / Online mini-batch learning
- MinMaxScaler class / Mean centering
- missing data
- managing / Missing data
- imputation / Missing data imputation
- missing values, tracking / Keeping track of missing values
- mode
- about / Missing data imputation
- model evaluation
- reference link / Cross-validation
- Multiclass Logistic Regression
- about / Multiclass Logistic Regression
- example / An example
- multiple regression
- using / Using multiple features
- model, building with Statsmodels / Model building with Statsmodels
- formulas, using / Using formulas as an alternative
- correlation matrix / The correlation matrix
- feature importance, estimating / Estimating feature importance
- standardized coefficients, inspecting / Inspecting standardized coefficients
- models, comparing by R-squared / Comparing models by R-squared
N
- natural language processing (NLP)
- about / Feature hasher
- no free lunch / The challenge
- normal distribution
- about / Starting from the basics
- normalization
- about / Normalization
- Not a Number (NaN)
- about / Missing data
- numeric feature
- scaling / Numeric feature scaling
- mean, centering / Mean centering
- standardization / Standardization
- normalization / Normalization
- logistic regression example / The logistic regression case
- transforming / Numeric feature transformation
- residuals, observing / Observing residuals
- summarizations, by binning / Summarizations by binning
- NumPy
O
- Occam's razor / The challenge
- odds ratio / The logistic regression case
- Omnibus D'Agostino's test / Evaluating the fitted values
- one-hot encoding
- One-vs-all algorithm
- about / Multiclass Logistic Regression
- One-vs-rest algorithm
- about / Multiclass Logistic Regression
- online learning / Online mini-batch learning
- online mini-batch learning
- about / Online mini-batch learning
- example / A real example
- best practices / Streaming scenario without a test set
- ordinal variables
- out-of-sample data
- checking / Checking on out-of-sample data
- testing / Testing by sample split
- cross-validation, performing / Cross-validation
- bootstrapping, performing / Bootstrapping
- outliers
- about / Outliers
- response variable, checking / Outliers on the response
- predictors, checking / Outliers among the predictors
- replacing / Removing or replacing outliers
- removing / Removing or replacing outliers
- overfitting
P
- Pandas
- qualitative feature, coding with / Dummy coding with Pandas
- Pandas DataFrame / Starting from the basics
- partial residual plot
- about / Observing residuals
- Patsy package
- pip
- installing / Installing packages
- URL / Installing packages
- polynomial regression
- about / Polynomial regression
- linear versus cubic transformation, testing / Testing linear versus cubic transformation
- higher-degree solutions / Going for higher-degree solutions
- underfitting / Introducing underfitting and overfitting
- overfitting / Introducing underfitting and overfitting
- precision score / Assessing the classifier's performance
- prediction
- predictive variables
- Principal Component Analysis (PCA)
- about / Outliers among the predictors
- probability-based approach
- defining / Defining a probability-based approach
- logistic function / More on the logistic and logit functions
- logit function / More on the logistic and logit functions
- computing / Let's see some code
- probability density function (PDF)
- about / Starting from the basics
- Pseudoinverse
- Python
- used, for data science / Python for data science
- installing / Installing Python
- URL / Step-by-step installation
- installation / Step-by-step installation
- packages, installing / Installing packages
- packages, upgrading / Package upgrades
- scientific distributions / Scientific distributions
- Python(x,y)
- URL / Scientific distributions
- Python 2
- versus Python 3 / Choosing between Python 2 and Python 3
- Python 3
- Python packages
- for linear models / Python packages and functions for linear models
- NumPy / NumPy
- SciPy / SciPy
- Statsmodels / Statsmodels
- Scikit-learn / Scikit-learn
Q
- QR factorization
- about / Minimizing the cost function
- quadratic transformation
- about / Polynomial regression
- qualitative feature
- encoding / Qualitative feature encoding
- coding, with Pandas / Dummy coding with Pandas
- DictVectorizer / DictVectorizer and one-hot encoding
- one-hot encoding / DictVectorizer and one-hot encoding
- feature hasher / Feature hasher
R
- R
- R-squared
- used, for comparing models / Comparing models by R-squared
- random grid search
- about / Random grid search
- reference link / Random grid search
- ranking problem
- about / A ranking problem
- recursive feature selection
- about / Recursive feature selection
- regression analysis
- regression problem
- defining / Defining a regression problem
- linear models / Linear models and supervised learning
- supervised learning / Linear models and supervised learning
- simple linear regression, discovering / Preparing to discover simple linear regression
- about / A regression problem
- classifier, testing instead of regressor / Testing a classifier instead of a regressor
- regularization
- about / Regularization optimized by grid-search
- ridge (L2 regularization) / Ridge (L2 regularization)
- grid search, for optimal parameters / Grid search for optimal parameters
- random grid search / Random grid search
- lasso (L1 regularization) / Lasso (L1 regularization)
- elastic net / Elastic net
- reinforcement learning
- about / Defining a regression problem
- residuals
- observing / Observing residuals
- residuals, linear regression
- Skewness / Evaluating the fitted values
- Kurtosis / Evaluating the fitted values
- Omnibus D'Agostino's test / Evaluating the fitted values
- Prob(Omnibus) / Evaluating the fitted values
- Jarque-Bera / Evaluating the fitted values
- Prob (JB) / Evaluating the fitted values
- Durbin-Watson / Evaluating the fitted values
- Cond. No. / Evaluating the fitted values
- response variables
- about / Reflecting on response variables
- reversals
- about / Estimating feature importance
- ridge (L2 regularization)
- about / Ridge (L2 regularization)
- root mean squared error / Greedy selection of features
S
- Scala
- scatterplot
- about / A measure of linear relationship
- Scatterplot (matplotlib) / A measure of linear relationship
- scikit-learn
- URL / A ranking problem
- Scikit-learn
- about / Scikit-learn, Regressing with Scikit-learn
- URL / Scikit-learn
- SciPy
- SciPy Toolkits (SciKits)
- about / Scikit-learn
- setuptools, Python
- URL / Installing packages
- SGD classification
- with hinge loss / SGD classification with hinge loss
- Sigma function / Defining a probability-based approach
- simple linear regression
- about / The family of linear models
- discovering / Preparing to discover simple linear regression
- Boston dataset, exploring / Starting from the basics
- linear relationship, measuring / A measure of linear relationship
- snooping / Cross-validation, The Madelon dataset
- softmax / SGD classification with hinge loss
- squared errors
- squared sum of errors
- about / Starting from the basics
- stability selection
- about / Stability selection
- experimenting, with Madelon dataset / Experimenting with the Madelon
- standardization
- about / Standardization
- standardized betas / Inspecting standardized coefficients
- standardized coefficients
- inspecting / Inspecting standardized coefficients
- StandardScaler class / Mean centering
- Standard scaling / Mean centering
- StatsModels
- about / Regressing with Statsmodels
- statsmodels.api method / Regressing with Statsmodels
- statsmodels.formula.api method / Regressing with Statsmodels
- Statsmodels
- about / Statsmodels
- URL / Statsmodels
- used, for building model / Model building with Statsmodels
- stochastic gradient descent (SGD)
- about / Batch learning
- Streaming-scenario learning / Streaming scenario without a test set
- summarizations
- by binning / Summarizations by binning
- sum of squared errors (SSE)
- about / Outliers among the predictors
- supervised learning
- about / Defining a regression problem, Linear models and supervised learning
- predictive variables / Reflecting on predictive variables
- response variables / Reflecting on response variables
- Support Vector Machine (SVM) / SGD classification with hinge loss
- Support Vector Regressor (SVR) / SVR
- SVM
- about / SVM wrap up
- pros / SVM wrap up
- cons / SVM wrap up
T
- t-statistic / Meaning and significance of coefficients
- The Million Song Dataset / Regression problem dataset
- time series problem
- about / A time series problem
U
- underfitting
- univariate feature selection
- about / Univariate selection of features
- unstandardized betas / Unstandardizing coefficients
- unsupervised learning
- about / Defining a regression problem
V
- variable / Reflecting on predictive variables
W
- weak learner
- about / Gradient descent at work
- WinPython
- URL / Scientific distributions
Z
- Z score / A measure of linear relationship