Index
A
- abstraction / Structured features
- adaptive boosting (AdaBoost) / Adaboost
- agglomerative / Discretization
- Anaconda
- reference link / Installing the SciPy stack
- anomaly detection / Dimensionality reduction
- approaches, data volume issue
- efficiency / Data volume
- scalability / Data volume
- parallelism / Data volume
- approaches, machine learning problem
- exploratory / Types of questions
- descriptive / Types of questions
- inferential / Types of questions
- predictive / Types of questions
- causal / Types of questions
- mechanistic / Types of questions
- approaches, Sklearn
- estimator score / Evaluating model performance
- scoring parameters / Evaluating model performance
- metric functions / Evaluating model performance
- association rule learning / Set-based rule models
- association rule mining / Set-based rule models
- audioSamp.wav file
- download link / Data from sound
- averaging method / Ensemble types
B
- bagging
- about / Bagging
- random forests / Random forests
- extra trees / Extra trees
- batch gradient descent / Gradient descent
- Bayes classifier / Operations and statistics
- beam search / Set-based rule models
- bias feature / Gradient descent
- Big Data
- about / Big data
- challenges / Challenges of big data
- data models / Data models
- data distributions / Data distributions
- data, from distributions / Data distributions
- data, from databases / Data from databases
- data, obtaining from Web / Data from the Web
- data, obtaining from natural language / Data from natural language
- data, obtaining from images / Data from images
- data, obtaining from application programming interfaces / Data from application programming interfaces
- bin / Discretization
- binarization / Transforming features
- binary splits / Features
- Boolean feature / Categorical features
- boosting
- about / Boosting
- AdaBoost (adaptive boosting) / Adaboost
- gradient boosting / Gradient boosting
- boosting method / Ensemble types
- broadcasting / Mathematical operations
- broad settings, tasks
- bucketing / Other methods
C
- calibration / Calibration
- Canopy
- reference link / Installing the SciPy stack
- categorical features / Categorical features
- central moment / Operations and statistics
- challenges, Big Data
- about / Challenges of big data
- data volume / Data volume
- data velocity / Data velocity
- data variety / Data variety
- classification error / Purity
- closed form solution / Regularization
- collaborative filtering approach / Building a recommender system, Collaborative filtering
- computational complexity / PAC learning and computational complexity
- conjunctively separable / Version space
- content based filtering approach / Content-based filtering
- corpora / Data from natural language
- cost function
- about / Cost function
- minimizing / Minimizing the cost function
- cost function, logistic regression / The Cost function for logistic regression
- coverage space / Coverage space
- Cumulative Distribution Function (CDF) / Data distributions
D
- data
- reference link / Pandas
- about / What is data?
- models / Data models
- distributions / Data distributions
- obtaining, from databases / Data from databases
- obtaining, from Web / Data from the Web
- obtaining, from natural language / Data from natural language
- obtaining, from images / Data from images
- obtaining, from application programming interfaces / Data from application programming interfaces
- cleaning / Cleaning data
- visualizing / Visualizing data
- databases
- used, for solving issues / Data models
- data models
- about / Data models
- data models, components
- structure / Data models
- constraints / Data models
- operations / Data models
- decision boundary / Logistic regression
- deep architecture / Other neural net architectures
- descriptive models / Set-based rule models
- design principles
- about / Design principles
- question, types / Types of questions
- right question, asking / Are you asking the right question?
- tasks / Tasks
- unified modelling language (UML) / Unified modeling language
- discretization / Discretization
- distro / Installing the SciPy stack
- divisive / Discretization
- downhill simplex algorithm / SciPy
E
- ElasticNet / Gradient descent
- ensemble
- types / Ensemble types
- strategies / Ensemble strategies
- techniques / Other methods
- ensemble techniques
- about / Ensemble types
- averaging method / Ensemble types
- boosting method / Ensemble types
- entering variable / Linear programming
- equal frequency discretization / Discretization
- equal width discretization / Discretization
- extra trees
- about / Extra trees
- ExtraTreesClassifier class / Extra trees
- ExtraTreesRegressor class / Extra trees
F
- feature
- types / Feature types
- transforming / Transforming features
- discretization / Discretization
- normalization / Normalization
- calibration / Calibration
- feature types
- quantitative features / Quantitative features
- ordinal features / Ordinal features
- categorical features / Categorical features
- Fourier Transform / Signals
- functions, for impurity measures
G
- global / Gradient descent
- gradient boosting / Gradient boosting
- gradient checking / Gradient checking
H
- Hadoop / Data volume
- Hamming distance / Ordinal features
- human interface
- about / The human interface
- hyperparameter / Gradient descent
- hypothesis space / Logical models
I
- imputation / Calibration
- inductive logic programming
- about / Structured features
- instance space / Logical models
- integrated pest management systems greenhouses
- about / Insect detection in greenhouses
- reviewing / Reviewing the case study
- internal disjunction / Version space
- IPython console
- about / IPython console
- item-based collaborative filtering / Collaborative filtering
J
- Jupyter
- reference link / IPython console
K
- k-fold cross validation / Evaluating model performance
- kernel trick / Features
- k nearest neighbours (K-NN)
- about / Scikit-learn
- sklearn.neighbors.KNeighborsClassifier / Scikit-learn
- RadiusNeighborsClassifier / Scikit-learn
- kurtosis / Operations and statistics
L
- L1 norm / Regularization
- lambda / Regularization
- lambda (λ) / Data distributions
- Large Hadron Collider / Data velocity
- Lasso / Gradient descent
- lasso regression / Regularization
- learning curves
- using / Learning curves
- least general generalization (LGG) / Generality ordering
- least squares
- about / Introducing least squares
- gradient descent / Gradient descent
- normal equation / The normal equation
- leaving variable / Linear programming
- linear programming (LP) / Linear programming
- local minimum / Gradient descent
- logical models
- about / Logical models
- generality ordering / Generality ordering
- version space / Version space
- coverage space / Coverage space
- Probably Approximately Correct (PAC) learning / PAC learning and computational complexity
- computational complexity / PAC learning and computational complexity
- logistic calibration / Calibration
- logistic function / Logistic regression
- logistic regression
- about / Logistic regression
- cost function / The Cost function for logistic regression
- logistic units
- about / Logistic units
M
- machine learning
- about / Machine learning at a glance
- machine learning systems
- activities / Design principles
- MapReduce / Data volume
- margin / Ensemble strategies
- mathematical operations, NumPy
- about / Mathematical operations
- vectors / Mathematical operations
- polynomial functions / Mathematical operations
- Matlab / Python for machine learning
- Matplotlib
- about / Installing the SciPy stack, Matplotlib
- mean (µ (mu)) / Data distributions
- meta-estimators / Multiclass classification
- meta-learning / Other methods
- meta-model / Other methods
- MNIST dataset
- reference link / Implementing a neural network
- model performance
- evaluating / Evaluating model performance
- measuring, with precision (P) / Evaluating model performance
- measuring, with recall (R) / Evaluating model performance
- models
- about / Models
- grouping approach / Models
- geometric models / Geometric models
- probabilistic models / Probabilistic models
- logical models / Logical models
- rule models / Logical models, Rule models
- tree models / Tree models
- model selection
- about / Model selection
- Gridsearch, using / Gridsearch
- Moore's law / Data volume
- Mr Clippy / The human interface
- multiclass classification
- about / Multiclass classification
- one versus all technique / Multiclass classification
- one versus one technique / Multiclass classification
- multicollinearity / Scikit-learn
- MySQL server
- reference link / Data from databases
N
- Natural Language Tool Kit (NLTK) / Data from natural language
- Nelder-Mead solver / SciPy
- neural net architectures
- about / Other neural net architectures
- deep architecture / Other neural net architectures
- recurrent neural networks (RNNs) / Other neural net architectures
- neural network
- starting with / Getting started with neural networks
- implementing / Implementing a neural network
- Ngram
- reference link / The human interface
- normal equation / The normal equation
- normalization / Normalization
- NumPy
- about / Installing the SciPy stack, NumPy
- arrays, constructing / Constructing and transforming arrays
- arrays, transforming / Constructing and transforming arrays
- mathematical operations / Mathematical operations
O
- one hot encoding / Transforming features
- one versus all technique / Multiclass classification
- one versus one technique / Multiclass classification
- operations / Operations and statistics
- ordered list approach / The ordered list approach
- ordinal features / Ordinal features
- Ordinary Least Squares / Scikit-learn
- orthogonal function / Signals
P
- packages, SciPy
- Pandas
- about / Pandas
- parameters, bagging
- pdfminer3k / Cleaning data
- pdftotext / Cleaning data
- pivoting / Linear programming
- polynomial regression / Gradient descent
- Portable Document Format (PDF) / Cleaning data
- principal component analysis (PCA)
- about / Principal component analysis
- Principal Component Analysis (PCA) / Features, Scikit-learn
- probability density function / Data distributions
- Probably Approximately Correct (PAC) learning / PAC learning and computational complexity
- Python
- used, for machine learning / Python for machine learning
Q
- quantitative features / Quantitative features
R
- R / Python for machine learning
- random forest
- about / Random forests
- real-world case studies
- about / Real-world case studies
- recommender system, building / Building a recommender system
- integrated pest management systems greenhouses / Insect detection in greenhouses
- Receiver Operating Characteristic (ROC) / Calibration
- recommender system
- building / Building a recommender system
- content-based filtering approach / Content-based filtering
- collaborative filtering approach / Collaborative filtering
- reviewing / Reviewing the case study
- recurrent neural networks (RNNs) / Other neural net architectures
- regularization
- about / Regularization
- Ridge / Gradient descent
- ridge regression / Regularization
- Rosenbrock function / SciPy
- rule models
- purity / Rule models
- ordered list approach / The ordered list approach
- set-based rule models / Set-based rule models
S
- sample complexity / PAC learning and computational complexity
- scikit-learn
- about / Scikit-learn
- SciPy
- SciPy stack
- installing / Installing the SciPy stack
- reference link / Installing the SciPy stack
- about / Installing the SciPy stack
- segment / Models
- set-based rule models / Set-based rule models
- SGDClassifier / Gradient descent
- SGDRegressor / Gradient descent
- shrinkage / Scikit-learn
- sigmoid function / Logistic regression
- signals
- about / Signals
- data, obtaining from sound / Data from sound
- similarity function / Features
- skewness / Operations and statistics
- stacked generalization / Other methods
- stacking / Other methods
- standard deviation (σ (sigma)) / Data distributions
- standardized features / Calibration
- statistics
- about / Operations and statistics
- dispersion / Operations and statistics
- shape / Operations and statistics
- central tendency / Operations and statistics
- stochastic gradient descent / Gradient descent
- stratified cross validation / Evaluating model performance
- streaming processing / Data velocity
- structured features
- about / Structured features
- subgroup discovery / Set-based rule models
- subspace sampling / Bagging, Random forests
- sum of the squared error / Introducing least squares
- Support Vector Machines (SVM) / Features
T
- task classification
- binary classification / Classification
- multiclass classification / Classification
- tasks
- about / Tasks
- classification / Classification
- regression / Regression
- clustering / Clustering
- dimensionality reduction / Dimensionality reduction
- errors / Errors
- optimization problems / Optimization
- linear programming / Linear programming
- models / Models
- features / Features
- term-document matrix / Content-based filtering
- Thrip / Visualizing data
- tree models
- about / Tree models
- purity / Purity
U
- unified modelling language (UML)
- about / Unified modeling language
- class diagrams / Class diagrams
- object diagrams / Object diagrams
- activity diagrams / Activity diagrams
- state diagrams / State diagrams
V
- version space / Version space
W
- weak learnability / Boosting
- Whitefly / Visualizing data
- WordNet
- reference link / The human interface
- world database / Data from databases