Index
A
- AcceptedAnswerId attribute / Preselection and processing of attributes
- add-one smoothing / Accounting for unseen words and other oddities
- additive smoothing / Accounting for unseen words and other oddities
- advanced basket analysis
- about / More advanced basket analysis
- Amazon Linux
- Python packages, installing on / Installing Python packages on Amazon Linux
- Amazon regions / Using Amazon Web Services (AWS)
- Apriori algorithm / Analyzing supermarket shopping baskets
- area under curve (AUC) / Looking behind accuracy – precision and recall, An alternate way to measure classifier performance using receiver operator characteristic (ROC)
- as keypoint detection / Local feature representations
- Associated Press (AP)
- about / Building a topic model
- association rule mining
- about / Association rule mining
- association rules
- about / Association rule mining
- attributes
- preselecting / Preselection and processing of attributes
- processing / Preselection and processing of attributes
- Auditory Filterbank Temporal Envelope (AFTE) / Improving classification performance with Mel Frequency Cepstral Coefficients
- Automatic Music Genre Classification (AMGC) / Improving classification performance with Mel Frequency Cepstral Coefficients
- AWS
- using / Using Amazon Web Services (AWS)
- machine, creating / Creating your first machines
B
- bag-of-word approach
- bag-of-word approach, challenges
- about / Preprocessing – similarity measured as similar number of common words
- raw text, converting into bag-of-words / Converting raw text into a bag-of-words
- words, counting / Counting words
- word count vectors, normalizing / Normalizing the word count vectors
- less important words, removing / Removing less important words
- stemming / Stemming
- stop words, on steroids / Stop words on steroids
- basic image processing
- about / Basic image processing
- thresholding / Thresholding
- Gaussian blurring / Gaussian blurring
- filtering, for different effects / Filtering for different effects
- basket analysis
- about / Basket analysis
- useful predictions, obtaining / Obtaining useful predictions
- supermarket shopping baskets, analyzing / Analyzing supermarket shopping baskets
- association rule mining / Association rule mining
- beer and diapers story / Basket analysis
- BernoulliNB / Creating our first classifier and tuning it
- Bias-variance
- about / Bias-variance and its trade-off
- trade-off / Bias-variance and its trade-off
- big data expression
- about / Learning about big data
- binary classification
- binary matrix of recommendations
- blogs, machine learning / Blogs
- Body attribute / Preselection and processing of attributes
C
- classification
- about / The Iris dataset
- Naive Bayes, using for / Using Naive Bayes to classify
- classification model
- building / Building our first classification model
- evaluating / Evaluation – holding out data and cross-validation
- structure / Building more complex classifiers
- search procedure / Building more complex classifiers
- loss function / Building more complex classifiers
- classification performance
- improving, with Mel Frequency Cepstral Coefficients / Improving classification performance with Mel Frequency Cepstral Coefficients
- classifier
- creating / Creating our first classifier, Solving an easy problem first
- training / Training the classifier, Training the classifier
- performance, measuring / Measuring the classifier's performance
- performance, improving / Deciding how to improve
- slimming / Slimming the classifier
- integrating, into site / Ship it!
- classes, using / Using all the classes
- parameters, tuning / Tuning the classifier's parameters
- building, FFT used / Using FFT to build our first classifier, Increasing experimentation agility
- classifier, classy answers
- tuning / Tuning the classifier
- classifier performance
- measuring, receiver operator characteristic (ROC) used / An alternate way to measure classifier performance using receiver operator characteristic (ROC)
- classifier performance, improving
- Bias-variance / Bias-variance and its trade-off
- high bias, fixing / Fixing high bias
- high variance, fixing / Fixing high variance
- high bias or low bias / High bias or low bias
- classy answers
- classifying / Learning to classify classy answers
- instance, tuning / Tuning the instance
- classifier, tuning / Tuning the classifier
- cloud machine
- jug, running on / Running jug on our cloud machine
- cluster generation
- automating, with starcluster / Automating the generation of clusters with starcluster
- clustering
- about / Measuring the relatedness of posts, Clustering
- flat clustering / Clustering
- hierarchical clustering / Clustering
- KMeans algorithm / KMeans
- test data, obtaining for idea evaluation / Getting test data to evaluate our ideas on
- cluster package / Learning SciPy
- CommentCount attribute / Preselection and processing of attributes
- complex classifiers
- building / Building more complex classifiers
- complex dataset
- confusion matrix
- used, for accuracy measurement in multiclass problems / Using the confusion matrix to measure accuracy in multiclass problems
- constants package / Learning SciPy
- correlation
- about / Correlation
- using / Correlation
- cost function
- CountVectorizer
- Coursera
- URL / Online courses
- CreationDate attribute / Preselection and processing of attributes
- cross-validation
- cross-validation, for regression / Cross-validation for regression
- cross-validation schedule
- Cross Validated
- about / What to do when you are stuck, Q
- URL / Q
D
- data
- fetching / Fetching the data
- slimming down, to chewable chunks / Slimming the data down to chewable chunks
- data, machine learning application
- reading / Reading in the data
- cleaning / Preprocessing and cleaning the data
- preprocessing / Preprocessing and cleaning the data
- data analysis
- jug, using for / Using jug for data analysis
- data sources, machine learning / Data sources
- dimensionality reduction
- about / Sketching our roadmap
- dot() function / Comparing runtime behaviors
E
- Elastic net model
- about / L1 and L2 penalties
- Elastic nets
- using, in scikit-learn / Using Lasso or Elastic nets in scikit-learn
- ensemble learning / Combining multiple methods
- Enthought Python Distribution
- URL / Installing Python
F
- false negative
- false positive
- Fast Fourier transformation / Learning SciPy
- feature engineering
- feature extraction
- features
- about / The Iris dataset, Features and feature engineering
- engineering / Engineering the features
- designing / Designing more features
- computing, from images / Computing features from images
- writing / Writing your own features
- selecting / Selecting features
- feature selection
- about / Features and feature engineering
- feature selection methods
- about / Other feature selection methods
- FFT
- used, for building classifier / Using FFT to build our first classifier, Increasing experimentation agility
- fftpack package / Learning SciPy
- filtering
- for different effects / Filtering for different effects
- filters
- used, for detecting features / Detecting redundant features using filters
- disadvantage / Mutual information
- fit_transform() method / Limitations of PCA and how LDA can help
- flat clustering
- about / Clustering
G
- Gaussian blurring
- about / Gaussian blurring
- GaussianNB / Creating our first classifier and tuning it
- genfromtxt() function / Reading in the data
- gensim package
- about / Building a topic model
- good answers
- defining / Defining what is a good answer
- graphical processing units (GPUs) / Using Amazon Web Services (AWS)
- GTZAN dataset
- about / Fetching the music data
- URL, for downloading / Fetching the music data
H
- Haralick texture features
- about / Computing features from images
- harder dataset
- classifying / Classifying a harder dataset
- hierarchical clustering
- about / Clustering
- hierarchical Dirichlet process (HDP) / Choosing the number of topics
- house prices
- predicting, with regression / Predicting house prices with regression
- hyperparameters
- setting / Setting hyperparameters in a smart way
I
- image processing
- about / Introducing image processing
- images
- loading / Loading and displaying images
- displaying / Loading and displaying images
- features, computing from / Computing features from images
- indexing, NumPy / Indexing
- installation, Python / Installing Python
- installation, Python packages
- on Amazon Linux / Installing Python packages on Amazon Linux
- instance
- about / Creating your first machines
- instance, classy answers
- tuning / Tuning the instance
- integrate package / Learning SciPy
- interest point detection / Local feature representations
- International Society for Music Information Retrieval (ISMIR) / Improving classification performance with Mel Frequency Cepstral Coefficients
- interpolate package / Learning SciPy
- io package / Learning SciPy
- Iris dataset
- about / The Iris dataset
- visualization / The first step is visualization
J
- JPEG
- about / Introducing image processing
- jug
- used for breaking up pipeline, into tasks / Using jug to break up your pipeline into tasks
- about / Using jug to break up your pipeline into tasks
- partial results, reusing / Reusing partial results
- working / Looking under the hood
- using, for data analysis / Using jug for data analysis
- URL, for documentation / Using jug for data analysis
- running, on cloud machine / Running jug on our cloud machine
- jug cleanup / Using jug for data analysis
- jug execute file / About tasks
- jugfile.jugdata directory / About tasks
- jugfile.py file / About tasks
- jug invalidate / Using jug for data analysis
- jug status --cache / Using jug for data analysis
K
- k-means clustering
- about / Local feature representations
- k-nearest neighbor (kNN) algorithm
- Kaggle
- URL / Getting competitive
- keys
- KMeans
- about / KMeans
L
- labels
- Laplace smoothing / Accounting for unseen words and other oddities
- Lasso
- about / L1 and L2 penalties
- using, in scikit-learn / Using Lasso or Elastic nets in scikit-learn
- LDA
- learning algorithm
- Levenshtein distance
- about / How not to do it
- Lidstone smoothing
- lift
- about / Association rule mining
- linalg package / Learning SciPy
- LinearRegression class / Cross-validation for regression
- Load Sharing Facility (LSF) / Using jug to break up your pipeline into tasks
- local feature representations
- about / Local feature representations
- logistic regression
- using / Using logistic regression
- example / A bit of math with a small example
- applying, to postclassification problem / Applying logistic regression to our postclassification problem
- logistic regression classifier / Training the classifier
- loss function / Building more complex classifiers
M
- machine learning (ML)
- goals / Machine learning and Python – the dream team
- in real world / Rating prediction and recommendations
- online courses / Online courses
- books / Books
- Q&A sites / Q
- blogs / Blogs
- data sources / Data sources
- supervised learning competitions / Getting competitive
- additional resources / What was left out
- machine learning application
- about / Our first (tiny) machine learning application
- data, reading / Reading in the data
- data, preprocessing / Preprocessing and cleaning the data
- data, cleaning / Preprocessing and cleaning the data
- learning algorithm, selecting / Choosing the right model and learning algorithm, Before building our first model, Starting with a simple straight line, Towards some advanced stuff, Stepping back to go forward – another look at our data, Training and testing, Answering our initial question
- Machine Learning Repository / Data sources
- Machine Learning Toolkit (MILK)
- URL / What was left out
- machines
- creating / Creating your first machines
- Mahotas
- mahotas.features / Computing features from images
- mahotas computer vision package
- about / Loading and displaying images
- massive open online course (MOOC) / Online courses
- Matplotlib
- matshow() function / Using the confusion matrix to measure accuracy in multiclass problems
- maxentropy package / Learning SciPy
- MDS
- Mel Frequency Cepstral Coefficients
- used, for improving classification performance / Improving classification performance with Mel Frequency Cepstral Coefficients
- Mel Frequency Cepstral Coefficients (MFCC) / Improving classification performance with Mel Frequency Cepstral Coefficients
- Mel Frequency Cepstrum (MFC) / Improving classification performance with Mel Frequency Cepstral Coefficients
- MetaOptimize
- MetaOptimized
- about / What to do when you are stuck
- mfcc() function / Improving classification performance with Mel Frequency Cepstral Coefficients
- mh.features.haralick function / Computing features from images
- MLComp
- Modular toolkit for Data Processing (MDP)
- URL / What was left out
- movie recommendation dataset
- about / Improved recommendations
- binary matrix of recommendations, using / Using the binary matrix of recommendations
- movie neighbors, viewing / Looking at the movie neighbors
- multiple methods, combining / Combining multiple methods
- MP3 files
- converting, into wave format / Converting into a wave format
- multiclass classification
- multiclass problems
- confusion matrix, used for accuracy measurement / Using the confusion matrix to measure accuracy in multiclass problems
- multidimensional regression
- about / Multidimensional regression
- MultinomialNB / Creating our first classifier and tuning it
- music
- decomposing, into sine wave components / Decomposing music into sine wave components
- music data
- fetching / Fetching the music data
- Music Information Retrieval (MIR) / Improving classification performance with Mel Frequency Cepstral Coefficients
N
- Naive Bayes
- used, for classification / Using Naive Bayes to classify
- Naive Bayes classifier
- about / Introducing the Naive Bayes classifier, Getting to know the Bayes theorem
- accounting, for unseen words / Accounting for unseen words and other oddities
- accounting, for oddities / Accounting for unseen words and other oddities
- accounting, for arithmetic underflows / Accounting for arithmetic underflows
- Naive Bayes classifiers
- GaussianNB / Creating our first classifier and tuning it
- MultinomialNB / Creating our first classifier and tuning it
- BernoulliNB / Creating our first classifier and tuning it
- ndimage (n-dimensional image)
- about / Loading and displaying images
- ndimage package / Learning SciPy
- nearest neighbor classification
- about / Nearest neighbor classification
- nearest neighbor search (NNS) / What the book will teach you (and what it will not)
- Netflix
- NLTK
- installing / Installing and using NLTK
- using / Installing and using NLTK
- NLTK's stemmer
- used, for extending vectorizer / Extending the vectorizer with NLTK's stemmer
- norm() function / Counting words
- np.linalg.lstsq function / Predicting house prices with regression
- NumPy
- URL, for tutorials / Chewing data efficiently with NumPy and intelligently with SciPy
- learning / Learning NumPy
- indexing / Indexing
- non-existing values, handling / Handling non-existing values
- runtime behaviors, comparing / Comparing runtime behaviors
- about / Loading and displaying images
O
- odr package / Learning SciPy
- OpenCV
- about / Loading and displaying images
- optimize package / Learning SciPy
- Oracle Grid Engine (OGE) / Using jug to break up your pipeline into tasks
- Otsu threshold
- about / Thresholding
- overfitting
- about / Towards some advanced stuff
- OwnerUserId attribute / Preselection and processing of attributes
P
- packages, SciPy
- cluster / Learning SciPy
- constants / Learning SciPy
- fftpack / Learning SciPy
- integrate / Learning SciPy
- interpolate / Learning SciPy
- io / Learning SciPy
- linalg / Learning SciPy
- maxentropy / Learning SciPy
- ndimage / Learning SciPy
- odr / Learning SciPy
- optimize / Learning SciPy
- signal / Learning SciPy
- sparse / Learning SciPy
- spatial / Learning SciPy
- special / Learning SciPy
- stats / Learning SciPy
- parameters
- tweaking / Tweaking the parameters
- partial results
- reusing / Reusing partial results
- Part Of Speech (POS) / Sketching our roadmap, Determining the word types
- pattern recognition
- about / Pattern recognition
- PCA
- about / Sketching our roadmap, About principal component analysis (PCA)
- sketching / Sketching PCA
- applying / Applying PCA
- limitations / Limitations of PCA and how LDA can help
- pearsonr() function
- about / Correlation
- penalized regression
- about / Penalized regression
- L2 penalty / L1 and L2 penalties
- L1 penalty / L1 and L2 penalties
- Lasso model / L1 and L2 penalties
- Elastic net / L1 and L2 penalties
- Penn Treebank Project
- P greater than N scenarios
- about / P greater than N scenarios
- text example / An example based on text
- hyperparameters, setting / Setting hyperparameters in a smart way
- prediction, rating / Rating prediction and recommendations
- recommendations, rating / Rating prediction and recommendations
- PNG
- about / Introducing image processing
- polyfit() function / Starting with a simple straight line
- Portable Batch System (PBS) / Using jug to break up your pipeline into tasks
- postclassification problem
- logistic regression, applying to / Applying logistic regression to our postclassification problem
- posts
- relatedness, measuring / Measuring the relatedness of posts, How to do it
- clustering / Clustering posts
- PostType attribute / Preselection and processing of attributes
- Pybrain
- URL / What was left out
- pymining
- about / More advanced basket analysis
- pyplot package / Preprocessing and cleaning the data
- Python
- installing / Installing Python
- about / Loading and displaying images
- Python packages
- installing, on Amazon Linux / Installing Python packages on Amazon Linux
Q
- Q&A sites
- about / What to do when you are stuck, Q
- MetaOptimize / Q
- Cross Validated / Q
R
- read_fft() function / Increasing experimentation agility
- receiver operator characteristic (ROC)
- used, for measuring classifier performance / An alternate way to measure classifier performance using receiver operator characteristic (ROC)
- about / An alternate way to measure classifier performance using receiver operator characteristic (ROC)
- redundant features
- detecting, filters used / Detecting redundant features using filters
- redundant features detection
- correlation, using / Correlation
- mutual information / Mutual information
- regression
- used, for predicting house prices / Predicting house prices with regression
- Ridge regression
- about / L1 and L2 penalties
- Ridler-Calvard method
- about / Thresholding
- root mean squared error (RMSE) / Predicting house prices with regression
S
- salt and pepper noise
- adding / Adding salt and pepper noise
- center, inserting in focus / Putting the center in focus
- save() function / Increasing experimentation agility
- Scikit
- about / Clustering
- scikit-image (Skimage)
- about / Loading and displaying images
- scikit-learn
- Lasso, using in / Using Lasso or Elastic nets in scikit-learn
- Elastic nets, using in / Using Lasso or Elastic nets in scikit-learn
- SciPy
- Score attribute / Preselection and processing of attributes
- Secure Shell (SSH)
- about / Creating your first machines
- Securities and Exchange Commission (SEC)
- about / An example based on text
- Seeds dataset
- about / Learning about the Seeds dataset
- sentiment analysis, tweet / Sketching our roadmap
- SentiWordNet
- URL / Successfully cheating using SentiWordNet
- about / Successfully cheating using SentiWordNet
- used, for cheating / Successfully cheating using SentiWordNet
- SIFT
- signal package / Learning SciPy
- similarity
- comparing, in topic space / Comparing similarity in topic space
- sine wave components
- music, decomposing into / Decomposing music into sine wave components
- sklearn.feature_selection package / Asking the model about the features using wrappers
- sklearn.lda
- sklearn.naive_bayes package / Creating our first classifier and tuning it
- sklearn package / Converting raw text into a bag-of-words
- Sobel filtering
- about / Writing your own features
- sparse package / Learning SciPy
- sparsity
- about / Building a topic model
- spatial package / Learning SciPy
- specgram() function / Looking at music
- special package / Learning SciPy
- spectrogram
- about / Looking at music
- starcluster
- cluster generation, automating with / Automating the generation of clusters with starcluster
- Starcluster
- URL, for documentation / Automating the generation of clusters with starcluster
- stats package / Learning SciPy
- stemming
- about / Stemming
- NLTK, installing / Installing and using NLTK
- NLTK, using / Installing and using NLTK
- supermarket shopping baskets
- analyzing / Analyzing supermarket shopping baskets
- supervised learning
- about / The Iris dataset
- support vector machines (SVM) / What the book will teach you (and what it will not)
- SURF
- about / Local feature representations
- system
- demonstrating, for new post / Solving our initial challenge, Another look at noise
T
- Talkbox SciKit / Improving classification performance with Mel Frequency Cepstral Coefficients
- task
- about / About tasks
- example / About tasks
- term frequency - inverse document frequency (TF-IDF) / Stop words on steroids
- testing error
- text preprocessing phase
- achievements / Our achievements and goals
- goals / Our achievements and goals
- thresholding
- about / Thresholding
- Title attribute / Preselection and processing of attributes
- topic model
- topic modeling
- about / Choosing the number of topics
- topics
- about / Latent Dirichlet allocation (LDA)
- selecting / Choosing the number of topics
- topic space
- similarity, comparing / Comparing similarity in topic space
- training error
- transform method / Counting words
- tweets
- cleaning / Cleaning tweets
- Twitter data
- fetching / Fetching the Twitter data
- TwoToReal
U
- University of California at Irvine (UCI)
- about / Learning about the Seeds dataset
V
- vectorization
- about / How to do it
- vectorizer
- extending, with NLTK's stemmer / Extending the vectorizer with NLTK's stemmer
- ViewCount attribute / Preselection and processing of attributes
- visualization, Iris dataset / The first step is visualization
W
- wave format
- MP3 files, converting into / Converting into a wave format
- Wikipedia
- modeling / Modeling the whole of Wikipedia
- URL, for dumps / Modeling the whole of Wikipedia
- word count vectors
- normalizing / Normalizing the word count vectors
- wordle
- URL / Building a topic model
- word sense disambiguation / Successfully cheating using SentiWordNet
- word types
- about / Taking the word types into account
- determining / Determining the word types
- wrappers