Index
A
- agglomerative hierarchical clustering / Hierarchical clustering
- aggregation operations
- about / Aggregation operations
- average / Aggregation operations
- SUM / Aggregation operations
- MAX / Aggregation operations
- MIN / Aggregation operations
- STD / Aggregation operations
- COUNT / Aggregation operations
- ANOVA
- about / ANOVA
- Apache Spark
- about / Python with Apache Spark
- Python with / Python with Apache Spark
- installing, URL / Python with Apache Spark
- sentiment, scoring / Scoring the sentiment
- overall sentiment / The overall sentiment
- area plot
- about / Area plots
- example / Area plots
- array
- with NumPy / The world of arrays with NumPy
- creating / Creating an array
- subtraction / Array subtraction
- squaring / Squaring an array
- trigonometric function / A trigonometric function performed on the array
- conditional operations / Conditional operations
- matrix multiplication / Matrix multiplication
- slicing / Indexing and slicing
- indexing / Indexing and slicing
B
- Bernoulli distribution
- about / A Bernoulli distribution
- box plot
- bubble chart
- about / Bubble charts
C
- census income dataset
- about / The census income dataset
- exploring / Exploring the census data
- people histogram, creating / Hypothesis 1: People who are older earn more
- earning bias, working class based / Hypothesis 2: Income bias based on working class
- earning power, education based / Hypothesis 3: People with more education earn more
- earning power, marital status based / Hypothesis 4: Married people tend to earn more
- earning power, race based / Hypothesis 5: There is a bias in income based on race
- earning power, occupation based / Hypothesis 6: There is a bias in the income based on occupation
- earning power, gender based / Hypothesis 7: Men earn more
- earning power, productive hours based / Hypothesis 8: People who clock in more hours earn more
- earning power, native countries based / Hypothesis 9: There is a bias in income based on the country of origin
- chart
- line properties, controlling / Controlling the line properties of a chart
- text, adding / Playing with text
- chi-square distribution
- about / The chi-square distribution
- chi-square test
- for goodness of fit / Chi-square for the goodness of fit
- of independence / The chi-square test of independence
- classification trees
- about / Decision trees
- collaborative filtering
- user-based collaborative filtering / User-based collaborative filtering
- item-based collaborative filtering / Item-based collaborative filtering
- conditional operations
- about / Conditional operations
- confidence interval
- about / A confidence interval
- consumer key
- correlation
- about / Correlation
- CSV
- about / CSV
D
- 3D plot
- plotting / A 3D plot of a surface
- data
- exporting / Inserting and exporting data
- importing / Inserting and exporting data
- inserting / Inserting and exporting data
- preprocessing / Preprocessing data
- data, cleansing
- data, merging / Merging data
- database
- data, reading from / Database
- data cleansing
- about / Data cleansing
- missing data, checking / Checking the missing data
- missing data, filling / Filling the missing data
- string operation / String operations
- DataFrame
- about / DataFrame
- data journalism website
- about / Styling your plots
- data mining
- about / What is data mining?
- analysis, presenting / Presenting an analysis
- data operations
- aggregation operations / Aggregation operations
- joins / Joins
- decision trees
- about / Decision trees, Decision trees
- classification trees / Decision trees
- regression trees / Decision trees
- distribution
- forms / Various forms of distribution
- normal distribution / A normal distribution
- normal distribution, from binomial distribution / A normal distribution from a binomial distribution
- Poisson distribution / A Poisson distribution
- Bernoulli distribution / A Bernoulli distribution
- divisive hierarchical clustering / Hierarchical clustering
E
- elbow curve / Determining the number of clusters
- Euclidean distance / Determining the number of clusters
- Euclidean distance score
- about / The Euclidean distance score
F
- Fast Moving Consumer Goods (FMCG) / What is data mining?
- F distribution
- about / The F distribution
- full outer join / The full outer join
G
- groupby function / The groupby function
H
- Hadoop
- about / What is Hadoop?
- programming model / The programming model
- MapReduce, architecture / The MapReduce architecture
- DFS / The Hadoop DFS
- DFS, architecture / Hadoop's DFS architecture
- URL / Python MapReduce
- Hadoopy
- used, for file handling / File handling with Hadoopy
- URL / File handling with Hadoopy
- heatmap
- hexagon bin plot
- about / Hexagon bin plots
- hierarchical clustering
- about / Hierarchical clustering
- agglomerative hierarchical clustering / Hierarchical clustering
- divisive hierarchical clustering / Hierarchical clustering
- histograms
- combining, with scatter plot / Scatter plots with histograms
I
- inner join / The inner join
- item-based collaborative filtering
J
- joins
- about / Joins
- inner join / The inner join
- left outer join / The left outer join
- full outer join / The full outer join
- groupby function / The groupby function
- JSON
- about / JSON
K
- k-means clustering
- k-means clustering, with countries
- about / The k-means clustering with countries
- number of clusters, determining / Determining the number of clusters
- applying / Clustering the countries
- Kaggle
- URL / Summary
- keyword arguments
- used, for controlling line properties of chart / Using keyword arguments
L
- left outer join / The left outer join
- lemmatization
- about / Stemming and lemmatization, Lemmatization
- linear regression
- about / Linear regression
- simple linear regression / Simple linear regression
- multiple linear regression / Multiple regression
- linear regression model
- building, with statsmodels module / Training and testing a model
- building, with SciKit package / Training and testing a model
- line properties, chart
- controlling / Controlling the line properties of a chart
- controlling, with keyword arguments / Using keyword arguments
- controlling, with setter methods / Using the setter methods
- controlling, with setp() command / Using the setp() command
- logistic regression
- about / Logistic regression, Logistic regression
- data, preparing / Data preparation
- training set, creating / Creating training and testing sets
- testing set, creating / Creating training and testing sets
- model, building / Building a model, Model building and evaluation with SciKit
- model, evaluating / Model evaluation
- model, evaluating based on test data / Evaluating a model based on test data
- model, evaluating with SciKit / Model building and evaluation with SciKit
M
- machine learning
- Andrew Ng course, URL / Summary
- machine learning, types
- about / Different types of machine learning
- supervised learning / Different types of machine learning
- unsupervised learning / Different types of machine learning
- reinforcement learning / Different types of machine learning
- MapReduce
- about / The MapReduce architecture
- Python used / Python MapReduce
- word count / The basic word count
- sentiment score, for review / A sentiment score for each review
- overall sentiment score / The overall sentiment score
- code, deploying on Hadoop / Deploying the MapReduce code on Hadoop
- mathematical operations
- about / Mathematical operations
- matrix multiplication
- about / Matrix multiplication
- model
- training / Training and testing a model
- testing / Training and testing a model
- multiple linear regression
- about / Multiple regression
- example / Multiple regression
- multiple plots
- creating / Creating multiple plots
N
- naive Bayes classifier
- about / The naive Bayes classifier
- Natural Language Toolkit (NLTK)
- URL / Preprocessing data
- normal distribution
- about / A normal distribution
- from binomial distribution / A normal distribution from a binomial distribution
- null hypothesis
- about / A p-value
- NumPy array
- about / The world of arrays with NumPy
- NumPy documentation
- URL / Summary
O
- one-tailed tests
- about / One-tailed and two-tailed tests
- Ordinary Least Squares Regression (OLS)
- about / Training and testing a model
P
- p-value
- about / A p-value
- pandas, data structure
- about / The data structure of pandas
- series / Series
- DataFrame / DataFrame
- panel / Panel
- pandas documentation
- URL / Summary
- pandas library
- panel
- about / Panel
- parts of speech tagging
- about / Parts of speech tagging
- Pearson correlation score
- about / The Pearson correlation score
- Pig
- about / Pig
- Pig Latin
- URL / Pig
- plots
- styling / Styling your plots
- Poisson distribution
- about / A Poisson distribution
- PunktSentenceTokenizer / Word and sentence tokenization
R
- random forests
- about / Random forests
- RDDs (Resilient Distributed Datasets) / Python with Apache Spark
- recommendation data
- about / Recommendation data
- regression trees
- about / Decision trees
- reinforcement learning
- about / Reinforcement learning
S
- scatter plot
- with histograms / Scatter plots with histograms
- scatter plot matrix
- about / A scatter plot matrix
- SciKit package
- used, for building linear regression model / Training and testing a model
- SciPy package
- sentence tokenization / Word and sentence tokenization
- sentiment analysis
- on world leaders, Twitter used / Performing sentiment analysis on world leaders using Twitter
- sentiments
- series
- about / Series
- setp() command
- used, for controlling line properties of chart / Using the setp() command
- setter methods
- used, for controlling line properties of chart / Using the setter methods
- shape
- manipulating / Shape manipulation
- simple linear regression
- about / Simple linear regression
- example / Simple linear regression
- Stanford Named Entity Recognizer
- statsmodels module
- used, for building linear regression model / Training and testing a model
- about / Training and testing a model
- stemming
- about / Stemming and lemmatization, Stemming
- string operation
- substring / String operations
- filtering / String operations
- uppercase / String operations
- lowercase / String operations
- length / String operations
- split / String operations
- replace / String operations
- supervised learning
- about / Supervised learning
T
- T-test
- versus Z-test / Z-test vs T-test
- tags
- URL / Parts of speech tagging
- text
- adding, to chart / Playing with text
- Titanic survivors dataset
- about / Studying the Titanic
- passenger class survivors, determining / Which passenger class has the maximum number of survivors?
- survivors distributions, determining based on gender / What is the distribution of survivors based on gender among the various classes?
- nonsurvivors distributions, determining / What is the distribution of nonsurvivors among the various classes who have family aboard the ship?
- survival percentage, searching among age groups / What was the survival percentage among different age groups?
- Trellis plot
- about / Trellis plots
- example / Trellis plots
- trigonometric function
- two-tailed tests
- about / One-tailed and two-tailed tests
- Twython package
- Type 1 error
- about / Type 1 and Type 2 errors
- Type 2 error
- about / Type 1 and Type 2 errors
U
- unsupervised learning
- about / Unsupervised learning
- user-based collaborative filtering
- about / User-based collaborative filtering
- similar users, finding / Finding similar users
- Euclidean distance score / The Euclidean distance score
- Pearson correlation score / The Pearson correlation score
- users, ranking / Ranking the users
- items, recommending / Recommending items
W
- wordcloud
- creating / Creating a wordcloud
- URL / Creating a wordcloud
- word tokenization / Word and sentence tokenization
X
- XLS
- about / XLS
Z
- z-score
- about / A z-score
- Z-test
- versus T-test / Z-test vs T-test