4. Cluster Analysis | Learning Predictive Analytics with R

Learning Predictive Analytics with R

By: Eric Mayor

Overview of this book

This book is packed with easy-to-follow guidelines that explain the workings of many key data mining tools of R, which are used to discover knowledge from your data. You will learn how to perform key predictive analytics tasks using R, such as training and testing predictive models for classification and regression tasks and scoring new data sets. All chapters guide you in acquiring these skills in a practical way. Most chapters also include a theoretical introduction that will sharpen your understanding of the subject matter and invite you to go further. The book familiarizes you with the most common data mining tools of R, such as k-means, hierarchical clustering, linear regression, association rules, principal component analysis, multilevel modeling, k-NN, Naïve Bayes, decision trees, and text mining. It also describes visualization techniques using the basic visualization tools of R, as well as lattice for visualizing patterns in data organized in groups. This book is invaluable for anyone fascinated by the data mining opportunities offered by GNU R and its packages.

Preface

Prediction

Supervised and unsupervised learning

Classification and regression problems

The role of field knowledge in data modeling

Caveats

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

1. Setting GNU R for Predictive Analytics

Installing GNU R

The R graphic user interface

The menu bar of the R console

Packages

Summary

2. Visualizing and Manipulating Data Using R

The roulette case

Histograms and bar plots

Scatterplots

Boxplots

Line plots

Application – Outlier detection

Formatting plots

Summary

3. Data Visualization with Lattice

Loading and discovering the lattice package

Discovering multipanel conditioning with xyplot()

Discovering other lattice plots

Updating graphics

Case study – exploring cancer-related deaths in the US

Summary

4. Cluster Analysis

Distance measures

Learning by doing – partition clustering with kmeans()

Using k-means with public datasets

Summary

5. Agglomerative Clustering Using hclust()

The inner working of agglomerative clustering

Agglomerative clustering with hclust()

Summary

6. Dimensionality Reduction with Principal Component Analysis

The inner working of Principal Component Analysis

Learning PCA in R

Summary

7. Exploring Association Rules with Apriori

Apriori – basic concepts

The inner working of apriori

Analyzing data with apriori in R

Summary

8. Probability Distributions, Covariance, and Correlation

Probability distributions

Covariance and correlation

Summary

9. Linear Regression

Understanding simple regression

Working with multiple regression

Analyzing data in R: correlation and regression

Robust regression

Bootstrapping

Summary

10. Classification with k-Nearest Neighbors and Naïve Bayes

Understanding k-NN

Working with k-NN in R

Understanding Naïve Bayes

Working with Naïve Bayes in R

Computing the performance of classification

Summary

11. Classification Trees

Understanding decision trees

ID3

C4.5

C5.0

Classification and regression trees and random forest

Conditional inference trees and forests

Installing the packages containing the required functions

Performing the analyses in R

Caret – a unified framework for classification

Summary

12. Multilevel Analyses

Nested data

Multilevel regression

Multilevel modeling in R

Predictions using multilevel models

Summary

13. Text Analytics with R

An introduction to text analytics

Loading the corpus

Data preparation

Creating the training and testing data frames

Classification of the reviews

Mining the news with R

Summary

14. Cross-validation and Bootstrapping Using Caret and Exporting Predictive Models Using PMML

Cross-validation and bootstrapping of predictive models using the caret package

Exporting models using PMML

Summary

A. Exercises and Solutions

Exercises

Solutions

B. Further Reading and References

Preface

Chapter 1 – Setting GNU R for Predictive Modeling

Chapter 2 – Visualizing and Manipulating Data Using R

Chapter 3 – Data Visualization with Lattice

Chapter 4 – Cluster Analysis

Chapter 5 – Agglomerative Clustering Using hclust()

Chapter 6 – Dimensionality Reduction with Principal Component Analysis

Chapter 7 – Exploring Association Rules with Apriori

Chapter 8 – Probability Distributions, Covariance, and Correlation

Chapter 9 – Linear Regression

Chapter 10 – Classification with k-Nearest Neighbors and Naïve Bayes

Chapter 11 – Classification Trees

Chapter 12 – Multilevel Analyses

Chapter 13 – Text Analytics with R

Chapter 14 – Cross-validation and Bootstrapping Using Caret and Exporting Predictive Models Using PMML

Index

Chapter 4. Cluster Analysis
