R for Data Science Cookbook (n)

By: Yu-Wei, Chiu (David Chiu)

Overview of this book

This cookbook offers a range of data analysis examples in simple, straightforward R code, providing step-by-step recipes and time-saving methods to help you solve data problems efficiently. The first section shows how to write R functions to avoid unnecessary duplication of code. You will learn how to prepare, process, and perform sophisticated ETL (extract, transform, load) operations on heterogeneous data sources with R packages. Data manipulation examples illustrate how to use the “dplyr” and “data.table” packages to process larger data structures efficiently. We also focus on “ggplot2” and show you how to create advanced figures for data exploration. In addition, you will learn how to build an interactive report using the “ggvis” package. Later chapters cover time series analysis on financial data, along with detailed coverage of machine learning, including data classification, regression, clustering, association rule mining, and dimension reduction. By the end of this book, you will be able to resolve common issues and offer solutions to problems encountered while performing data analysis.

R for Data Science Cookbook

Credits

About the Author

About the Reviewer

www.PacktPub.com

Preface

Functions in R

Introduction

Creating R functions

Matching arguments

Understanding environments

Working with lexical scoping

Understanding closure

Performing lazy evaluation

Creating infix operators

Using the replacement function

Handling errors in a function

The debugging function

Data Extracting, Transforming, and Loading

Introduction

Downloading open data

Reading and writing CSV files

Scanning text files

Working with Excel files

Reading data from databases

Scraping web data

Accessing Facebook data

Working with twitteR

Data Preprocessing and Preparation

Introduction

Renaming the data variable

Converting data types

Working with the date format

Detecting missing data

Imputing missing data

Data Manipulation

Introduction

Enhancing a data.frame with a data.table

Managing data with a data.table

Performing fast aggregation with a data.table

Merging large datasets with a data.table

Subsetting and slicing data with dplyr

Sampling data with dplyr

Selecting columns with dplyr

Chaining operations in dplyr

Arranging rows with dplyr

Eliminating duplicated rows with dplyr

Adding new columns with dplyr

Summarizing data with dplyr

Merging data with dplyr

Visualizing Data with ggplot2

Introduction

Creating basic plots with ggplot2

Changing aesthetics mapping

Introducing geometric objects

Performing transformations

Making Interactive Reports

Introduction

Creating R Markdown reports

Learning the markdown syntax

Embedding R code chunks

Creating interactive graphics with ggvis

Understanding basic syntax and grammar

Controlling axes and legends

Using scales

Adding interactivity to a ggvis plot

Creating an R Shiny document

Publishing an R Shiny report

Simulation from Probability Distributions

Introduction

Generating random samples

Understanding uniform distributions

Generating binomial random variates

Generating Poisson random variates

Sampling from a normal distribution

Sampling from a chi-squared distribution

Understanding Student's t-distribution

Sampling from a dataset

Simulating the stochastic process

Statistical Inference in R

Introduction

Getting confidence intervals

Performing Z-tests

Performing student's T-tests

Conducting exact binomial tests

Performing Kolmogorov-Smirnov tests

Working with the Pearson's chi-squared tests

Understanding the Wilcoxon Rank Sum and Signed Rank tests

Conducting one-way ANOVA

Performing two-way ANOVA

Rule and Pattern Mining with R

Introduction

Transforming data into transactions

Displaying transactions and associations

Mining associations with the Apriori rule

Pruning redundant rules

Visualizing association rules

Mining frequent itemsets with Eclat

Creating transactions with temporal information

Mining frequent sequential patterns with cSPADE

Time Series Mining with R

Introduction

Creating time series data

Plotting a time series object

Decomposing time series

Smoothing time series

Forecasting time series

Selecting an ARIMA model

Creating an ARIMA model

Forecasting with an ARIMA model

Predicting stock prices with an ARIMA model

Supervised Machine Learning

Introduction

Fitting a linear regression model with lm

Summarizing linear model fits

Using linear regression to predict unknown values

Measuring the performance of the regression model

Performing a multiple regression analysis

Selecting the best-fitted regression model with stepwise regression

Applying the Gaussian model for generalized linear regression

Performing a logistic regression analysis

Building a classification model with recursive partitioning trees

Visualizing a recursive partitioning tree

Measuring model performance with a confusion matrix

Measuring prediction performance using ROCR

Unsupervised Machine Learning

Introduction

Clustering data with hierarchical clustering

Cutting tree into clusters

Clustering data with the k-means method

Clustering data with the density-based method

Extracting silhouette information from clustering

Comparing clustering methods

Recognizing digits using the density-based clustering method

Performing dimension reduction with Principal Component Analysis (PCA)

Determining the number of principal components using a scree plot

Determining the number of principal components using the Kaiser method

Visualizing multivariate data using a biplot

Index

Grouping similar text documents with k-means clustering methods

Computer programs cannot directly interpret the meaning of sentences, so on their own they cannot group documents by similarity. However, if we convert the documents into a numeric matrix (a document-term matrix, with one row per document and one column per term), a program can compute the distance between each pair of documents and group similar ones together.
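To make the document-term matrix idea concrete, here is a minimal base-R sketch. It is not the recipe's tm-based code: it assumes four hypothetical headlines (not the news.RData data), uses raw term counts without stop-word removal, stemming, or any weighting, and measures Euclidean distance with `dist()`.

```r
# Hypothetical toy headlines (not the book's news.RData data)
titles <- c(
  "stock market rallies as tech shares climb",
  "tech shares lead stock market gains",
  "local team wins championship game",
  "championship game ends in overtime win"
)

# Tokenize: lowercase, split on non-letters, drop empty strings
tokens <- lapply(strsplit(tolower(titles), "[^a-z]+"), function(w) w[w != ""])

# Vocabulary: every distinct term across all documents
vocab <- sort(unique(unlist(tokens)))

# Document-term matrix: one row per document, one column per term, cell = term count
dtm <- t(vapply(tokens,
                function(w) as.numeric(table(factor(w, levels = vocab))),
                numeric(length(vocab))))
dimnames(dtm) <- list(paste0("doc", seq_along(titles)), vocab)

# Pairwise Euclidean distances between the document rows
d <- dist(dtm)
print(round(as.matrix(d), 2))
```

Headlines that share terms (doc1 and doc2) come out closer together than headlines with no terms in common (doc1 and doc3).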

In this recipe, we demonstrate how to compute the distance between text documents and how we can cluster similar text documents with the k-means method.
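As a rough illustration of the clustering step, the sketch below passes a toy count matrix to base R's `kmeans()`. It rebuilds the matrix so it runs on its own; the hypothetical headlines, the choice of two clusters, the seed, and `nstart = 25` are illustrative assumptions, not values taken from the recipe.

```r
# Hypothetical toy headlines: two about markets, two about sports
titles <- c(
  "stock market rallies as tech shares climb",
  "tech shares lead stock market gains",
  "local team wins championship game",
  "championship game ends in overtime win"
)

# Build a raw-count document-term matrix (rows = documents, columns = terms)
tokens <- lapply(strsplit(tolower(titles), "[^a-z]+"), function(w) w[w != ""])
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(tokens,
                function(w) as.numeric(table(factor(w, levels = vocab))),
                numeric(length(vocab))))
rownames(dtm) <- paste0("doc", seq_along(titles))

# k-means with k = 2; nstart reruns the random initialisation and keeps the best fit
set.seed(123)
km <- kmeans(dtm, centers = 2, nstart = 25)

# Cluster label assigned to each document
print(km$cluster)

# Group the headlines by their cluster label
print(split(titles, km$cluster))
```

Because k-means works with Euclidean distance on the rows, headlines with many distinct words can dominate the result; normalizing rows or using TF-IDF weights is a common refinement.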

Getting ready

In this recipe, we use news titles as clustering input. You can find the data on the author's GitHub page at https://github.com/ywchiu/rcookbook/raw/master/chapter12/news.RData.

How to do it…

Perform the following steps to cluster text documents with the k-means clustering technique:

First, install and load the tm and SnowballC packages:

> install.packages('tm')
> library(tm)
> install.packages('SnowballC')
> library(SnowballC)