8. Statistical Inference in R | R for Data Science Cookbook

R for Data Science Cookbook

By: Yu-Wei, Chiu (David Chiu), Prabhanjan Narayanachar Tattar

Overview of this book

This cookbook offers a range of data analysis samples in simple, straightforward R code, with step-by-step resources and time-saving methods to help you solve data problems efficiently. The first section shows how to create R functions to avoid unnecessary duplication of code. You will learn how to prepare, process, and perform sophisticated ETL on heterogeneous data sources with R packages. A data manipulation example illustrates how to use the “dplyr” and “data.table” packages to process larger data structures efficiently. We also focus on “ggplot2” and show you how to create advanced figures for data exploration. In addition, you will learn how to build an interactive report using the “ggvis” package. Later chapters offer insight into time series analysis of financial data and cover machine learning in detail, including data classification, regression, clustering, association rule mining, and dimension reduction. By the end of this book, you will be able to resolve issues and comfortably offer solutions to problems encountered during data analysis.

Preface

What this book covers

What you need for this book

Who this book is for

Sections

Conventions

Reader feedback

Customer support

1. Functions in R

Introduction

Creating R functions

Matching arguments

Understanding environments

Working with lexical scoping

Understanding closure

Performing lazy evaluation

Creating infix operators

Using the replacement function

Handling errors in a function

The debugging function

2. Data Extracting, Transforming, and Loading

Introduction

Downloading open data

Reading and writing CSV files

Scanning text files

Working with Excel files

Reading data from databases

Scraping web data

Accessing Facebook data

Working with twitteR

3. Data Preprocessing and Preparation

Introduction

Renaming the data variable

Converting data types

Working with the date format

Adding new records

Filtering data

Dropping data

Merging data

Sorting data

Reshaping data

Detecting missing data

Imputing missing data

4. Data Manipulation

Introduction

Enhancing a data.frame with a data.table

Managing data with a data.table

Performing fast aggregation with a data.table

Merging large datasets with a data.table

Subsetting and slicing data with dplyr

Sampling data with dplyr

Selecting columns with dplyr

Chaining operations in dplyr

Arranging rows with dplyr

Eliminating duplicated rows with dplyr

Adding new columns with dplyr

Summarizing data with dplyr

Merging data with dplyr

5. Visualizing Data with ggplot2

Introduction

Creating basic plots with ggplot2

Changing aesthetics mapping

Introducing geometric objects

Performing transformations

Adjusting scales

Faceting

Adjusting themes

Combining plots

Creating maps

6. Making Interactive Reports

Introduction

Creating R Markdown reports

Learning the markdown syntax

Embedding R code chunks

Creating interactive graphics with ggvis

Understanding basic syntax and grammar

Controlling axes and legends

Using scales

Adding interactivity to a ggvis plot

Creating an R Shiny document

Publishing an R Shiny report

7. Simulation from Probability Distributions

Introduction

Generating random samples

Understanding uniform distributions

Generating binomial random variates

Generating Poisson random variates

Sampling from a normal distribution

Sampling from a chi-squared distribution

Understanding Student's t-distribution

Sampling from a dataset

Simulating the stochastic process

8. Statistical Inference in R

Introduction

Getting confidence intervals

Performing Z-tests

Performing Student's t-tests

Conducting exact binomial tests

Performing Kolmogorov-Smirnov tests

Working with the Pearson's chi-squared tests

Understanding the Wilcoxon Rank Sum and Signed Rank tests

Conducting one-way ANOVA

Performing two-way ANOVA

9. Rule and Pattern Mining with R

Introduction

Transforming data into transactions

Displaying transactions and associations

Mining associations with the Apriori rule

Pruning redundant rules

Visualizing association rules

Mining frequent itemsets with Eclat

Creating transactions with temporal information

Mining frequent sequential patterns with cSPADE

10. Time Series Mining with R

Introduction

Creating time series data

Plotting a time series object

Decomposing time series

Smoothing time series

Forecasting time series

Selecting an ARIMA model

Creating an ARIMA model

Forecasting with an ARIMA model

Predicting stock prices with an ARIMA model

11. Supervised Machine Learning

Introduction

Fitting a linear regression model with lm

Summarizing linear model fits

Using linear regression to predict unknown values

Measuring the performance of the regression model

Performing a multiple regression analysis

Selecting the best-fitted regression model with stepwise regression

Applying the Gaussian model for generalized linear regression

Performing a logistic regression analysis

Building a classification model with recursive partitioning trees

Visualizing a recursive partitioning tree

Measuring model performance with a confusion matrix

Measuring prediction performance using ROCR

12. Unsupervised Machine Learning

Introduction

Clustering data with hierarchical clustering

Cutting tree into clusters

Clustering data with the k-means method

Clustering data with the density-based method

Extracting silhouette information from clustering

Comparing clustering methods

Recognizing digits using the density-based clustering method

Grouping similar text documents with k-means clustering methods

Performing dimension reduction with Principal Component Analysis (PCA)

Determining the number of principal components using a scree plot

Determining the number of principal components using the Kaiser method

Visualizing multivariate data using a biplot

Index
