R Statistical Application Development by Example Beginner's Guide

By: Prabhanjan Narayanachar Tattar

Overview of this book

"R Statistical Application Development by Example Beginner’s Guide" explores statistical concepts and the R software together, integrated from the word go. This sets it apart from approaches that teach theory and applications separately, which is why the title begins with “R Statistical …”. Almost every concept is accompanied by R code that demonstrates the strength of R and its applications. The reader first learns about data characteristics, descriptive statistics, and the exploratory attitude, which together give a firm first footing in data analysis. Statistical inference, along with simulation that exploits computational power, completes the technical foundation of statistical methods. Regression modeling (linear, logistic, and CART) builds the essential toolkit for tackling complex real-world problems. The reader begins with a brief understanding of the nature of data and ends with modern, advanced statistical models such as CART. Every step is taken with data and R code. The data analysis journey begins with exploratory analysis, which is more than simple descriptive data summaries, follows the traditional path up to linear regression modeling, and ends with logistic regression, CART, and spatial statistics. True to its title, the book offers examples and R software that the reader will enjoy.

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Data Characteristics

Questionnaire and its components

Experiments with uncertainty in computer science

R installation

Continuous distribution

Summary

Import/Export Data

data.frame and other formats

Time for action – understanding constants, vectors, and basic arithmetic

Time for action – matrix computations

Time for action – creating a list object

Time for action – creating a data.frame object

Summary

Data Visualization

Visualization techniques for categorical data

Time for action – bar charts in R

Time for action – dot charts in R

Time for action – the spine plot for the shift and operator data

Time for action – the mosaic plot for the Titanic dataset

Visualization techniques for continuous variable data

Time for action – using the boxplot

Time for action – understanding the effectiveness of histograms

Time for action – plot and pairs R functions

A brief peek at ggplot2

Time for action – qplot

Time for action – ggplot

Summary

Exploratory Analysis

Essential summary statistics

Time for action – the essential summary statistics for "The Wall" dataset

The stem-and-leaf plot

Time for action – the stem function in play

Letter values

Data re-expression

Bagplot – a bivariate boxplot

Time for action – the bagplot display for a multivariate dataset

The resistant line

Time for action – the resistant line as a first regression model

Smoothing data

Time for action – smoothening the cow temperature data

Median polish

Time for action – the median polish algorithm

Summary

Statistical Inference

Maximum likelihood estimator

Time for action – visualizing the likelihood function

Time for action – finding the MLE using mle and fitdistr functions

Confidence intervals

Time for action – confidence intervals

Hypotheses testing

Time for action – testing the probability of success

Time for action – testing proportions

Time for action – testing one-sample hypotheses

Time for action – testing two-sample hypotheses

Summary

Linear Regression Analysis

The simple linear regression model

Time for action – the arbitrary choice of parameters

Time for action – building a simple linear regression model

Time for action – ANOVA and the confidence intervals

Time for action – residual plots for model validation

Multiple linear regression model

Time for action – averaging k simple linear regression models

Time for action – building a multiple linear regression model

Time for action – the ANOVA and confidence intervals for the multiple linear regression model

Time for action – residual plots for the multiple linear regression model

Regression diagnostics

The multicollinearity problem

Time for action – addressing the multicollinearity problem for the Gasoline data

Model selection

Time for action – model selection using the backward, forward, and AIC criteria

Summary

The Logistic Regression Model

The binary regression problem

Time for action – limitations of linear regression models

Probit regression model

Time for action – understanding the constants

Logistic regression model

Time for action – fitting the logistic regression model

Time for action – The Hosmer-Lemeshow goodness-of-fit statistic

Model validation and diagnostics

Time for action – residual plots for the logistic regression model

Time for action – diagnostics for the logistic regression

Receiving operator curves

Time for action – ROC construction

Logistic regression for the German credit screening dataset

Time for action – logistic regression for the German credit dataset

Summary

Regression Models with Regularization

The overfitting problem

Time for action – understanding overfitting

Regression spline

Time for action – fitting piecewise linear regression models

Time for action – fitting the spline regression models

Ridge regression for linear models

Time for action – ridge regression for the linear regression model

Ridge regression for logistic regression models

Time for action – ridge regression for the logistic regression model

Another look at model assessment

Time for action – selecting lambda iteratively and other topics

Summary

Classification and Regression Trees

Recursive partitions

Time for action – partitioning the display plot

Time for action – building our first tree

The construction of a regression tree

Time for action – the construction of a regression tree

The construction of a classification tree

Time for action – the construction of a classification tree

Classification tree for the German credit data

Time for action – the construction of a classification tree

Pruning and other finer aspects of a tree

Time for action – pruning a classification tree

Summary

CART and Beyond

Improving CART

Time for action – cross-validation predictions

Bagging

Time for action – understanding the bootstrap technique

Time for action – the bagging algorithm

Random forests

Time for action – random forests for the German credit data

The consolidation

Time for action – random forests for the low birth weight data

Summary

References

Index

Regression diagnostics

In the Useful residual plots subsection, we saw how outliers may be identified using the residual plots. If there are outliers, we need to ask the following questions:

Is the observation an outlier due to an anomalous value in one or more covariates?
Is the observation an outlier due to an extreme output value?
Is the observation an outlier because both the covariate and output values are extreme?

The distinction in the nature of an outlier is vital, as one needs to be sure of its type. The identification techniques differ by type, as does the impact of the outlier on the fitted model. If the outlier is due to the covariate value, the observation is called a leverage point, and if it is due to the y value, we call it an influential point. The rest of the section covers the exact statistical techniques for identifying such outliers.
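The distinction between the two kinds of outliers can be illustrated with a minimal sketch using functions from base R's stats package. The data below are simulated purely for illustration and do not come from the book: observation 21 is given an anomalous x value and observation 22 an extreme y value. The cutoffs used (hat values above 2p/n and absolute studentized residuals above 3) are common rules of thumb assumed here, not thresholds stated in the text.

```r
# Simulated data (an assumption for illustration, not a dataset from the book).
set.seed(1)
x <- c(seq(1, 10, length.out = 20), 25, 5)  # observation 21 has an anomalous x
y <- 2 + 3 * x + rnorm(22, sd = 1)
y[22] <- y[22] + 15                         # observation 22 has an extreme y

fit <- lm(y ~ x)

h <- hatvalues(fit)       # diagonal of the hat matrix: depends on x only
r <- rstudent(fit)        # externally studentized residuals: flag unusual y
d <- cooks.distance(fit)  # combined summary of each observation's effect on the fit

n <- length(y)
p <- length(coef(fit))    # number of estimated coefficients (intercept and slope)

# Rule-of-thumb cutoffs (assumed; other references use different values).
high_leverage  <- which(h > 2 * p / n)
large_residual <- which(abs(r) > 3)

print(high_leverage)
print(large_residual)
print(round(d[c(21, 22)], 3))
```

Because the hat values are computed from the covariates alone, observation 22 is not flagged as having high leverage even though its y value is extreme, while observation 21 is flagged despite lying close to the fitted line.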

Leverage points

As noted, a leverage point has an anomalous x value. The leverage points may be theoretically proved not to impact the...
