Statistical Application Development with R and Python

Statistical Application Development with R and Python - Second Edition

Overview of this book

Statistical analysis involves collecting and examining data to describe its nature, explore relationships within it, and build models that support better decisions. This book explores statistical concepts alongside R and Python, which are integrated from the start. Almost every concept is accompanied by R code that illustrates the strengths of R and its applications, and the R programs are complemented by equivalent Python programs. You will first study data characteristics, descriptive statistics, and the exploratory attitude, which give you a firm footing in data analysis. Statistical inference completes the technical foundation of statistical methods. Linear regression, logistic regression, and CART make up the essential toolkit, which will help you tackle complex real-world problems. You will begin with a brief look at the nature of data and end with modern, advanced statistical models such as CART. Every step is taken with data and R code, and further enhanced by Python. The data analysis journey begins with exploratory analysis, which is more than simple descriptive data summaries. You will then apply linear regression modeling, and end with logistic regression, CART, and spatial statistics. By the end of this book, you will be able to apply your statistical learning in major domains at work or in your projects.


Credits

About the Author

Acknowledgment

About the Reviewers

www.PacktPub.com

Customer Feedback

Preface


Data Characteristics

Questionnaire and its components

Experiments with uncertainty in computer science

Installing and setting up R

Using R packages

Python installation and setup

IDEs for R and Python

The companion code bundle

Discrete distributions

Continuous distributions

Summary

Import/Export Data

Packages and settings – R and Python

Understanding data.frame and other formats

Using utils and the foreign packages

Exporting data/graphs

Pop quiz

Summary

Data Visualization

Packages and settings – R and Python

Visualization techniques for categorical data

Visualization techniques for continuous variable data

Pareto chart

A brief peek at ggplot2

Summary

Exploratory Analysis

Packages and settings – R and Python

Essential summary statistics

Techniques for exploratory analysis

Summary

Statistical Inference

Packages and settings – R and Python

Maximum likelihood estimator

Confidence intervals

Hypothesis testing

Summary

Linear Regression Analysis

Packages and settings – R and Python

The essence of regression

The simple linear regression model

Multiple linear regression model

Regression diagnostics

Model selection

Summary

Logistic Regression Model

Packages and settings – R and Python

Model validation and diagnostics

Logistic regression for the German credit screening dataset

Summary

Regression Models with Regularization

Packages and settings – R and Python

Regression spline

Ridge regression for linear models

Summary

Classification and Regression Trees

Packages and settings – R and Python

Splitting the data

Summary

CART and Beyond

Packages and settings – R and Python

Understanding bagging

Summary

Index


Model selection

The method of removing covariates in The multicollinearity problem section depended solely on the covariates themselves. More often, however, the covariates in the final model are selected with respect to the output. Computational cost is almost a non-issue these days, especially for not-so-large datasets. The question that then arises is whether one can retain all possible covariates in the model, or whether we should choose a subset of covariates that meets a certain regression metric, say R² > 60 percent.

The problem is that having more covariates increases the variance of the model, while having fewer of them leads to a large bias. The philosophical Occam's razor principle applies here too: the best model is the simplest one. In our context, the smallest model that fits the data is the best. There are two types of model selection: stepwise procedures and criterion-based procedures. In this section, we will consider both procedures.
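To make the trade-off concrete, here is a minimal sketch of one possible criterion-based procedure: fit every subset of covariates by least squares and keep the subset with the smallest AIC. This is an illustration under stated assumptions, not the book's own code. The data are simulated, the helper names (ols_rss, aic, best_subset) are hypothetical, and AIC is only one of several criteria that could be used. It uses only the Python standard library so that it runs as is.

```python
import itertools
import math
import random


def ols_rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X (intercept added)."""
    n = len(y)
    A = [[1.0] + list(row) for row in X]  # design matrix with an intercept column
    p = len(A[0])
    # Augmented normal equations: (A'A | A'y)
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(p)]
         + [sum(A[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    beta = [M[r][p] / M[r][r] for r in range(p)]
    return sum((y[i] - sum(b * a for b, a in zip(beta, A[i]))) ** 2 for i in range(n))


def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k counts fitted coefficients."""
    return n * math.log(rss / n) + 2 * k


def best_subset(X, y):
    """Fit all 2^p subsets of covariates and return (best, all_results).

    Each result is a tuple (AIC, subset_of_column_indices, RSS).
    """
    n, p = len(y), len(X[0])
    results = []
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            Xs = [[row[j] for j in subset] for row in X]
            rss = ols_rss(Xs, y)
            results.append((aic(rss, n, size + 1), subset, rss))
    return min(results), results


# Simulated data (an assumption for illustration): only columns 0 and 1 drive y,
# columns 2 and 3 are pure noise covariates.
random.seed(1)
n = 200
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(n)]
y = [2 + 3 * r[0] - 2 * r[1] + random.gauss(0, 1) for r in X]

best, results = best_subset(X, y)
print("best subset by AIC:", best[1])
for score, subset, rss in sorted(results)[:5]:
    print(subset, "AIC = %.2f" % score, "RSS = %.2f" % rss)
```

The full model always has the smallest residual sum of squares, so a raw fit measure alone would never drop a covariate. The penalty term 2k in AIC is what lets a smaller model win. An exhaustive search over all subsets is feasible only for a modest number of covariates, since the number of candidate models doubles with each covariate added.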


Stepwise procedures...