Statistical Application Development with R and Python

Statistical Application Development with R and Python - Second Edition

Overview of this book

Statistical analysis involves collecting and examining data to describe its nature. It helps you explore relationships in the data and build models to make better decisions. This book explores statistical concepts alongside R and Python, which are integrated from the outset. Almost every concept is accompanied by R code that illustrates the strength of R and its applications, and the R programs are complemented by equivalent Python programs. You will first understand data characteristics, descriptive statistics, and the exploratory attitude, which give you a firm footing in data analysis. Statistical inference completes the technical footing of statistical methods. Linear regression, logistic regression, and CART build the essential toolkit that will help you solve complex real-world problems. You will begin with a brief understanding of the nature of data and end with modern, advanced statistical models such as CART. Every step is taken with data and R code, and further enhanced by Python. The data analysis journey begins with exploratory analysis, which is more than simple descriptive data summaries. You will then apply linear regression modeling, and end with logistic regression, CART, and spatial statistics. By the end of this book, you will be able to apply your statistical learning in major domains at work or in your projects.

Credits

About the Author

Acknowledgment

About the Reviewers

www.PacktPub.com

Customer Feedback

Preface

Data Characteristics

Questionnaire and its components

Experiments with uncertainty in computer science

Installing and setting up R

Using R packages

Python installation and setup

IDEs for R and Python

The companion code bundle

Discrete distributions

Continuous distributions

Summary

Import/Export Data

Packages and settings – R and Python

Understanding data.frame and other formats

Using utils and the foreign packages

Exporting data/graphs

Pop quiz

Summary

Data Visualization

Packages and settings – R and Python

Visualization techniques for categorical data

Visualization techniques for continuous variable data

Pareto chart

A brief peek at ggplot2

Summary

Exploratory Analysis

Packages and settings – R and Python

Essential summary statistics

Techniques for exploratory analysis

Summary

Statistical Inference

Packages and settings – R and Python

Maximum likelihood estimator

Confidence intervals

Hypothesis testing

Summary

Linear Regression Analysis

Packages and settings – R and Python

The essence of regression

The simple linear regression model

Multiple linear regression model

Regression diagnostics

Model selection

Summary

Logistic Regression Model

Packages and settings – R and Python

Model validation and diagnostics

Logistic regression for the German credit screening dataset

Summary

Regression Models with Regularization

Packages and settings – R and Python

Regression spline

Ridge regression for linear models

Summary

Classification and Regression Trees

Packages and settings – R and Python

Splitting the data

Summary

CART and Beyond

Packages and settings – R and Python

Understanding bagging

Summary

Index

The companion code bundle

After downloading the code bundle, RPySADBE.zip, from the publisher’s website, the first task is to unzip it on a local machine. We encourage the reader to download the code bundle, since the R and Python code in the ebook might be in image format and keying in long programs all over again is a futile exercise.

The unzipped bundle consists of two folders: R and Python. Each of these folders further consists of 10 sub-folders, one for each chapter. R has its own companion package, RSADBE, available on CRAN. Thus, the R chapter folders do not have a Data sub-folder, with the exception of Chapter 2, Import/Export Data. The chapter-level folders for R contain two sub-folders: Output and SRC. The SRC folder contains a file named Chapter_Number.R, which consists of all the code used in the chapter. The Output folder contains a Microsoft Word document named Chapter_Number.doc, which is the result of running the R file Chapter_Number.R through Markdown. Setting up Markdown is left as an exercise for the reader; instructions can be found on the web. The graphics in the Markdown files will be different from the ones observed in the book.

Python’s chapter sub-folders are of three types: Data, Output, and SRC. The required comma-separated values (CSV) data files are available in the Data folder, while the SRC folder contains the Python code file, Chapter_Number.py. The output of running the Python file in the IDE is saved as a Chapter_Number_Title.ipynb file. In many cases, the graphics generated by R and Python for the same purpose yield the same display.
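The layout described above can be checked programmatically after unzipping. The following is a minimal sketch, assuming only the sub-folder names given in the text (Output and SRC for R; Data, Output, and SRC for Python). The function name missing_subfolders is hypothetical, and the chapter folder must be supplied by the reader, since the text does not state how the chapter folders are named.

```python
from pathlib import Path

# Sub-folders the text describes for each language's chapter folders.
# (The R folder for Chapter 2 additionally has a Data sub-folder.)
EXPECTED = {
    "R": ["Output", "SRC"],
    "Python": ["Data", "Output", "SRC"],
}


def missing_subfolders(chapter_dir, language):
    """Return the expected sub-folders that are absent from chapter_dir."""
    chapter_dir = Path(chapter_dir)
    return [name for name in EXPECTED[language]
            if not (chapter_dir / name).is_dir()]
```

An empty list means the chapter folder has every sub-folder the text mentions for that language.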

In the book, the R program is run first, followed by its explanation and interpretation, and the corresponding Python program is given afterwards. Where the Python program differs, its graphical output is not necessarily reproduced in the book. In such cases, the ipynb files come in handy as they contain all the graphics. Markdown is available for Python too, but we do not pursue it here.

Here’s a final word about executing the R and Python files. The author cannot know the path of the unzipped folder on the reader’s machine. Thus, the reader needs to specify the path appropriately in the R and Python files. Most likely, the reader will have to replace MyPath with /home/user/RPySADBE or C:/User/Documents/RPySADBE.
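One way to handle the path adjustment in the Python files is to define the bundle location once and build every file path from it. This is a minimal sketch, not the bundle's actual code: the variable MyPath follows the placeholder named above, while the helper bundle_file, the chapter folder name, and the data file name are hypothetical and must be replaced with the names found in the unzipped bundle.

```python
from pathlib import Path

# Assumption: the bundle was unzipped here. Replace with your own location,
# for example C:/User/Documents/RPySADBE on Windows.
MyPath = Path("/home/user/RPySADBE")


def bundle_file(language, chapter_folder, subfolder, filename, root=MyPath):
    """Build the path to a file inside the unzipped bundle."""
    return Path(root) / language / chapter_folder / subfolder / filename


# Hypothetical chapter folder and file names, for illustration only.
data_file = bundle_file("Python", "Chapter_02", "Data", "example.csv")
print(data_file.as_posix())
```

Keeping the root in a single variable means that only one line needs to change when the bundle is moved to another folder or machine.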

We will now begin formal discussion of the essential probability distributions.
