Statistical Application Development with R and Python - Second Edition

Overview of this book

Statistical analysis involves collecting and examining data to describe the nature of the data that needs to be analyzed. It helps you explore relationships in the data and build models to make better decisions. This book explores statistical concepts alongside R and Python, which are integrated from the very start. Almost every concept is accompanied by R code that demonstrates the strength of R and its applications, and the R programs are complemented by equivalent Python programs. You will first understand data characteristics, descriptive statistics, and the exploratory attitude, which will give you a firm footing in data analysis. Statistical inference completes the technical foundation of statistical methods. Linear regression, logistic regression, and CART make up the essential toolkit, which will help you solve complex real-world problems. You will begin with a brief understanding of the nature of data and end with modern, advanced statistical models such as CART. Every step is taken with data and R code, and further enhanced by Python. The data analysis journey begins with exploratory analysis, which is more than simple descriptive data summaries. You will then apply linear regression modeling and end with logistic regression, CART, and spatial statistics. By the end of this book, you will be able to apply your statistical learning in major domains at work or in your projects.

The companion code bundle


After downloading the code bundle, RPySADBE.zip, from the publisher’s website, the first task is to unzip it on a local machine. We encourage the reader to download the code bundle, since the R and Python code in the ebook might be in image format, and keying in long programs all over again is a futile exercise.
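For readers who prefer to unzip the bundle from Python rather than with a desktop tool, the following is a minimal sketch using only the standard library. The function name `unzip_bundle` and the download and destination locations are assumptions made for illustration; adjust them to your own machine.

```python
import zipfile
from pathlib import Path


def unzip_bundle(zip_path, dest_dir):
    """Extract the archive into dest_dir and return the top-level entry names."""
    zip_path = Path(zip_path)
    dest_dir = Path(dest_dir)
    # Create the destination folder if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as bundle:
        bundle.extractall(dest_dir)
    return sorted(entry.name for entry in dest_dir.iterdir())


if __name__ == "__main__":
    # Hypothetical locations: change them to where RPySADBE.zip was saved
    # and to where the bundle should be unzipped.
    archive = Path.home() / "Downloads" / "RPySADBE.zip"
    target = Path.home() / "RPySADBE"
    if archive.exists():
        print(unzip_bundle(archive, target))
    else:
        print(f"{archive} not found; download the code bundle first.")
```

If the archive unpacks as described in the next paragraph, the returned list should name the R and Python folders.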

Once unzipped, the bundle consists of two folders: R and Python. Each of these folders in turn contains 10 sub-folders, one for each chapter. The data sets for R are provided by the RSADBE package, which is available on CRAN, so the R chapter folders have no Data sub-folder, with the exception of Chapter 2, Import/Export Data. The chapter-level folders for R contain two sub-folders: Output and SRC. The SRC folder contains a file named Chapter_Number.R, which holds all the code used in the chapter. The Output folder contains a Microsoft Word document named Chapter_Number.doc, which is the result of running Chapter_Number.R through Markdown. Setting up Markdown is left as an exercise for the reader; instructions are easy to find on the web. The graphics in the Markdown output will differ from those shown in the book.

The Python chapter folders contain three sub-folders: Data, Output, and SRC. The required comma-separated values (CSV) data files are in the Data folder, while the SRC folder contains the Python code file, Chapter_Number.py. The output of running the Python file in the IDE is saved as a Chapter_Number_Title.ipynb file. In many cases, the graphics generated by R and Python for the same purpose yield the same display.
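To check that the unzipped bundle matches the layout described above, a short script can list which sub-folders each chapter folder contains. This is only a sketch: the function name `summarize_bundle` is an assumption, and the script makes no guess about how the chapter folders are actually named.

```python
from pathlib import Path


def summarize_bundle(root):
    """Map each language folder to its chapter folders and their sub-folders."""
    root = Path(root)
    layout = {}
    for language in ("R", "Python"):
        lang_dir = root / language
        if not lang_dir.is_dir():
            # Skip a language folder that is missing from this copy.
            continue
        layout[language] = {
            chapter.name: sorted(
                sub.name for sub in chapter.iterdir() if sub.is_dir()
            )
            for chapter in sorted(lang_dir.iterdir())
            if chapter.is_dir()
        }
    return layout
```

Calling `summarize_bundle` on the unzipped folder and printing the result should show Output and SRC under most R chapter folders, and Data, Output, and SRC under the Python ones.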

In the book, the R program is run first and is followed by its explanation and interpretation; the corresponding Python program is given afterwards. Because the Python program is different, its graphical output is not necessarily reproduced in the book. In such cases, the ipynb files come in handy, as they contain all the graphics. Markdown is available for Python too, but we do not pursue it here.

Here is a final word about executing the R and Python files. The author cannot know the path to which the reader unzips the bundle, so the reader needs to specify the path appropriately in the R and Python files. Most likely, the reader will have to replace MyPath with /home/user/RPySADBE or C:/User/Documents/RPySADBE.
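As an illustration of the path replacement on the Python side, the sketch below defines the base path once and builds a data file location from it. How MyPath actually appears in the bundled files is not shown here, so the variable usage, the helper names `data_file` and `read_csv_rows`, and the chapter folder and file names passed to them are all hypothetical.

```python
import csv
from pathlib import Path

# Assumption: the bundle was unzipped here; replace with your own location,
# for example C:/User/Documents/RPySADBE on Windows.
MyPath = Path("/home/user/RPySADBE")


def data_file(base, chapter_folder, file_name):
    """Build the path to a CSV file in a Python chapter's Data sub-folder."""
    return Path(base) / "Python" / chapter_folder / "Data" / file_name


def read_csv_rows(path):
    """Read a CSV file into a list of dictionaries keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Keeping the base path in one place means that only a single line needs to change when the bundle is moved to another folder or machine.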

We will now begin a formal discussion of the essential probability distributions.