
Regression Analysis with R

By: Giuseppe Ciaburro

Overview of this book

Regression analysis is a statistical process that enables the prediction of relationships between variables. The predictions are based on the causal effect of one variable upon another. Regression techniques for modeling and analysis are applied to large sets of data in order to reveal hidden relationships among the variables. This book explains what regression analysis is and walks you through the process from scratch. The first few chapters introduce the different types of learning, supervised and unsupervised, and how they differ from each other. We then cover supervised learning in detail, looking at the various aspects of regression analysis. The chapters are arranged to give a feel for all the steps in a data science process: loading the training dataset, handling missing values, performing exploratory data analysis (EDA) on the dataset, transformations and feature engineering, model building, assessing the model fit and performance, and finally making predictions on unseen datasets. Each chapter starts by explaining the theoretical concepts and, once the reader is comfortable with the theory, moves on to practical examples that support the understanding. The practical examples are illustrated using R code and draw on different R packages, such as stats and caret. By the end of this book, you will know the concepts and pain points related to regression analysis, and you will be able to apply what you have learned in your own projects.

Installing R


After that detailed description of the R programming environment, it is time to install it on our machine. To do this, we first have to get the installation package.

Note

The packages we will need to install are available from the Comprehensive R Archive Network (CRAN), which can be reached from the official website of the language at the following URL: https://www.r-project.org/.

CRAN is a network of File Transfer Protocol (FTP) and web servers located around the world that store identical versions of the source code and documentation for R. CRAN is directly accessible from R's site, and on this site you can also find information about R, some technical manuals, the R Journal, and details about the contributed packages that are stored in CRAN repositories.

Of course, before you download the software, you need to know the type of machine and the operating system on which it will be installed. Remember, however, that R is available for practically all operating systems in common use. In the following screenshot, the CRAN web page is shown:

At the time of writing, the current stable version of R is 3.4.1, which is why the examples in the subsequent sections refer to that version.
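If R is already present on a machine, its version can be compared with the one used in this book. The following is a minimal sketch for a Unix-like shell, assuming the R executable, when installed, is reachable on the PATH; the function name r_version_line is hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# r_version_line: print the first line of `R --version`, or a notice when
# no R executable is found on the PATH. The function name is hypothetical.
r_version_line() {
    if command -v R >/dev/null 2>&1; then
        R --version | head -n 1
    else
        echo "R not found on PATH"
    fi
}

r_version_line
```

When R is installed, the printed line begins with the words "R version" followed by the version number, which can then be checked against 3.4.1.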

The following list shows the OSs supported:

  • Windows
  • macOS
  • Unix
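To pick the right download, it helps to confirm which of these families the machine belongs to. The sketch below assumes a Unix-like shell (on Windows this means an environment such as Cygwin or MSYS, not the native command prompt) and maps the kernel name printed by uname -s to the three entries in the list above. The function name classify_os and the particular kernel names matched are illustrative assumptions, not an exhaustive mapping.

```shell
#!/bin/sh
# classify_os: map a kernel name (as printed by `uname -s`) to one of the
# three OS families listed above. The function name is hypothetical.
classify_os() {
    case "$1" in
        Darwin)                 echo "macOS" ;;
        CYGWIN*|MINGW*|MSYS*)   echo "Windows" ;;
        Linux|*BSD|SunOS|AIX)   echo "Unix" ;;
        *)                      echo "unknown" ;;
    esac
}

# Classify the machine this script is running on.
classify_os "$(uname -s)"
```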

In computer science, installation is the procedure whereby the software is copied and configured on the machine. Generally, the software is distributed as a compressed package that includes an installer, an interface that facilitates and automates the installation.

The installation creates folders on the disk to hold all the files used by the program, creates the links that make the program easier to launch, and writes the necessary configuration parameters. In the following screenshot, we can see CRAN with all the tools needed for proper software installation:

There are essentially two ways to install R:

  • Using existing distributions in the form of binaries
  • Using source code

Using precompiled binary distribution

Binary distribution is the simplest choice; it works on most machines and is the one we will use to keep the job as simple as possible. It is a compiled version of R that can be downloaded and installed directly on our system.

Installing on Windows

For the Windows operating system, this version is a single EXE file (downloadable from the CRAN site), which can be installed by double-clicking it and following the few steps of the installation. This is an automated installation procedure, a so-called installer, which reduces the installation phase to a few button clicks on the part of the user. Once the process is completed, you can start using R via the icon that appears on the desktop or through the link available in the system's list of programs.

Installing on macOS

Similarly, for macOS, R is available as a single installation file with a PKG extension, which can be downloaded and installed on our system. The following screenshot shows the directory containing binaries for a base distribution and packages to run on Mac OS X (release 10.6 and later), taken from the CRAN website:

Installing on Linux

For a Linux system, there are several versions of the installation file. In the download section, you must select the appropriate version of R according to the Linux distribution installed on your machine. Installation packages are available in two main formats: .rpm files for Fedora, SUSE, and Mandriva, and .deb files for Ubuntu, Debian, and Linux Mint.
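The choice between the two formats can be sketched as a small lookup on the distribution identifier. The example below assumes the system provides /etc/os-release with an ID field, as most current distributions do; the function name pkg_format is hypothetical, and the identifiers "suse" and "mandriva" are assumptions, since the exact ID strings vary between releases.

```shell
#!/bin/sh
# pkg_format: map a distribution ID to the package format named in the text.
# The IDs follow the ID field of /etc/os-release; "suse", "mandriva" and the
# function name itself are assumptions made for this sketch.
pkg_format() {
    case "$1" in
        ubuntu|debian|linuxmint)             echo "deb" ;;
        fedora|suse|opensuse*|sles|mandriva) echo "rpm" ;;
        *)                                   echo "unknown" ;;
    esac
}

# Read the ID of the running system, if the file is present.
if [ -r /etc/os-release ]; then
    distro_id=$(. /etc/os-release && echo "$ID")
else
    distro_id="unknown"
fi
echo "Distribution: $distro_id, package format: $(pkg_format "$distro_id")"
```

In practice, the distribution's own package manager usually handles the download and installation; for instance, on Debian and Ubuntu the base system is typically packaged as r-base, and on Fedora as R.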

Installation from source code

Installing R from source code is possible on all supported platforms, though it is not as easy as using the binary distribution we've just seen. It is especially hard on Windows, since the required build tools are not part of the system.

Note

Detailed information on the procedures for installing from source code on Windows, and on the necessary tools, is available on the CRAN website at https://cran.r-project.org/doc/manuals/r-release/R-admin.html.

On Unix-like systems, on the other hand, the process is much simpler; the installation follows the usual procedure, which uses the following commands:

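# Run from the top-level directory of the unpacked R sources.
./configure
# Compile R.
make
# Copy the built files into place (may require root privileges).
make install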

These commands, assuming that compilers and support libraries are available, lead to the proper installation of the R environment on our system.
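The assumption about compilers can be checked before running the commands above. The following is a minimal sketch, assuming a C compiler, a Fortran compiler, and make are the minimum needed; the function name missing_tools is hypothetical, and the compiler names gcc and gfortran are assumptions that vary between systems. It does not check for the support libraries.

```shell
#!/bin/sh
# missing_tools: print each named command that is not found on the PATH.
# The function name and the tool list below are assumptions for this sketch.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# A C compiler, a Fortran compiler and make are the usual minimum; the exact
# compiler names (gcc, gfortran) vary between systems.
missing=$(missing_tools gcc gfortran make)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
else
    echo "Build tools found; ./configure can be run from the R source directory"
fi
```

If any tool is reported as missing, it is usually simpler to install it through the system's package manager before attempting the build.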