Practical Predictive Analytics

By : Ralph Winters

Overview of this book

This is the go-to book for anyone interested in the steps needed to develop predictive analytics solutions, with examples from the worlds of marketing, healthcare, and retail. We'll get started with a brief history of predictive analytics and learn about the different roles and functions people play within a predictive analytics project. Then we will learn about various ways of installing R, along with their pros and cons, followed by a step-by-step installation of RStudio and a description of best practices for organizing your projects. With the installation complete, we will begin to acquire the skills necessary to input, clean, and prepare your data for modeling. We will learn the six specific steps needed to implement and successfully deploy a predictive model: from asking the right questions, through model development, to deploying your predictive model into production. We will learn why collaboration is important and how agile, iterative modeling cycles can increase your chances of developing and deploying the most successful model. We will then continue the journey in the cloud, extending your skill set with Databricks and SparkR, which allow you to develop predictive models on many gigabytes of data.

Getting started with RStudio


After the R installation has completed, point your browser to the download section of the RStudio website (https://www.rstudio.com/) and install the RStudio executable appropriate for your operating system. Then:

  • Click the RStudio icon to start the program.
  • The program initially starts with three tiled window panes, as shown in the following screenshot. If your layout does not correspond exactly to what is shown, the next section explains how to rearrange it to match the examples in this chapter.
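Once RStudio is open, you can use the Console pane to check which R installation it found. This is a minimal sketch that uses only base R. The values it prints depend on your operating system and on the R version you installed, so no particular output is shown here.

```r
# Type these lines into the RStudio Console pane to confirm that
# RStudio found your R installation. Only base R is used.

# Version string of the R interpreter RStudio is running,
# in the form "R version x.y.z (date)"
R.version.string

# Folder where that R installation lives
R.home()

# Library folders that R searches for installed packages
.libPaths()
```

If `R.version.string` reports an older R than the one you just installed, RStudio is probably pointing at a previous installation.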

Rearranging the layout to correspond with the examples

To rearrange the layout, follow these steps:

  1. Select Tools | Global Options | Pane Layout from the top navigation bar.
  2. Select the drop-down arrow in each of the four quadrants, and set each pane to what is shown in the following diagram:
    • Make sure that Environment | History | Files | Plots and Help are selected for the upper left pane.
    • Make sure that Viewer is selected for the bottom left pane.
    • Select Console for the bottom right pane.
    • Select Source for the upper right pane.
  3. Click OK.

After the changes are applied, the layout should more closely match the layout shown previously. However, it may not match exactly. Much depends on the version of RStudio you are using and on the packages you may have already installed.

Brief description of some important panes

  • The Source pane is where you will write and save your programs. Once you have written code, use File | Save to save your work to an external file, and File | Open to retrieve the saved code.

Note

If you are installing RStudio for the first time, the fourth pane may be empty. However, as you create new programs (as we will later in this chapter), it will appear in the upper right quadrant.

  • The Console pane provides important feedback and information about your program after it has been run. It shows any syntax errors or other error messages that have occurred. It is always a best practice to examine the console to make sure you are getting the results you expect and that it is clear of errors. The console is also where you will see much of the output created by your programs.
  • We will rely heavily on the View pane. This pane displays formatted output produced by the R View command.
  • The Environment | History | Plots pane is a catch-all pane whose function changes depending on which tabs have been selected in the Pane Layout dialog. For example, all plots produced by R commands are displayed under the Plots tab. Help is always a click away on the Help tab. There is also a useful Packages tab, which automatically loads a package when you check the box next to it.
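Each feature described above can also be triggered from the Console with ordinary R commands. This is a minimal sketch that uses only base R and the built-in `mtcars` data set. It relies on RStudio's default behavior for where each result is displayed. `View()` and `help()` are wrapped in `interactive()` so that the script also runs outside an interactive session.

```r
# Console equivalents of the pane features described above.
# Uses only base R and the built-in mtcars data set.

# Loading a package: checking a box in the Packages tab has roughly
# the same effect as calling library() yourself. stats is attached
# by default, so this line is a harmless example.
library(stats)

# A plot like this one is displayed under the Plots tab.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "mtcars: weight vs. mileage")

# The R View command shows a formatted view of a data frame.
if (interactive()) View(head(mtcars))

# Documentation opens under the Help tab.
if (interactive()) help("plot")
```

The commands and the point-and-click tabs are interchangeable. Typing them, however, leaves a record in the History tab that you can copy into a saved program.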

Creating a new project

Once your layout is set, create a new project by following these steps:

  1. Identify the menu bar above the icons at the top left of the screen.
  2. Click File and then New Project.
  3. On the next screen, select Existing Directory.
  4. On the screen that appears, the Project working directory field is initially populated with a tilde (~), which stands for your home directory.
  5. To specify the directory, first select Browse, and then navigate to the PracticalPredictiveAnalytics folder you created in the previous steps.
  6. When the Choose Directory dialog box appears, select this directory using the Select Folder button.
  7. After you select the directory, the following should appear (Windows only).
  8. To finish creating the project, select the Create Project button. RStudio will then switch to the new project you have just created.

All screen panes will then appear blank (except for the log), and the title bar at the top left of the screen will show the path to the project.
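You can also confirm from the Console that the project is active. This minimal sketch uses only base R. It relies on RStudio's behavior of setting the working directory to the project folder when a project opens, and of writing a `.Rproj` file named after that folder.

```r
# When a project opens, RStudio sets the working directory to the
# project folder. Print the full path:
getwd()

# The last component of the path should be the project folder name,
# e.g. "PracticalPredictiveAnalytics".
basename(getwd())

# RStudio stores the project settings in a .Rproj file in that folder.
list.files(pattern = "\\.Rproj$")
```

Because the working directory is the project folder, later programs can use relative paths such as `data/` instead of hard-coded full paths.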

To verify that the R, outputs, and data directories are contained within the project, select File and then Open File from the top menu bar. The three folders should appear, as shown in the following screenshot.

Once you have verified this, cancel the Open File dialog and return to the RStudio main screen.
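The same folder check can be run from the Console instead of the Open File dialog. This is a sketch in base R. `check_project_dirs()` is a hypothetical helper written for illustration; it is not part of R or RStudio. It assumes the project folder is the current working directory. Its optional `create_missing` argument creates any of the three folders that are absent.

```r
# Hypothetical helper (not part of base R or RStudio): reports which of
# the expected project folders exist, and optionally creates missing ones.
check_project_dirs <- function(base = getwd(),
                               dirs = c("R", "outputs", "data"),
                               create_missing = FALSE) {
  paths <- file.path(base, dirs)
  found <- dir.exists(paths)
  if (create_missing && any(!found)) {
    # dir.create() takes a single path, so loop over the missing folders
    for (p in paths[!found]) dir.create(p, recursive = TRUE)
    found <- dir.exists(paths)
  }
  # Named logical vector: TRUE for each folder that exists
  setNames(found, dirs)
}

# With the project open, all three entries should be TRUE.
check_project_dirs()
```

Returning a named logical vector rather than stopping on the first missing folder shows every missing folder in a single run.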