By: Ralph Winters

Overview of this book

This is the go-to book for anyone interested in the steps needed to develop predictive analytics solutions, with examples from the world of marketing, healthcare, and retail. We'll get started with a brief history of predictive analytics and learn about the different roles and functions people play within a predictive analytics project. Then, we will learn about various ways of installing R, along with their pros and cons, followed by a step-by-step installation of RStudio and a description of best practices for organizing your projects. With the installation complete, we will begin to acquire the skills necessary to input, clean, and prepare your data for modeling. We will learn the six specific steps needed to implement and successfully deploy a predictive model, starting with asking the right questions, moving through model development, and ending with deploying your predictive model into production. We will learn why collaboration is important and how agile, iterative modeling cycles can increase your chances of developing and deploying the most successful model. We will then continue your journey in the cloud, extending your skill set with Databricks and SparkR, which allow you to develop predictive models on many gigabytes of data.
Table of Contents (19 chapters)
Title Page
About the Author
About the Reviewers
Customer Feedback

Step 3: data preparation

As mentioned in Chapter 1, Getting Started with Predictive Analysis, one purpose of data preparation is to produce an input modeling file that can be fed directly into an algorithm. In theory, the input file will encompass all of the knowledge gained in steps 1 and 2. Ideally, this file will contain a target variable, all meaningful predictor variables, any identification variables that aid the modeling process, and any additional variables derived from the raw data sources. Like the previous steps, data preparation is an iterative process. Here are some typical steps you might follow when preparing the data:

  • Identify the data sources: These are the critical data inputs that you will need to read in and manipulate. They can come in various formats, such as CSV files, database tables, or XML or JSON files, and they can be structured or unstructured.
  • Identify the expected input: Read in some test...
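The book works in R. As a language-neutral illustration of the end product described above, here is a minimal Python sketch using only the standard library. It reads two raw sources in different formats (CSV and JSON), derives a new variable, and writes a single modeling file with one row per customer containing an identification variable, predictors, and a target. The data, column names (`customer_id`, `churned`, `total_spend`), and function names are all hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical raw source 1: customer attributes (structured CSV), including the target.
CUSTOMERS_CSV = """customer_id,age,region,churned
101,34,north,0
102,51,south,1
103,27,north,0
"""

# Hypothetical raw source 2: purchase records (JSON).
PURCHASES_JSON = """[
  {"customer_id": 101, "amount": 20.0},
  {"customer_id": 101, "amount": 35.5},
  {"customer_id": 102, "amount": 12.0}
]"""


def build_modeling_file(customers_csv, purchases_json):
    """Merge the sources into one row per customer: ID, predictors, derived variable, target."""
    # Derived variable: total spend per customer, aggregated from the raw purchase records.
    spend = {}
    for rec in json.loads(purchases_json):
        cid = rec["customer_id"]
        spend[cid] = spend.get(cid, 0.0) + rec["amount"]

    rows = []
    for rec in csv.DictReader(io.StringIO(customers_csv)):
        cid = int(rec["customer_id"])
        rows.append({
            "customer_id": cid,                  # identification variable
            "age": int(rec["age"]),              # raw predictor
            "region": rec["region"],             # raw predictor
            "total_spend": spend.get(cid, 0.0),  # derived predictor (0.0 if no purchases)
            "churned": int(rec["churned"]),      # target variable
        })
    return rows


def write_csv(rows, out):
    """Write the modeling rows to a file-like object as CSV with a header row."""
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    modeling_rows = build_modeling_file(CUSTOMERS_CSV, PURCHASES_JSON)
    buf = io.StringIO()
    write_csv(modeling_rows, buf)
    print(buf.getvalue())
```

In practice, a sketch like this is rerun and extended as new predictors are discovered. This matches the iterative nature of data preparation noted above.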