Learning Predictive Analytics with R

By: Eric Mayor

Overview of this book

This book is packed with easy-to-follow guidelines that explain the workings of many of R's key data mining tools, which are used to discover knowledge from your data. You will learn how to perform key predictive analytics tasks using R, such as training and testing predictive models for classification and regression tasks, scoring new data sets, and so on. All chapters guide you in acquiring these skills in a practical way. Most chapters also include a theoretical introduction that will sharpen your understanding of the subject matter and invite you to go further. The book familiarizes you with the most common data mining tools of R, such as k-means, hierarchical regression, linear regression, association rules, principal component analysis, multilevel modeling, k-NN, Naïve Bayes, decision trees, and text mining. It also describes visualization techniques using the basic visualization tools of R, as well as lattice for visualizing patterns in data organized in groups. This book is invaluable for anyone fascinated by the data mining opportunities offered by GNU R and its packages.

Chapter 1. Setting GNU R for Predictive Analytics

R is a relatively recent multi-purpose statistical language that originates from the older language S. R contains a core set of packages that includes some of the most common statistical tests and some data mining algorithms. One of the most important strengths of R is the degree to which its functionality can be extended by installing packages made by users from the community. These packages can be installed directly from R, which makes the process very convenient. The Comprehensive R Archive Network (CRAN), which is available at http://cran.r-project.org, is a repository of packages, R sources, and R binaries (installers). It also contains the manuals for the packages. There are currently more than 4,500 packages available for R, and more are added regularly. Better still, everything is free.
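
To give a first impression of what installing a package directly from R can look like, here is a minimal sketch. The helper name load_or_install and the repository URL passed as repos are assumptions made for this illustration and are not taken from the book; install.packages(), requireNamespace(), and library() are standard R functions.

```r
# Hypothetical helper (not from the book): attach a package,
# installing it from a CRAN repository first if it is not yet available.
load_or_install <- function(pkg, repos = "https://cran.r-project.org") {
  # requireNamespace() returns FALSE when the package cannot be found
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Download and install the package (requires an Internet connection)
    install.packages(pkg, repos = repos)
  }
  # character.only = TRUE lets us pass the package name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with every R installation, so nothing is downloaded here
load_or_install("stats")
```

Calling the helper with the name of a package that is not yet installed, such as lattice on a system where it is missing, would trigger a download from the repository given in repos.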

The topics covered in this chapter are:

  • Installation of R

  • R graphic user interface, including a description of the different menus

  • Definition of packages and how to install and load them

  • Parts of the syntax of R, which we will discover along the way

Among almost 50 competitors, R is, together with RapidMiner, the most widely used tool for predictive modeling according to the yearly software polls from KDnuggets (most recently available at http://www.kdnuggets.com/2015/05/poll-r-rapidminer-python-big-data-spark.html). Its broad use and the extent to which it can be extended make it an essential software package for data scientists. Competitors notably include Python, Weka, and KNIME.

This book is intended for people who are familiar with R. This doesn't mean that people without such a background cannot learn predictive analytics by using this book; it just means that they will need more time to use it effectively, and might need to consult the basic R documentation along the way. With this extended readership in mind, we will cover just a few of the basics in this chapter while we set up R for predictive analytics. The writing style will be as accessible as possible. If you have trouble following this first chapter, we suggest that you read a book on R basics before moving on to the following chapters, because the effort needed to understand and practice the content of this book keeps increasing from Chapter 2, Visualizing and Manipulating Data Using R, onward. Unlike the other chapters, this chapter covers basic information. Users who are more familiar with R are invited to skip to Chapter 2, Visualizing and Manipulating Data Using R, or Chapter 3, Data Visualization with Lattice.