Book Image

R Data Mining

Book Image

R Data Mining

Overview of this book

R is widely used to leverage data mining techniques across many different industries, including finance, medicine, scientific research, and more. This book will empower you to produce and present impressive analyses from data, by selecting and implementing the appropriate data mining techniques in R. It will let you gain these powerful skills while immersing in a one of a kind data mining crime case, where you will be requested to help resolving a real fraud case affecting a commercial company, by the mean of both basic and advanced data mining techniques. While moving along the plot of the story you will effectively learn and practice on real data the various R packages commonly employed for this kind of tasks. You will also get the chance of apply some of the most popular and effective data mining models and algos, from the basic multiple linear regression to the most advanced Support Vector Machines. Unlike other data mining learning instruments, this book will effectively expose you the theory behind these models, their relevant assumptions and when they can be applied to the data you are facing. By the end of the book you will hold a new and powerful toolbox of instruments, exactly knowing when and how to employ each of them to solve your data mining problems and get the most out of your data. Finally, to let you maximize the exposure to the concepts described and the learning process, the book comes packed with a reproducible bundle of commented R scripts and a practical set of data mining models cheat sheets.
Table of Contents (22 chapters)
Title Page
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface
14
Epilogue

Possible alternatives to write and run R code


We have already discussed two ways of executing R code:

  • Employing your OS terminal
  • Employing the development environment that comes with the R base installation

The first of the aforementioned ways can be quite a convenient way for experienced R users. It clearly shows its advantages when executing articulated analytical activities, such as ones requiring:

  • The sequential execution of scripts from different languages
  • The execution of filesystem manipulation

Regarding the second alternative, we have already talked about its shortfalls compared to its direct competitor. Therefore, now is the time to have a closer look at this competitor, and this is what we are going to do in the following paragraphs before actually starting to write some more R code.

Two disclaimers are needed:

  • We are not considering text editor applications here, that is, software without an R console included and additional code execution utilities included. Rather, we prefer an integrated development environment, since they are able to provide a more user-friendly and comprehensive experience for a new language adopter. 
  • We are not looking for completeness here, just for the tools most often cited within R community discussions and events. Perhaps something better than these platforms is available, but it has not yet gained comparable momentum.

The alternative platforms we are going to introduce here are:

  • RStudio
  • Jupyter Notebook
  • Visual Studio

RStudio (all OSs)

RStudio is a really well-known IDE within the R community. It is freely available at https://www.rstudio.com. The main reason for its popularity is probably the R-dedicated nature of the platform, which differentiates it from the other two alternatives that we will discuss further, and its perfect integration with some of the most beloved packages of the R community.

RStudio comes packed with all the base features we talked about when discovering the R base installation development environment, enriched with a ton of useful additional components introduced to facilitate coding activity and maximize the effectiveness of the development process. Among those, we should point out:

  • A filesystem browser to explore and interact with the content of the directory you are working with
  • A file import wizard to facilitate the import of datasets
  • A plot pane to visualize and interact with the data visualization produced by code execution
  • An environment explorer to visualize and interact with values and the data produced by code execution
  • A spreadsheet-like data viewer to visualize the datasets produced by code execution

All of this is enhanced by features such as code autocompletion, inline help for functions, and splittable windows for multi-monitor users, as seen in the following screenshot:

A final word has to be said about integration with the most beloved R additional packages. RStudio comes with additional controls or predefined shortcuts to fully integrate, for instance:

  • markdown package for markdown integration with R code (more on this in Chapter 13, Sharing your stories with your stakeholders through R markdown)
  • dplyr for data manipulation (more on this in Chapter 2, A First Primer on Data Mining - Analysing Your Banking Account Data)
  • shiny package for web application development with R (more on this in Chapter 13, Sharing your stories with your stakeholders through R markdown)

The Jupyter Notebook (all OSs)

The Jupyter Notebook was primarily born as a Python extension to enable interactive data analysis and a fully reproducible workflow. The idea behind the Jupyter Notebook is to have both the code and the output of the code (plots and tables) within the same document. This allows both the developer and other subsequent readers, for instance a customer, to follow the logical flow of the analysis and gradually arrive at the results.

Compared to RStudio, Jupyter does not have a filesystem browser, nor an environment browser. Nevertheless, it is a very good alternative, especially when working on analyses which need to be shared.

Since it comes originally as a Python extension, it is actually developed with the Python language. This means that you will need to install Python as well as R to execute this application. Instructions on how to install Jupyter can be found in the Jupyter documentation at http://jupyter.readthedocs.io/en/latest/install.html.

After installing Jupyter, you will need to add a specific component, namely a kernel, to execute R code on the notebook. Instructions on how to install the kernel can be found on the component's home page at https://irkernel.github.io.

Visual Studio (Windows users only)

Visual Studio is a popular development tool, primarily for Visual Basic and C++ language development. Due to the recent interest showed by Microsoft in the R language, this IDE has been expanded through the introduction of the R Tools extension.

This extension adds all of the commonly expected features of an R IDE to the well-established platform such as Visual Studio. The main limitation at the moment is the availability of the product, as it is only available on a computer running on the Windows OS.

Also, Visual Studio is available for free, at least the Visual Studio Community Edition. Further details and installation guides are available at https://www.visualstudio.com/vs/rtvs.