Hands-On Data Science with R

By: Vitor Bianchi Lanzetta, Doug Ortiz, Nataraj Dasgupta, Ricardo Anjoleto Farias

Overview of this book

R is one of the most widely used languages for statistical computing, and combined with data science techniques it helps you tackle the complexity of real-world, unstructured datasets. This book covers the entire data science ecosystem for aspiring data scientists, taking you from zero to a level where you are confident enough to work hands-on on real-world data science problems. The book starts with an introduction to data science and to popular R libraries for routine data science tasks. It covers the key processes in data science: gathering data, cleaning it, and uncovering patterns in it. You will explore machine learning algorithms, predictive analytics models, and finally deep learning algorithms. You will learn to use the most powerful visualization packages available in R so that you can easily derive insights from your data. Toward the end, you will also learn how to integrate R with Spark and Hadoop to perform large-scale data analytics without much complexity.
Table of Contents (16 chapters)

Providing interfaces to Spark packages

API is an acronym for application programming interface. You can think of it as a user interface, but one designed for software instead of humans. Put another way, an API is a toolbox for your programs: when you need to drive a nail in, you call a hammer API, and when you need to pull a nail out, you call a pliers API. Spark has its own API toolbox for R, which you can access here: https://spark.apache.org/docs/2.2.0/api/R/index.html.
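
To make the toolbox idea concrete, here is a minimal sketch that picks a few tools from the SparkR API linked above. It assumes a local Spark 2.2 installation with `SPARK_HOME` set and the bundled SparkR package on your library path; the wrapper name `sparkr_demo()` is hypothetical and only keeps the session setup and teardown in one place. Each `SparkR::` call is one "tool": start a session, ship a local data frame to Spark, inspect it, filter it, and bring a few rows back into R.

```r
# Minimal sketch: calling a few functions from Spark's R API (SparkR).
# Assumes SPARK_HOME points to a Spark 2.2 install that ships SparkR.
# sparkr_demo() is a hypothetical helper name, not part of SparkR.
sparkr_demo <- function() {
  # Start (or reuse) a local Spark session: the entry point to the API
  SparkR::sparkR.session(master = "local[*]", appName = "sparkr-api-demo")
  # Stop the session when the function exits, even on error
  on.exit(SparkR::sparkR.session.stop())

  # Copy R's built-in 'faithful' data.frame into a SparkDataFrame
  sdf <- SparkR::as.DataFrame(faithful)

  # Print the column types Spark inferred
  SparkR::printSchema(sdf)

  # Lazy, distributed filter written as a SQL condition string
  long_eruptions <- SparkR::filter(sdf, "eruptions > 3")

  # Collect the first rows back into an ordinary R data.frame
  SparkR::head(long_eruptions)
}
```

Calling `sparkr_demo()` prints the schema and returns the first rows of eruptions longer than three minutes as a regular R data frame.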

In these terms, extensions are custom R packages created to provide an interface to any Spark package, or to the Spark toolbox itself. Many extensions are already available, but you can also create your own extension to call any of these Spark APIs. One example of an extension is the rsparkling package covered in the previous section; it is the rsparkling package that provides an...
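
As a rough illustration of what such an extension contains, the sketch below follows the pattern described in sparklyr's extension documentation: an R function that reaches into Spark's Scala/Java API through `sparklyr::invoke()`, a `spark_dependencies()` function that tells sparklyr which extra JARs or Spark packages to load, and an `.onLoad()` hook that registers the package as an extension. It assumes sparklyr is installed; `count_lines()` is a hypothetical example function, not part of sparklyr or rsparkling, and the last two functions only take effect when placed inside an R package.

```r
# Minimal sketch of a sparklyr extension (assumes sparklyr is installed).
# count_lines() is a hypothetical example, not a sparklyr/rsparkling function.

# Extension function: call Spark's JVM API directly through invoke()
count_lines <- function(sc, path) {
  ctx <- sparklyr::spark_context(sc)                  # the JVM SparkContext
  rdd <- sparklyr::invoke(ctx, "textFile", path, 1L)  # SparkContext.textFile(path, minPartitions = 1)
  sparklyr::invoke(rdd, "count")                      # RDD.count(): number of lines
}

# Inside a package, sparklyr calls this at connection time to learn which
# extra JARs or Spark packages the extension needs; core Spark needs none.
spark_dependencies <- function(spark_version, scala_version, ...) {
  sparklyr::spark_dependency(jars = NULL, packages = NULL)
}

# Inside a package, register it as a sparklyr extension when it is loaded
.onLoad <- function(libname, pkgname) {
  sparklyr::register_extension(pkgname)
}
```

With a connection such as `sc <- sparklyr::spark_connect(master = "local")`, calling `count_lines(sc, "data.txt")` (a placeholder path) asks Spark to read the file and returns its line count to R. An extension wrapping a third-party Spark package would list that package's JARs or Maven coordinates in `spark_dependencies()` instead of `NULL`.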