Hands-On Data Science with R

By: Vitor Bianchi Lanzetta, Doug Ortiz, Nataraj Dasgupta, Ricardo Anjoleto Farias

Overview of this book

R is one of the most widely used programming languages for statistical computing and data analysis, and it is well suited to the messy, unstructured datasets found in the real world. This book covers the entire data science ecosystem for aspiring data scientists, taking you from zero to a level where you are confident enough to get hands-on with real-world data science problems. It starts with an introduction to data science and to the popular R libraries used for routine data science tasks. It then covers the important processes in data science: gathering data, cleaning it, and uncovering patterns in it. You will explore machine learning algorithms, predictive analytical models, and finally deep learning algorithms. You will learn to use the most powerful visualization packages available in R so that you can easily derive insights from your data. Towards the end, you will also learn how to integrate R with Spark and Hadoop and perform large-scale data analytics without much complexity.
Table of Contents (16 chapters)

Spark DataFrames within the RStudio IDE

Another simple way to start your Spark connections and browse your datasets is through the RStudio IDE. After you've installed the sparklyr package, Spark will appear in the top-right part of your RStudio window, close to your R environment. If you aren't connected to Spark, it'll look like the following screenshot. If you are connected, call spark_disconnect_all() before continuing, so we'll be on the same page:

Figure 12.1: Spark shown in RStudio IDE
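
To reach the disconnected state shown above from the console, a minimal sketch along these lines should work, assuming the sparklyr package is already installed:

```r
library(sparklyr)

# Close every Spark connection that is currently open in this R session,
# so the IDE shows the "not connected" state described above.
spark_disconnect_all()
```

If no connection is open, the call simply has nothing to close.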

Click on the left arrow to see all connections, then click on the new connection button to establish a connection. A window will pop up where you can connect to and manage Spark using only clicks:

Figure 12.2: Spark connection guide from RStudio IDE
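
For comparison, the same connection can be opened from the console. The following is only a sketch: it assumes a local Spark installation is available (for example, one set up with spark_install()), and the object name sc is chosen here for illustration:

```r
library(sparklyr)

# Assumption: Spark is installed locally. If it is not, uncomment the
# next line to download and install a local copy first.
# spark_install()

# Open a connection to a local Spark instance; "sc" is an illustrative name.
sc <- spark_connect(master = "local")
```

A connection opened this way should show up in the IDE just like one created through the dialog.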

Once you upload a DataFrame into your Spark connection, you can browse it by selecting the file shown here, just as you are used...
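
As a rough illustration of uploading a DataFrame, the sketch below copies R's built-in mtcars dataset into a local Spark connection. The table name mtcars_spark and the object names are hypothetical, and a local Spark installation is assumed:

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available.
sc <- spark_connect(master = "local")

# Upload the built-in mtcars data frame to Spark under a hypothetical name.
mtcars_tbl <- copy_to(sc, mtcars, name = "mtcars_spark", overwrite = TRUE)

# List the tables registered in this connection, then peek at the first rows.
src_tbls(sc)
head(mtcars_tbl)
```

Once the copy finishes, the uploaded table should be listed under the connection in the IDE, where it can be opened for browsing.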