Book Image

Hands-On Data Science with R

By : Vitor Bianchi Lanzetta, Doug Ortiz, Nataraj Dasgupta, Ricardo Anjoleto Farias
Book Image

Hands-On Data Science with R

By: Vitor Bianchi Lanzetta, Doug Ortiz, Nataraj Dasgupta, Ricardo Anjoleto Farias

Overview of this book

R is the most widely used programming language, and when used in association with data science, this powerful combination will solve the complexities involved with unstructured datasets in the real world. This book covers the entire data science ecosystem for aspiring data scientists, right from zero to a level where you are confident enough to get hands-on with real-world data science problems. The book starts with an introduction to data science and introduces readers to popular R libraries for executing data science routine tasks. This book covers all the important processes in data science such as data gathering, cleaning data, and then uncovering patterns from it. You will explore algorithms such as machine learning algorithms, predictive analytical models, and finally deep learning algorithms. You will learn to run the most powerful visualization packages available in R so as to ensure that you can easily derive insights from your data. Towards the end, you will also learn how to integrate R with Spark and Hadoop and perform large-scale data analytics without much complexity.
Table of Contents (16 chapters)

Getting Started with Data Science and R

“It is a capital mistake to theorise before one has data.”
― Sir Arthur Conan Doyle, The Adventures of Sherlock Holmes

Data, like science, has been ubiquitous the world over since early history. The term data science is not generally taken to literally mean science with data, since without data there would be of science. Rather, it is a specialized field in which data scientists and other practitioners apply advanced computing techniques, usually along with algorithms or predictive analytics to uncover insights that may be challenging to obtain with traditional methods.

Data science as a distinct subject was proposed since the early 1960s by pioneers and thought leaders such as Peter Naur, Prof. Jeff Wu, and William Cleveland. Today, we have largely realized the vision that Prof. Wu and others had in mind when the concept first arose; data science as an amalgamation of computing, data mining, and predictive analytics, all leading up to deriving key insights that drive business and growth across the world today.

The driving force behind this has been the rapid but proportional growth of computing capabilities and algorithms. Computing languages have also played a key role in supporting the emergence of data science, primary among them being the statistical language R.

In this introductory chapter, we will cover the following topics:

  • Introduction to data science and R
  • Active domains of data science
  • Solving problems with data science
  • Using R for data science
  • Setting up R and RStudio
  • Our first R program