
Data Wrangling with R

By: Gustavo R Santos

Overview of this book

In this information era, when large volumes of data are generated every day, companies want to get a better grip on that data to operate more efficiently. This is where skilled data analysts and data scientists come into play, wrangling and exploring data to generate valuable business insights. To do that, you’ll need tools that let you extract the most useful knowledge from data. Data Wrangling with R will help you gain a deep understanding of how to wrangle and prepare datasets for exploration, analysis, and modeling. This book helps you get your data ready for better-optimized analyses, develop your first data model, and create effective data visualizations. The book begins by teaching you how to load and explore datasets. Then, you’ll get to grips with the modern concepts and tools of data wrangling. As data wrangling and visualization are intrinsically connected, you’ll go over best practices for plotting data and extracting insights from it. The chapters are designed to help you learn about modeling as you work through a data science project from end to end and become familiar with RStudio, including building an application with Shiny dashboards. By the end of this book, you’ll have learned how to create your first data model and build an application with Shiny in R.
Table of Contents
Part 1: Load and Explore Data
Part 2: Data Wrangling
Part 3: Data Visualization
Part 4: Modeling

What this book covers

Chapter 1, Fundamentals of Data Wrangling, introduces this book’s main theme, explaining what data wrangling is and why and when to use it. It also shows the main steps of a data science project and covers three well-known frameworks for data science projects.

Chapter 2, Load and Explore Datasets, presents different ways to load datasets into RStudio. Every project begins with data, so it is important to know how to load it into your session. The chapter then starts exploring that data to introduce you to exploratory data analysis.

Chapter 3, Basic Data Visualization, is the first touch point with data visualization, an important component of any data science project. In this chapter, we will learn the first steps toward creating compelling and meaningful graphics using only R’s built-in graphics library.

Chapter 4, Working with Strings, starts our journey of learning about the wrangling functions for each major variable type. In this chapter, we study many possible transformations with text, from detecting words in a phrase or in a dataset to some highly customized functions that involve regular expressions and text mining concepts.

Chapter 5, Working with Numbers, covers transformations of numerical variables. The chapter covers operations with vectors, matrices, and data frames, as well as the apply family of functions and how to interpret the descriptive statistics of a dataset.

Chapter 6, Working with Date and Time Objects, explores date and time objects. It ranges from the basics of creating a date and time object to a practical project that shows how these objects can be used in an analysis.

Chapter 7, Transformations with Base R, is the core of the book, exploring the most important transformations to perform on a dataset. This chapter covers tasks such as slicing, grouping, replacing, arranging, binding data, and more. The most commonly used transformations are covered here, mostly with built-in functions that require no extra libraries.

Chapter 8, Transformations with tidyverse Libraries, follows the same idea as Chapter 7, but this time the transformations are performed with tidyverse, a widely used collection of R packages for data science.

Chapter 9, Exploratory Data Analysis, is all about practice. After going over many transformation functions for different types of variables, it’s time to put the acquired knowledge into practice and work on a complete exploratory data analysis project.

Chapter 10, Introduction to ggplot2, introduces ggplot2, the most widely used data visualization library in the R language, given its flexibility and robustness. In this chapter, we will learn more about the grammar of graphics and how ggplot2 is built on this concept. We will also cover many kinds of plots and how to create them.

Chapter 11, Enhanced Visualizations with ggplot2, covers more advanced types of graphics that can be created with ggplot2, such as facet grids, maps, and 3D graphics.

Chapter 12, Other Data Visualization Options, presents further options for visualizing data, such as creating a basic plot in Microsoft Power BI using the R language. We will also cover how to create word clouds and when that kind of visualization can be useful.

Chapter 13, Build a Model with R, walks through an end-to-end data science project. We will take a dataset and explore it. Next, we will clean the data and create visualizations that explain the steps taken. Those steps will lead us to the best model to build.

Chapter 14, Build an Application with Shiny in R, is the final chapter, where we will take the model created in Chapter 13 and put it into production using a web application built with Shiny for R.