R Statistics Cookbook

By: Francisco Juretig

Overview of this book

R is a popular programming language for developing statistical software. This book will be a useful guide to solving common and not-so-common challenges in statistics. With this book, you'll be equipped to confidently perform essential statistical procedures across your organization with the help of cutting-edge statistical tools. You'll start by implementing data modeling, data analysis, and machine learning to solve real-world problems. You'll then understand how to work with nonparametric methods, mixed effects models, and hidden Markov models. This book contains recipes that will guide you in performing univariate and multivariate hypothesis tests, applying several regression techniques, and using robust techniques to minimize the impact of outliers in data. You'll also learn how to use the caret package for performing machine learning in R. Furthermore, this book will help you understand how to interpret charts and plots to get insights for better decision making. By the end of this book, you will be able to apply your skills to statistical computations using R 3.5. You will also become well-versed with a wide array of statistical techniques in R that are extensively used in the data science industry.
Table of Contents (12 chapters)

Creating barplots using ggplot

The ggplot2 package has become the dominant R package for creating serious plots, mainly due to its beautiful aesthetics. The ggplot2 package allows the user to define plots in a sequential (or additive) way, adding one layer at a time with the + operator, and this syntax has contributed to its enormous success. As you would expect, this package can handle a wide variety of plots.

Getting ready

In order to run this example, you will need the ggplot2 and the reshape packages. Both can be installed using the install.packages() command.
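
As a minimal sketch of that installation step, the helper below installs only the packages that are not already available. The function name install_if_missing is hypothetical (it is not part of ggplot2 or reshape); it relies only on base R's requireNamespace() and install.packages().

```r
# Hypothetical helper (not part of either package): install only what is missing.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns FALSE when a package cannot be loaded
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!is_installed]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  # Return the names that had to be installed, without printing them
  invisible(missing_pkgs)
}
```

Calling install_if_missing(c("ggplot2", "reshape")) once before running the recipe should leave both packages ready to be loaded with library().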

How to do it...

In this example, we will use a dataset in a wide format (one row per record, with a separate column for each measurement), and we will do the appropriate data manipulation to transform it into a long format. Finally, we will use the ggplot2 package to make a stacked bar plot with the transformed data. In particular, we have sales data for five companies. The adjusted sales are sales with the taxes removed, and the unadjusted sales are the raw sales. Naturally, the unadjusted sales will always be greater than the adjusted ones, as shown in the following table:

Company     Adjusted sales    Unadjusted sales
Company1    298               394
Company2    392               454
Company3    453               499
Company4    541               598
Company5    674               762
  1. Import the ggplot2 and reshape libraries as follows:
library(ggplot2)
library(reshape)
  2. Then load the dataset:
datag = read.csv("./ctgs.csv")
  3. Transform the data into a long format:
transformed_data = melt(datag, id.vars = "Company")
  4. Use the ggplot() function to create the plot:
ggplot(transformed_data, aes(x = Company, y = value, fill = variable)) + geom_bar(stat = "identity")

This results in the following output:

[Figure: stacked bar plot with one bar per company, filled by variable]

How it works...

In order to build a stacked plot, we need to supply three arguments to the aes() function. The x variable is the x-axis, y is the bar height, and fill is the color. The geom_bar() function adds a layer of bars to the plot. The stat = "identity" value tells ggplot2 that we don't want to apply any transformation and that the data should be used as it is; without it, geom_bar() counts the rows in each group instead of using the y values. We use the reshape package to transform the data into the long format that this plot needs.
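
As a rough illustration of what stat = "identity" changes, the sketch below compares it with the default geom_bar() behavior. The data frame is typed in from the table in this recipe rather than read from ctgs.csv, and the names long_sales, Adjusted.sales, and Unadjusted.sales are assumptions. layer_data() is used only to inspect the values that ggplot2 computes, and geom_col() is ggplot2's shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Long-format data typed in from the table above (column names are assumed).
long_sales <- data.frame(
  Company  = rep(paste0("Company", 1:5), times = 2),
  variable = rep(c("Adjusted.sales", "Unadjusted.sales"), each = 5),
  value    = c(298, 392, 453, 541, 674, 394, 454, 499, 598, 762)
)

# Default stat ("count"): each bar segment's height is a row count, not a sales value.
p_count <- ggplot(long_sales, aes(x = Company, fill = variable)) +
  geom_bar()

# stat = "identity": segment heights are taken directly from the value column.
p_identity <- ggplot(long_sales, aes(x = Company, y = value, fill = variable)) +
  geom_bar(stat = "identity")

# geom_col() is shorthand for geom_bar(stat = "identity").
p_col <- ggplot(long_sales, aes(x = Company, y = value, fill = variable)) +
  geom_col()

counts  <- layer_data(p_count)$count    # rows counted in each segment
heights <- layer_data(p_identity)$ymax  # top edge of each stacked segment
```

Here every entry of counts is 1, because each company and variable pair occurs once, while heights reflects the sales figures themselves.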

The result has one bar for each company, split into two colored segments, so the total height of each bar is the sum of the two values. The red segment corresponds to the adjusted sales and the green segment corresponds to the unadjusted sales.
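
Since the contents of ctgs.csv are not reproduced here, the following sketch rebuilds the table from this recipe as a data frame and applies the same melt() call, so that the long format can be inspected directly. The column names Adjusted.sales and Unadjusted.sales are assumptions (they are what read.csv() would typically produce from headers containing spaces); the actual file may differ.

```r
library(reshape)

# Wide format: one row per company, one column per sales measure.
# Values are copied from the table in this recipe; column names are assumed.
datag <- data.frame(
  Company          = paste0("Company", 1:5),
  Adjusted.sales   = c(298, 392, 453, 541, 674),
  Unadjusted.sales = c(394, 454, 499, 598, 762)
)

# Long format: one row per (company, measure) pair.
transformed_data <- melt(datag, id.vars = "Company")

str(transformed_data)   # columns: Company, variable, value
head(transformed_data)
```

Each company now appears on two rows, one per measure, which is what allows fill = variable to split every bar into two segments.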

There's more...

We can change the position of the bars, and place them one next to the other, instead of stacking them up. This can be achieved by using the position=position_dodge() option as shown in the following code block:

ggplot(transformed_data, aes(x = Company, y = value, fill = variable)) + geom_bar(stat = "identity", position = position_dodge())

This results in the following output:

[Figure: bar plot with two side-by-side bars per company, filled by variable]
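
To check what dodging does to the bar geometry, a minimal sketch under stated assumptions (table values typed in by hand, assumed names long_sales, Adjusted.sales, and Unadjusted.sales) can inspect the computed layer data with layer_data(). With position_dodge(), every bar starts at zero and keeps its own value as its height, whereas stacking places one segment on top of the other.

```r
library(ggplot2)

# Long-format data typed in from the table in this recipe (column names are assumed).
long_sales <- data.frame(
  Company  = rep(paste0("Company", 1:5), times = 2),
  variable = rep(c("Adjusted.sales", "Unadjusted.sales"), each = 5),
  value    = c(298, 392, 453, 541, 674, 394, 454, 499, 598, 762)
)

# Same plot as above, with the bars placed side by side.
p_dodge <- ggplot(long_sales, aes(x = Company, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = position_dodge())

# Inspect the rectangles ggplot2 will draw.
dodged <- layer_data(p_dodge)
range(dodged$ymin)                        # lower edge of the bars
dodged[, c("xmin", "xmax", "ymin", "ymax")]
```

The string position = "dodge" is accepted as a shorthand for the same adjustment.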

See also