R Programming By Example

By: Omar Trejo Navarro

Overview of this book

R is a high-level statistical language that is widely used among statisticians and data miners to develop analytical applications. Data analysts with strong analytical skills often lack solid programming knowledge and are unfamiliar with the correct ways to use R. Based on R version 3.4, this book will help you develop strong fundamentals when working with R by taking you through a series of complete, representative examples, giving you a holistic view of R. We begin with the basic installation and configuration of the R environment. As you progress through the exercises, you'll become thoroughly acquainted with R's features and its packages. With this book, you will learn about the basic concepts of R programming, work efficiently with graphs, create publication-ready and interactive 3D graphs, and gain a better understanding of the data at hand. The detailed step-by-step instructions will enable you to get a clean set of data, produce good visualizations, and create reports for the results. The book also teaches you various methods to perform code profiling and performance enhancement with good programming practices, delegation, and parallelization. By the end of this book, you will know how to efficiently work with data, create quality visualizations and reports, and develop code that is modular, expressive, and maintainable.

Preface

What this book covers

What you need for this book

Introduction to R

What R is and what it isn't

Comparing R with other software

The interpreter and the console

Tools to work efficiently with R

How to use this book

Tracking state with symbols and variables

Working with data types and data structures

Divide and conquer with functions

Complex logic with control structures

The examples in this book

Summary

Understanding Votes with Descriptive Statistics

This chapter's required packages

The Brexit votes example

Cleaning and setting up the data

Summarizing the data into a data frame

Getting intuition with graphs and correlations

Creating a new dataset with what we've learned

Building new variables with principal components

Putting it all together into high-quality code

Summary

Predicting Votes with Linear Models

Required packages

Setting up the data

Predicting votes with linear models

Checking model assumptions

Measuring accuracy with score functions

Programmatically finding the best model

Predicting votes from wards with unknown data

Summary

Simulating Sales Data and Working with Databases

Required packages

Designing our data tables

Simulating the sales data

Simulating the client data

Simulating the client messages data

Working with relational databases

Summary

Communicating Sales with Visualizations

Required packages

Extending our data with profit metrics

Building blocks for reusable high-quality graphs

Starting with simple applications for bar graphs

Graphing disaggregated data with boxplots

Scatter plots with joint and marginal distributions

Developing our own graph type – radar graphs

Exploring with interactive 3D scatter plots

Looking at dynamic data with time-series

Looking at geographical data with static maps

Navigating geographical data with interactive maps

Summary

Understanding Reviews with Text Analysis

This chapter's required packages

What is text analysis and how does it work?

Preparing, training, and testing data

Building the corpus with tokenization and data cleaning

Training models with cross validation

Improving our results with TF-IDF

Adding flexibility with N-grams

Reducing dimensionality with SVD

Extending our analysis with cosine similarity

Digging deeper with sentiment analysis

Testing our predictive model with unseen data

Retrieving text data from Twitter

Summary

Developing Automatic Presentations

Required packages

Why invest in automation?

Literate programming as a content creation methodology

The basic tools for an automation pipeline

A gentle introduction to Markdown

Extending Markdown with R Markdown

Developing graphs and analysis as we normally would

Building our presentation with R Markdown

Summary

Object-Oriented System to Track Cryptocurrencies

This chapter's required packages

The cryptocurrencies example

A brief introduction to object-oriented programming

Introducing three object models in R – S3, S4, and R6

The architecture behind our cryptocurrencies system

Starting simple with timestamps using S3 classes

Implementing cryptocurrency assets using S4 classes

Implementing our storage layer with R6 classes

Retrieving live data for markets and wallets with R6 classes

Finally introducing users with S3 classes

Helping ourselves with a centralized settings file

Saving our initial user data into the system

Activating our system with two simple functions

Some advice when working with object-oriented systems

Summary

Implementing an Efficient Simple Moving Average

Required packages

Starting by using good algorithms

How fast is fast enough?

Calculating simple moving averages inefficiently

Understanding why R can be slow

Measuring by profiling and benchmarking

Easily achieving high benefit-cost improvements

Using parallelization to divide and conquer

Using C++ and Fortran to accelerate calculations

Looking back at what we have achieved

Other topics of interest to enhance performance

Summary

Adding Interactivity with Dashboards

Required packages

What is functional reactive programming and why is it useful?

Designing our high-level application structure

Inserting a dynamic data table

Introducing interactivity with user input

Adding a summary table with shared data

Adding a simple moving average graph

Adding interactivity with a secondary zoom-in graph

Styling our application with themes

The interpreter and the console

As I mentioned earlier, R is an interpreted language. When you enter an expression into the R console or execute an R script in your operating system's terminal, a program called the interpreter parses and executes the code. Other examples of interpreted languages are Lisp, Python, and JavaScript. Unlike C, C++, and Java, R doesn't require you to explicitly compile your programs before you execute them.

All R programs are composed of a series of expressions. The interpreter begins by parsing each expression, substitutes objects for symbols where appropriate, evaluates the expression, and finally returns the resulting object. We will define each of these concepts in the following sections, but you should understand that this is the basic process that all R programs go through.
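The parse-then-evaluate cycle described above can be made visible from within R itself. The following is a minimal sketch using the base functions parse() and eval(); the names x, expr, and result are illustrative choices and do not come from the rest of this book.

```r
# Bind the symbol x to an object so the interpreter has something to substitute.
x <- 1

# Step 1: parse the source text into an unevaluated expression.
expr <- parse(text = "x + 2")[[1]]
class(expr)   # "call": the text has been parsed but not yet evaluated

# Step 2: evaluate it. R looks up the symbol x, finds 1, and computes the sum.
result <- eval(expr)
result        # [1] 3
```

In everyday use you never need to call parse() or eval() yourself; the interpreter performs both steps for every expression you send it.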

The R console is the most important tool for using R and can be thought of as a wrapper around the interpreter. The console is a tool that allows you to type expressions directly into R and see how it responds. The interpreter will read the expressions and respond with a result or an error message, if there was one. When you execute expressions through the console, the interpreter will pass objects to the print() function automatically, which is why you can see the result printed below your expressions (we'll cover more on functions later).
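To see the automatic printing at work, it may help to compare expressions typed at the top level with expressions that run inside a loop, where nothing is printed unless you ask for it. This is a small sketch; the names x, silent, and shown are chosen only for illustration.

```r
x <- 1 + 2            # assignment: the value is returned invisibly, so nothing is printed
x                     # top-level expression: the console prints it for you
print(x)              # the explicit equivalent of the line above

# Inside a loop, expressions are not at the top level, so nothing is auto-printed.
for (i in 1:2) i
# To see the values you must call print() yourself.
for (i in 1:2) print(i)

# capture.output() collects whatever would have been printed,
# which lets us confirm the difference.
silent <- capture.output(for (i in 1:2) i)
shown  <- capture.output(for (i in 1:2) print(i))
length(silent)        # 0 lines were printed
length(shown)         # 2 lines were printed
```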

If you've used a command line before (for example, bash in Linux or macOS, or cmd.exe in Windows) or a language with an interactive interpreter such as Lisp, Python, or JavaScript, the console should look familiar, since it is simply a command-line interface. If not, don't worry. Command-line interfaces are simple-to-use tools. They are programs that receive code and return objects whose printed representations you see below the code you execute.

When you launch R, you will see a window with the R console. Inside the console you will see a startup message that displays some basic information, including the version of R you're running, license information, and reminders about how to get help, followed by a command prompt.

Note that the R version used here is 3.4.2, and the code developed in this book assumes this version. If you have a different version and run into problems, the version difference is one possible cause you may want to look into.
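If you are unsure which version you are running, you can ask R directly instead of scrolling back to the startup message. The sketch below uses the base objects R.version.string and getRversion(); the variable book_version is a hypothetical name introduced here only for the comparison.

```r
# The same version string that appears in the startup message.
R.version.string

# getRversion() returns a version object that can be compared directly.
book_version <- "3.4.2"   # the version this book assumes
current <- as.character(getRversion())

if (getRversion() < book_version) {
  message("Your R (", current, ") is older than ", book_version,
          "; some examples may behave differently.")
} else {
  message("Your R (", current, ") is at least ", book_version, ".")
}
```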

You should note that, by default, R will display a greater-than sign (>) at the beginning of the last line of the console, signaling that it's ready to receive commands. Since R is prompting you to type something, this is called a command prompt. When you see the greater-than symbol, R is able to receive more expressions as input. When you don't, it is probably because R is busy processing something you sent, and you should wait for it to finish before sending something else.
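The prompt is an ordinary R option, so you can inspect it like any other setting. The short sketch below also looks at the continuation prompt, a related detail not covered in the text above: R shows it when an expression is incomplete and it is waiting for you to finish typing. The one-second pause is only there to let you watch the prompt disappear while R is busy.

```r
# The prompt string; by default this is "> ".
getOption("prompt")

# The continuation prompt shown for incomplete expressions; by default "+ ".
getOption("continue")

# While this call runs, R is busy and the prompt does not reappear
# until the pause is over.
Sys.sleep(1)
```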

For the most part, this book avoids showing command prompts, since you may be typing the code into a source code file or directly into the console. When a prompt does appear, make sure that you don't type it. For example, if you want to replicate the following snippet, you should only type 1 + 2 in your console and press the Enter key. When you do, you will see [1] 3, which is the output R returns. Go ahead and execute various arithmetic expressions to get a feel for the console:

> 1 + 2
[1] 3

Note the [1] that accompanies each returned value. It's there because the result is actually a vector (an ordered collection). The [1] means that the index of the first item displayed in that row is 1 (in this case, our resulting vector has a single value within).
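You can confirm that even a single number is a vector, and see the index markers on a longer one. This is a small sketch with illustrative names (result and values). The printed output is omitted because the number of values shown per row depends on the width of your console; when the vector wraps, each row starts with the index of its first value.

```r
result <- 1 + 2
is.vector(result)   # TRUE: a single number is a vector
length(result)      # 1: it holds exactly one value

# A longer vector. Print it and compare the bracketed number at the
# start of each row with the position of the first value in that row.
values <- 101:130
values

values[1]           # 101, the first element
values[30]          # 130, the last element
```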

Finally, you should know that the console provides tools for looking through previous commands. You will probably find that the up and down arrow keys are the most useful. You can scroll through previous commands by pressing them. The up arrow lets you look at earlier commands, and the down arrow lets you look at later commands. If you would like to repeat a previous command with a minor change, or if you need to correct a mistake, you can easily do so using these keys.
