R Programming By Example

By: Omar Trejo Navarro

Overview of this book

R is a high-level statistical language that is widely used among statisticians and data miners to develop analytical applications. Often, data analysts with great analytical skills lack solid programming knowledge and are unfamiliar with the correct ways to use R. Based on R version 3.4, this book will help you develop strong fundamentals when working with R by taking you through a series of full representative examples, giving you a holistic view of R. We begin with the basic installation and configuration of the R environment. As you progress through the exercises, you'll become thoroughly acquainted with R's features and its packages. With this book, you will learn about the basic concepts of R programming, work efficiently with graphs, create publication-ready and interactive 3D graphs, and gain a better understanding of the data at hand. The detailed step-by-step instructions will enable you to get a clean set of data, produce good visualizations, and create reports for the results. The book also teaches you various methods for code profiling and performance enhancement through good programming practices, delegation, and parallelization. By the end of this book, you will know how to work efficiently with data, create quality visualizations and reports, and develop code that is modular, expressive, and maintainable.

Extending our analysis with cosine similarity

Now we turn to another technique from linear algebra that operates on a vector space. It is known as cosine similarity (CS), and its purpose is to find vectors that are similar to (or different from) each other. The idea is to measure the similarity in direction (not magnitude) among client messages, and to use it to predict similar outcomes when it comes to multiple purchases. For non-negative vectors such as term counts, the cosine similarity lies between 0 and 1: it is 0 when the vectors are orthogonal (perpendicular) and 1 when they point in the same direction. However, this similarity should not be interpreted as a percentage, because the cosine function is not linear in the angle between the vectors. This means that a movement from 0.2 to 0.3 does not represent the same change in direction as a movement from 0.8 to 0.9.
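To make the two claims above concrete, here is a minimal sketch in base R. It uses the standard definition of cosine similarity (the dot product divided by the product of the vector norms). The function names `cosine_similarity()` and `angle_degrees()` and the small term-count vectors are hypothetical, introduced only for illustration; they are not taken from the chapter's code or data.

```r
# Cosine similarity between two numeric vectors of equal length,
# using the standard definition: dot product over product of norms.
cosine_similarity <- function(x, y) {
    sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Hypothetical term counts for three messages over four terms.
message_a <- c(2, 1, 0, 0)
message_b <- c(4, 2, 0, 0)  # same direction as message_a, twice the magnitude
message_c <- c(0, 0, 3, 1)  # shares no terms with message_a

# Same direction: similarity of 1 (up to floating-point error),
# regardless of the difference in magnitude.
sim_same_direction <- cosine_similarity(message_a, message_b)

# No shared terms: the vectors are orthogonal, so the similarity is 0.
sim_orthogonal <- cosine_similarity(message_a, message_c)

# Convert a similarity back into the angle (in degrees) between the vectors.
angle_degrees <- function(similarity) {
    acos(similarity) * 180 / pi
}

# Equal steps in similarity correspond to unequal steps in angle.
step_low <- angle_degrees(0.2) - angle_degrees(0.3)
step_high <- angle_degrees(0.8) - angle_degrees(0.9)

print(c(same_direction = sim_same_direction, orthogonal = sim_orthogonal))
print(c(step_low = step_low, step_high = step_high))
```

In this sketch, the step from 0.2 to 0.3 corresponds to a change of roughly 5.9 degrees, while the step from 0.8 to 0.9 corresponds to roughly 11.0 degrees, which is why the similarity should not be read as a percentage.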

Given two vectors (rows in our DFM), the cosine similarity between them is computed by taking the...