Book Image

R Programming By Example

By : Omar Trejo Navarro

Book Image

R Programming By Example

By: Omar Trejo Navarro

Overview of this book

R is a high-level statistical language and is widely used among statisticians and data miners to develop analytical applications. Often, data analysis people with great analytical skills lack solid programming knowledge and are unfamiliar with the correct ways to use R. Based on the version 3.4, this book will help you develop strong fundamentals when working with R by taking you through a series of full representative examples, giving you a holistic view of R. We begin with the basic installation and configuration of the R environment. As you progress through the exercises, you'll become thoroughly acquainted with R's features and its packages. With this book, you will learn about the basic concepts of R programming, work efficiently with graphs, create publication-ready and interactive 3D graphs, and gain a better understanding of the data at hand. The detailed step-by-step instructions will enable you to get a clean set of data, produce good visualizations, and create reports for the results. It also teaches you various methods to perform code profiling and performance enhancement with good programming practices, delegation, and parallelization. By the end of this book, you will know how to efficiently work with data, create quality visualizations and reports, and develop code that is modular, expressive, and maintainable.

Preface

What this book covers

What you need for this book

Who this book is for

Reader feedback

Customer support

Free Chapter

Introduction to R

Introduction to R

What R is and what it isn't

Comparing R with other software

The interpreter and the console

Tools to work efficiently with R

How to use this book

Tracking state with symbols and variables

Working with data types and data structures

Divide and conquer with functions

Complex logic with control structures

The examples in this book

Understanding Votes with Descriptive Statistics

Understanding Votes with Descriptive Statistics

This chapter's required packages

The Brexit votes example

Cleaning and setting up the data

Summarizing the data into a data frame

Getting intuition with graphs and correlations

Creating a new dataset with what we've learned

Building new variables with principal components

Putting it all together into high-quality code

Predicting Votes with Linear Models

Predicting Votes with Linear Models

Required packages

Setting up the data

Predicting votes with linear models

Checking model assumptions

Measuring accuracy with score functions

Programatically finding the best model

Predicting votes from wards with unknown data

Simulating Sales Data and Working with Databases

Simulating Sales Data and Working with Databases

Required packages

Designing our data tables

Simulating the sales data

Simulating the client data

Simulating the client messages data

Working with relational databases

Communicating Sales with Visualizations

Communicating Sales with Visualizations

Required packages

Extending our data with profit metrics

Building blocks for reusable high-quality graphs

Starting with simple applications for bar graphs

Graphing disaggregated data with boxplots

Scatter plots with joint and marginal distributions

Developing our own graph type – radar graphs

Exploring with interactive 3D scatter plots

Looking at dynamic data with time-series

Looking at geographical data with static maps

Navigating geographical data with interactive maps

Understanding Reviews with Text Analysis

Understanding Reviews with Text Analysis

This chapter's required packages

What is text analysis and how does it work?

Preparing, training, and testing data

Building the corpus with tokenization and data cleaning

Training models with cross validation

Improving our results with TF-IDF

Adding flexibility with N-grams

Reducing dimensionality with SVD

Extending our analysis with cosine similarity

Digging deeper with sentiment analysis

Testing our predictive model with unseen data

Retrieving text data from Twitter

Developing Automatic Presentations

Developing Automatic Presentations

Required packages

Why invest in automation?

Literate programming as a content creation methodology

The basic tools for an automation pipeline

A gentle introduction to Markdown

Extending Markdown with R Markdown

Developing graphs and analysis as we normally would

Building our presentation with R Markdown

Object-Oriented System to Track Cryptocurrencies

Object-Oriented System to Track Cryptocurrencies

This chapter's required packages

The cryptocurrencies example

A brief introduction to object-oriented programming

Introducing three object models in R – S3, S4, and R6

The architecture behind our cryptocurrencies system

Starting simple with timestamps using S3 classes

Implementing cryptocurrency assets using S4 classes

Implementing our storage layer with R6 classes

Retrieving live data for markets and wallets with R6 classes

Finally introducing users with S3 classes

Helping ourselves with a centralized settings file

Saving our initial user data into the system

Activating our system with two simple functions

Some advice when working with object-oriented systems

Implementing an Efficient Simple Moving Average

Implementing an Efficient Simple Moving Average

Required packages

Starting by using good algorithms

How fast is fast enough?

Calculating simple moving averages inefficiently

Understanding why R can be slow

Measuring by profiling and benchmarking

Easily achieving high benefit - cost improvements

Using parallelization to divide and conquer

Using C++ and Fortran to accelerate calculations

Looking back at what we have achieved

Other topics of interest to enhance performance

Adding Interactivity with Dashboards

Adding Interactivity with Dashboards

Required packages

What is functional reactive programming and why is it useful?

Designing our high-level application structure

Inserting a dynamic data table

Introducing interactivity with user input

Adding a summary table with shared data

Adding a simple moving average graph

Adding interactivity with a secondary zoom-in graph

Styling our application with themes

Other topics of interest

Required Packages

Required Packages

External requirements – software outside of R

Internal requirements – R packages

Loading R packages

Customer Reviews

5 star

0

4 star

0

3 star

0

2 star

0

1 star

0

Improving our results with TF-IDF

In general in text analysis, a high raw count for a term inside a text does not necessarily mean that the term is more important for the text. One of the most important ways to normalize the term frequencies is to weigh a term by how often it appears not only in a text, but also in the entire corpus.

The more a word appears inside a given text and doesn't appear too much across the whole corpus, it means that it's probably important for that specific text. However, if the term appears a lot inside a text, but also appears a lot in other texts in the corpus, it's probably not important for the specific text, but for the entire corpus, and this dilutes it's predictive power.

In IR, TF-IDF is one of the most popular term-weighting schemes and it's the mathematical implementation of the idea expressed in the preceding paragraph...