R Programming By Example

By: Omar Trejo Navarro

Overview of this book

R is a high-level statistical language that is widely used among statisticians and data miners to develop analytical applications. Data analysts with great analytical skills often lack solid programming knowledge and are unfamiliar with the correct ways to use R. Based on R version 3.4, this book will help you develop strong fundamentals when working with R by taking you through a series of full, representative examples, giving you a holistic view of R. We begin with the basic installation and configuration of the R environment. As you progress through the exercises, you'll become thoroughly acquainted with R's features and its packages. With this book, you will learn about the basic concepts of R programming, work efficiently with graphs, create publication-ready and interactive 3D graphs, and gain a better understanding of the data at hand. The detailed step-by-step instructions will enable you to get a clean set of data, produce good visualizations, and create reports for the results. The book also teaches you various methods to perform code profiling and performance enhancement with good programming practices, delegation, and parallelization. By the end of this book, you will know how to efficiently work with data, create quality visualizations and reports, and develop code that is modular, expressive, and maintainable.
Table of Contents (12 chapters)

What this book covers

Chapter 1, Introduction to R, covers the R basics you need to understand the rest of the examples. It is not meant to be a thorough introduction to R. Rather, it's meant to give you the very basic concepts and techniques you need to quickly get started with the three examples contained in the book, which I introduce next.

This book uses three examples to showcase R's wide range of functionality. The first example shows how to analyze votes with descriptive statistics and linear models, and it is presented in Chapter 2, Understanding Votes with Descriptive Statistics and Chapter 3, Predicting Votes with Linear Models.

Chapter 2, Understanding Votes with Descriptive Statistics, shows how to programmatically create hundreds of graphs to identify relations within data visually. It shows how to create histograms, scatter plots, and correlation matrices, and how to perform Principal Component Analysis (PCA).

Chapter 3, Predicting Votes with Linear Models, shows how to programmatically find the best predictive linear model for a set of data according to different success metrics. It also shows how to check model assumptions and how to use cross-validation to increase confidence in your results.

The second example shows how to simulate data, visualize it, analyze its text components, and create automatic presentations with it.

Chapter 4, Simulating Sales Data and Working with Databases, shows how to design a data schema and simulate various types of data. It also shows how to integrate real text data with simulated data, and how to use a SQL database to access it more efficiently.

Chapter 5, Communicating Sales with Visualization, shows how to produce graphs ranging from basic to advanced and highly customized. It also shows how to create dynamic 3D graphs and interactive maps.

Chapter 6, Understanding Reviews with Text Analysis, shows how to perform text analysis step by step using Natural Language Processing (NLP) techniques, as well as sentiment analysis.

Chapter 7, Developing Automatic Presentations, shows how to put together the results of previous chapters to create presentations that can be automatically updated with the latest data using tools such as knitr and R Markdown.

Finally, the third example shows how to design and develop complex object-oriented systems that retrieve real-time data from cryptocurrency markets, as well as how to optimize implementations and how to build web applications around such systems.

Chapter 8, Object-Oriented System to Track Cryptocurrencies, introduces basic object-oriented techniques that produce complex systems when combined. Furthermore, it shows how to work with three of R’s most used object models (S3, S4, and R6), as well as how to make them work together.

Chapter 9, Implementing an Efficient Simple Moving Average, shows how to iteratively improve an implementation of a Simple Moving Average (SMA), starting from what is considered to be bad code and progressing to advanced optimization techniques that use parallelization and delegation to the Fortran and C++ languages.

Chapter 10, Adding Interactivity with Dashboards, shows how to wrap what was built during the previous two chapters in a modern web application using reactive programming through the Shiny package.

Appendix, Required Packages, shows how to install the internal and external software necessary to replicate the examples in the book. Specifically, it will walk through the installation processes for Linux and macOS, but Windows follows similar principles and should not cause any problems.