R Programming By Example

By: Omar Trejo Navarro

Overview of this book

R is a high-level statistical language that is widely used among statisticians and data miners to develop analytical applications. Often, data analysts with great analytical skills lack solid programming knowledge and are unfamiliar with the correct ways to use R. Based on R version 3.4, this book will help you develop strong fundamentals when working with R by taking you through a series of complete, representative examples, giving you a holistic view of R. We begin with the basic installation and configuration of the R environment. As you progress through the exercises, you'll become thoroughly acquainted with R's features and its packages. With this book, you will learn the basic concepts of R programming, work efficiently with graphs, create publication-ready and interactive 3D graphs, and gain a better understanding of the data at hand. The detailed step-by-step instructions will enable you to get a clean set of data, produce good visualizations, and create reports for the results. The book also teaches you various methods to perform code profiling and performance enhancement with good programming practices, delegation, and parallelization. By the end of this book, you will know how to work efficiently with data, create quality visualizations and reports, and develop code that is modular, expressive, and maintainable.
Table of Contents (12 chapters)

Comparing R with other software

My intention for this section is not to provide a comprehensive comparison between R and other software, but to simply point out a few of R's most noticeable features. If you can, I encourage you to test other software yourself so that you know first-hand what may be the best tool for the job at hand.

The most noticeable feature of R compared to other statistical software such as SAS, Stata, and SPSS, and even compared to Python, is the very large number of packages available for it. At the time of writing, almost 12,000 packages are published on the Comprehensive R Archive Network (CRAN) (https://cran.r-project.org/), and this does not include packages published elsewhere, such as in Git repositories. This breadth goes hand in hand with a very large community and gives R a huge number of tools for data analysis in areas such as finance, mathematics, machine learning, high-performance computing, and many others.

R has far more programming capabilities than SAS, Stata, or SPSS, and in some respects more than Python as well (for example, in R you may use different object models). However, efficient and effective use of R requires writing code, which implies a steep learning curve for some people, while Stata and SPSS have graphical user interfaces that guide the user through many tasks with point-and-click wizards. In my opinion, this hand-holding, although nice for beginners, quickly becomes an important restriction for people looking to become intermediate to advanced users, and that's where the advantage of programming really shines.
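To make the remark about object models a little more concrete, here is a minimal sketch. The text does not say which object models it has in mind; I am assuming the three that ship with base R (S3, S4, and Reference Classes). The account example and every name in it (new_account, summary_line, Account, describe, AccountRC, describe_rc) are hypothetical and chosen only for illustration.

```r
library(methods)  # S4 and Reference Classes live here (usually attached by default)

# --- S3: informal classes; dispatch is driven by the "class" attribute ---
new_account <- function(owner, balance) {
  structure(list(owner = owner, balance = balance), class = "account")
}
summary_line <- function(x, ...) UseMethod("summary_line")
summary_line.account <- function(x, ...) paste0(x$owner, ": ", x$balance)

s3 <- new_account("Ana", 100)
s3_text <- summary_line(s3)

# S3 objects follow R's usual copy-on-modify semantics:
s3_copy <- s3
s3_copy$balance <- 0   # changes the copy only; s3$balance is still 100

# --- S4: formal classes with declared slots and explicit generics ---
setClass("Account", representation(owner = "character", balance = "numeric"))
setGeneric("describe", function(x, ...) standardGeneric("describe"))
setMethod("describe", "Account", function(x, ...) {
  paste0(x@owner, ": ", x@balance)
})

s4 <- new("Account", owner = "Ana", balance = 100)
s4_text <- describe(s4)

# --- Reference Classes: mutable objects whose methods belong to the object ---
AccountRC <- setRefClass(
  "AccountRC",
  fields = list(owner = "character", balance = "numeric"),
  methods = list(
    deposit = function(amount) {
      balance <<- balance + amount  # modifies the object in place
      invisible(.self)
    },
    describe_rc = function() paste0(owner, ": ", balance)
  )
)

rc <- AccountRC$new(owner = "Ana", balance = 100)
rc_alias <- rc          # a second name for the same object, not a copy
rc_alias$deposit(50)    # rc$balance is now 150 as well
rc_text <- rc$describe_rc()
```

The point of the sketch is only that the same idea can be written in three styles with different trade-offs: S3 is the lightest, S4 adds formal definitions and validation of slot types, and Reference Classes behave more like objects in Python or Java because they are modified in place.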

R has one of the best graphics systems among all existing software. The most popular package for producing graphs in R, which we will use extensively in this book, is ggplot2, although there are many other great graphing packages as well. ggplot2 allows the modification of virtually every aspect of a graph through its graphics grammar, and is far superior to anything I've seen in SPSS, Stata, SAS, or even Python.
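As a small illustration of what the graphics grammar looks like in practice, the following sketch builds a plot by adding independent pieces (data and aesthetic mappings, geometric layers, labels, and a theme). It assumes the ggplot2 package is already installed, and it uses the mtcars dataset that ships with R purely as a convenient example; neither the dataset nor the particular layers come from the text above.

```r
library(ggplot2)  # assumed to be installed, e.g. via install.packages("ggplot2")

# Each "+" adds one component of the grammar; any of them can be changed
# or removed without touching the others.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +  # data + mappings
  geom_point(size = 2) +                                           # one layer: points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +        # second layer: linear fits
  labs(
    title = "Fuel efficiency by weight",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders"
  ) +
  theme_minimal()                                                  # overall look
```

Calling print(p) (or simply typing p in an interactive session) draws the plot. Swapping theme_minimal() for another theme, or geom_point() for a different geometry, changes that one aspect of the graph and nothing else, which is the kind of fine-grained control the paragraph above refers to.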

R is a great tool, but it's not the right tool for everything. If you're looking to perform data analysis but don't want to invest the time in learning to program, then software like SAS, Stata, or SPSS may be a better option for you. If you're looking to develop analytical software that is very easily integrated into larger systems and which needs to plug into various interfaces, then Python may be a better tool for the job. However, if you're looking to do a lot of complex data analysis and graphing, and you are going to mostly spend your time focused on these areas, then R is a great choice.