Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying Mastering Parallel Programming with R
  • Table Of Contents Toc
Mastering Parallel Programming with R

Mastering Parallel Programming with R

By : Simon R. Chapple, Sloan, Forster, Troup
5 (4)
close
close
Mastering Parallel Programming with R

Mastering Parallel Programming with R

5 (4)
By: Simon R. Chapple, Sloan, Forster, Troup

Overview of this book

R is one of the most popular programming languages used in data science. Applying R to big data and complex analytic tasks requires the harnessing of scalable compute resources. Mastering Parallel Programming with R presents a comprehensive and practical treatise on how to build highly scalable and efficient algorithms in R. It will teach you a variety of parallelization techniques, from simple use of R’s built-in parallel package versions of lapply(), to high-level AWS cloud-based Hadoop and Apache Spark frameworks. It will also teach you low level scalable parallel programming using RMPI and pbdMPI for message passing, applicable to clusters and supercomputers, and how to exploit thousand-fold simple processor GPUs through ROpenCL. By the end of the book, you will understand the factors that influence parallel efficiency, including assessing code performance and implementing load balancing; pitfalls to avoid, including deadlock and numerical instability issues; how to structure your code and data for the most appropriate type of parallelism for your problem domain; and how to extract the maximum performance from your R code running on a variety of computer systems.
Table of Contents (8 chapters)
close
close

Preface

We are in the midst of an information explosion. Everything in our lives is becoming instrumented and connected in real-time with the Internet of Things, from our own biology to the world's environment. By some measures, it is projected that by 2020, world data will have grown by more than a factor of 10 from today to a staggering 44 Zettabytes—just one Zettabyte is the equivalent of 250 billion DVDs. In order to process this volume and velocity of big data, we need to harness a vast amount of compute, memory, and disk resources, and to do this, we need parallelism.

Despite its age, R—the open source statistical programming language, continues to grow in popularity as one of the key cornerstone technologies to analyze data, and is used by an ever-expanding community of, dare I say the currently in-vogue designation of, "data scientists".

There are of course many other tools that a data scientist may deploy in taming the beast of big data. You may also be a Python, SAS, SPSS, or MATLAB guru. However, R, with its long open source heritage since 1997, remains pervasive, and with the extraordinarily wide variety of additional CRAN-hosted plug-in library packages that were developed over the intervening 20 years, it is highly capable of almost all forms of data analysis, from small numeric matrices to very large symbolic datasets, such as bio-molecular DNA. Indeed, I am tempted to go as far as to suggest that R is becoming the de facto data science scripting language, which is capable of orchestrating highly complex analytics pipelines that involve many different types of data.

R, in itself, has always been a single-threaded implementation, and it is not designed to exploit parallelism within its own language primitives. Instead, it relies on specifically implemented external package libraries to achieve this for certain accelerated functions and to enable the use of parallel processing frameworks. We will focus on a select number of these that represent the best implementations that are available today to develop parallel algorithms across a range of technologies.

In this book, we will cover many different aspects of parallelism, from Single Program Multiple Data (SPMD) to Single Instruction Multiple Data (SIMD) vector processing, including utilizing R's built-in multicore capabilities with its parallel package, message passing using the Message Passing Interface (MPI) standard, and General Purpose GPU (GPGPU)-based parallelism with OpenCL. We will also explore different framework approaches to parallelism, from load balancing through task farming to spatial processing with grids. We will touch on more general purpose batch-data processing in the cloud with Hadoop and (as a bonus) the hot new tech in cluster computing, Apache Spark, which is much better suited to real-time data processing at scale.

We will even explore how to use a real bona fide multi-million pound supercomputer. Yes, I know that you may not own one of these, but in this book, we'll show you what its like to use one and how much performance parallelism can achieve. Who knows, with your new found knowledge, maybe you can rock up at your local Supercomputer Center and convince them to let you spin up some massively parallel computing!

All of the coding examples that are presented in this book are original work and have been chosen partly so as not to duplicate the kind of example you might otherwise encounter in other books of this nature. They are also chosen to hopefully engage you, dear reader, with something a little bit different to the run-of-the-mill. We, the authors, very much hope you enjoy the journey that you are about to undertake through Mastering Parallel Programming in R.

CONTINUE READING
83
Tech Concepts
36
Programming languages
73
Tech Tools
Icon Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.
Icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Icon 50+ new titles added per month and exclusive early access to books as they are being written.
Mastering Parallel Programming with R
notes
bookmark Notes and Bookmarks search Search in title playlist Add to playlist download Download options font-size Font size

Change the font size

margin-width Margin width

Change margin width

day-mode Day/Sepia/Night Modes

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY

Submit Your Feedback

Modal Close icon
Modal Close icon
Modal Close icon