R High Performance Programming

Index

A

  • active data
    • and non-active data, swapping / Swapping active and nonactive data
  • algorithm design affects
    • time complexity / Algorithm design affects time and space complexity
    • space complexity / Algorithm design affects time and space complexity
  • Amazon Machine Images (AMIs)
    • about / Installing gputools
  • Amazon Web Services (AWS)
    • about / Installing gputools
    • Hadoop, setting on / Setting up Hadoop on Amazon Web Services
  • Amdahl's law
    • about / Data parallelism versus task parallelism
  • array databases
    • using / Using array databases for maximum scientific-computing performance

B

  • batch processing
    • about / Processing large datasets in batches using Hadoop
  • big.matrix object
    • about / The bigmemory package
  • bigmemory package
    • using / The bigmemory package
  • bit vectors
    • using / Bit vectors
  • built-in functions
    • using / Use of built-in functions

C

  • cfunction() function, arguments
    • sig=signature(x="numeric", n="integer") / Including compiled code inline
    • body / Including compiled code inline
    • language="C" / Including compiled code inline
  • columnar databases
    • using / Using columnar databases for improved performance
  • columnar storage
    • about / Using columnar databases for improved performance
  • compiled code
    • considerations / Considerations for using compiled code
    • R APIs / R APIs
    • R data types, versus native data types / R data types versus native data types
    • R objects, creating / Creating R objects and garbage collection
    • garbage collection / Creating R objects and garbage collection
    • memory, allocating for non-R objects / Allocating memory for non-R objects
  • compiled code inline
    • including / Including compiled code inline
  • compiled language / R is interpreted on the fly
  • compiled languages
    • using / Using compiled languages in R
    • prerequisites / Prerequisites
    • compiled code inline, including / Including compiled code inline
    • external compiled code, calling / Calling external compiled code
    • compiled code, considerations / Considerations for using compiled code
  • compiler package
    • about / Compiling functions
  • computing performance
    • CPU / Three constraints on computing performance – CPU, RAM, and disk I/O
    • RAM / Three constraints on computing performance – CPU, RAM, and disk I/O
    • disk I/O / Three constraints on computing performance – CPU, RAM, and disk I/O
    • bottlenecks / Three constraints on computing performance – CPU, RAM, and disk I/O
  • copy-on-modification model
    • about / Reusing objects without taking up more memory
  • CPU utilization
    • monitoring / Monitoring memory utilization, CPU utilization, and disk I/O using OS tools
  • CUDA-enabled GPU card
    • URL / Installing gputools
  • CUDA toolkit
    • URL, for downloading / Installing gputools

D

  • data
    • processing, in chunks / Using memory-mapped files and processing data in chunks
    • preprocessing in relational database, SQL used / Preprocessing data in a relational database using SQL
    • uploading, into HDFS / Uploading data to HDFS
  • database
    • statistical algorithm, executing / Running statistical and machine learning algorithms in a database
    • machine learning algorithm, executing / Running statistical and machine learning algorithms in a database
  • datadr package
    • about / Other Hadoop packages for R
  • data extraction in R
    • versus data processing in database / Extracting data into R versus processing data in a database
  • data parallel algorithms
    • implementing / Implementing data parallel algorithms
  • data parallelism
    • versus task parallelism / Data parallelism versus task parallelism
    • about / Data parallelism versus task parallelism
    • examples / Data parallelism versus task parallelism
  • disk I/O
    • monitoring / Monitoring memory utilization, CPU utilization, and disk I/O using OS tools
  • distributed memory parallelism
    • about / Shared memory versus distributed memory parallelism
    • versus shared memory parallelism / Shared memory versus distributed memory parallelism
  • dplyr package
    • about / Using dplyr
    • used, for converting R expressions / Using dplyr
  • dynamically typed language
    • about / Using compiled languages in R

E

  • elapsed time
    • about / Measuring execution time with system.time()
  • Elastic MapReduce (EMR)
    • about / Setting up Hadoop on Amazon Web Services
    • URL, for prices / Setting up Hadoop on Amazon Web Services
  • execution time
    • about / Measuring total execution time
    • measuring / Measuring total execution time
    • measuring, with system.time() / Measuring execution time with system.time()
    • user time / Measuring execution time with system.time()
    • system time / Measuring execution time with system.time()
    • elapsed time / Measuring execution time with system.time()
    • time measurements, repeating with rbenchmark / Repeating time measurements with rbenchmark
    • distribution, measuring with microbenchmark / Measuring distribution of execution time with microbenchmark
    • profiling / Profiling the execution time
    • function, profiling with Rprof() / Profiling a function with Rprof()
    • profiling results / The profiling results
  • external compiled code
    • calling / Calling external compiled code

F

  • fast alternative packages, in CRAN / Seeking fast alternative packages in CRAN
  • ffbase package
    • about / The ff package
    • functions / The ff package
  • ffdf tool
    • about / Uploading data to HDFS
  • ff package
    • using / The ff package
  • ff package, data types
    • Boolean / The ff package
    • Logical / The ff package
    • Quad / The ff package
    • Nibble / The ff package
    • Byte / The ff package
    • Ubyte / The ff package
    • Short / The ff package
    • Ushort / The ff package
    • Integer / The ff package
    • Single / The ff package
    • Double / The ff package
    • Complex / The ff package
    • Raw / The ff package
    • Factor / The ff package
    • Ordered / The ff package
    • POSIXct / The ff package
    • Date / The ff package
  • Field Programmable Gate Arrays (FPGAs)
    • about / General purpose computing on GPUs
  • forked clusters
    • about / Implementing data parallel algorithms

G

  • garbage collection
    • about / Creating R objects and garbage collection
  • garbage collector
    • about / Removing intermediate data when it is no longer needed
  • gmatrix
    • about / R and GPUs
  • Google Books Ngrams data
    • URL / Uploading data to HDFS
  • GPU
    • general purpose computing / General purpose computing on GPUs
    • with R / R and GPUs
    • gputools, installing / Installing gputools
    • performance affecting factors / Fast statistical modeling in R with gputools
  • gputools
    • about / R and GPUs
    • installing / Installing gputools
    • URL, for installation / Installing gputools
    • used, for statistical modeling / Fast statistical modeling in R with gputools

H

  • Hadoop
    • about / Understanding Hadoop
    • URL / Understanding Hadoop
    • setting, on Amazon Web Services (AWS) / Setting up Hadoop on Amazon Web Services
    • used, for processing large datasets / Processing large datasets in batches using Hadoop
    • data, uploading into HDFS / Uploading data to HDFS
    • HDFS data, analyzing with RHadoop / Analyzing HDFS data with RHadoop
    • R packages / Other Hadoop packages for R
  • hash tables
    • using, for frequent lookups / Use of hash tables for frequent lookups on large data
  • HDFS
    • about / Understanding Hadoop
    • data, uploading into / Uploading data to HDFS

I

  • inline package
    • using / Including compiled code inline
  • installation, gputools
    • about / Installing gputools
  • installation, MADlib
    • about / Running statistical and machine learning algorithms in a database
  • installation, Rtools
    • about / Prerequisites
  • installation, SciDB / Using array databases for maximum scientific-computing performance
  • installation, Xcode Command Line Tools / Prerequisites
  • interfaces, external compiled code
    • .C() / Calling external compiled code
    • .Fortran() / Calling external compiled code
    • .Call() / Calling external compiled code
    • .External() / Calling external compiled code
  • intermediate data
    • removing / Removing intermediate data when it is no longer needed

J

  • just-in-time (JIT) compilation
    • about / Just-in-time (JIT) compilation of R code

K

  • key measures, resource utilization / Monitoring memory utilization, CPU utilization, and disk I/O using OS tools

L

  • Linux AMIs
    • URL / Installing gputools

M

  • machine learning algorithm
    • executing / Running statistical and machine learning algorithms in a database
  • MADlib
    • URL / Running statistical and machine learning algorithms in a database
    • URL, for building / Running statistical and machine learning algorithms in a database
    • URL, for installation guide / Running statistical and machine learning algorithms in a database
    • installing / Running statistical and machine learning algorithms in a database
  • Map
    • about / Understanding Hadoop
  • mapper
    • about / Understanding Hadoop
  • MapReduce
    • about / Understanding Hadoop
  • massively parallel processing (MPP)
    • about / Using array databases for maximum scientific-computing performance
  • Matrix package
    • about / Sparse matrices
  • measures, performance problems troubleshooting / Monitoring memory utilization, CPU utilization, and disk I/O using OS tools
  • memory
    • preallocating / Preallocating memory
    • allocating, for non-R objects / Allocating memory for non-R objects
  • memory-efficient data structures
    • using / Using memory-efficient data structures
    • smaller data types, using / Smaller data types
    • sparse matrices, using / Sparse matrices, Symmetric matrices
    • symmetric matrices, using / Symmetric matrices
    • bit vectors, using / Bit vectors
  • memory-mapped files
    • using / Using memory-mapped files and processing data in chunks
    • about / Using memory-mapped files and processing data in chunks
    • bigmemory package, using / The bigmemory package
    • ff package, using / The ff package
  • memory utilization
    • profiling / Profiling memory utilization
    • monitoring / Monitoring memory utilization, CPU utilization, and disk I/O using OS tools
  • MonetDB
    • about / Using columnar databases for improved performance
    • URL, for downloading / Using columnar databases for improved performance

O

  • objects
    • reusing / Reusing objects without taking up more memory
  • OpenCL
    • about / R and GPUs

P

  • parallel computing
    • performance, optimizing / Optimizing parallel performance
  • pass by reference
    • about / Reusing objects without taking up more memory
  • pass by value
    • about / Reusing objects without taking up more memory
  • performance bottlenecks
    • identifying / Identifying and resolving bottlenecks
    • resolving / Identifying and resolving bottlenecks
  • PivotalR package
    • about / Using PivotalR
    • used, for converting R expressions / Using PivotalR
  • plyrmr package
    • about / Other Hadoop packages for R
  • PostgreSQL
    • setting up / Preprocessing data in a relational database using SQL
    • URL, for downloading / Preprocessing data in a relational database using SQL
  • Principal Component Analysis (PCA) / Seeking fast alternative packages in CRAN

R

  • R
    • running, single-threaded on CPU / R is single-threaded
    • data, loading into memory / R requires all data to be loaded into memory
    • GPU / R and GPUs
  • RAM optimization
    • objects, reusing / Reusing objects without taking up more memory
    • intermediate data, removing / Removing intermediate data when it is no longer needed
    • values, calculating on fly / Calculating values on the fly instead of storing them persistently
    • active and non-active data, swapping / Swapping active and nonactive data
  • R APIs
    • about / R APIs
    • features / R APIs
  • ravro package
    • about / Other Hadoop packages for R
  • R code
    • executing, on fly / R is interpreted on the fly
    • compiling, before execution / Compiling R code before execution
    • functions, compiling / Compiling functions
    • just-in-time (JIT) compilation / Just-in-time (JIT) compilation of R code
  • Rcpp package
    • about / Calling external compiled code
    • URL / Calling external compiled code
  • RCUDA
    • about / R and GPUs
  • R data types
    • versus native data types / R data types versus native data types
  • Reduce
    • about / Understanding Hadoop
  • reducers
    • about / Understanding Hadoop
  • relational database
    • data preprocessing, SQL used / Preprocessing data in a relational database using SQL
  • R expressions
    • converting, into SQL / Converting R expressions to SQL
    • converting, dplyr package used / Using dplyr
    • converting, PivotalR package used / Using PivotalR
  • RHadoop
    • about / Processing large datasets in batches using Hadoop
    • URL / Processing large datasets in batches using Hadoop
    • rhdfs package / Processing large datasets in batches using Hadoop
    • rmr2 package / Processing large datasets in batches using Hadoop
    • used, for analyzing HDFS data / Analyzing HDFS data with RHadoop
    • plyrmr package / Other Hadoop packages for R
    • rhbase package / Other Hadoop packages for R
    • ravro package / Other Hadoop packages for R
    • RHIPE package / Other Hadoop packages for R
    • datadr package / Other Hadoop packages for R
    • Trelliscope package / Other Hadoop packages for R
    • Segue package / Other Hadoop packages for R
  • rhbase package
    • about / Other Hadoop packages for R
  • rhdfs package
    • about / Processing large datasets in batches using Hadoop
    • URL, for installation / Processing large datasets in batches using Hadoop
  • RHIPE package
    • URL / Other Hadoop packages for R
    • about / Other Hadoop packages for R
  • rmr2 package
    • about / Processing large datasets in batches using Hadoop
    • URL, for installation / Processing large datasets in batches using Hadoop
  • R objects
    • creating / Creating R objects and garbage collection
  • R packages, for GPU
    • gputools / R and GPUs
    • gmatrix / R and GPUs
    • RCUDA / R and GPUs
    • OpenCL / R and GPUs
  • Rprof()
    • used, for profiling function / Profiling a function with Rprof()
    • about / The profiling results
  • Rtools
    • URL, for downloading / Prerequisites
    • installing / Prerequisites

S

  • SciDB
    • about / Using array databases for maximum scientific-computing performance
    • URL, for downloading / Using array databases for maximum scientific-computing performance
    • installing / Using array databases for maximum scientific-computing performance
  • Segue package
    • URL / Other Hadoop packages for R
    • about / Other Hadoop packages for R
  • SEXP pointers
    • about / Calling external compiled code
  • shared memory parallelism
    • versus distributed memory parallelism / Shared memory versus distributed memory parallelism
    • about / Shared memory versus distributed memory parallelism
  • shim
    • URL, for installing / Using array databases for maximum scientific-computing performance
  • simpler data structures
    • using / Use of simpler data structures
  • smaller data types
    • using / Smaller data types
  • socket-based cluster
    • about / Implementing data parallel algorithms
  • space complexity / Algorithm design affects time and space complexity
  • sparse matrices
    • using / Sparse matrices
  • SQL
    • used, for data preprocessing in relational database / Preprocessing data in a relational database using SQL
    • R expressions, converting / Converting R expressions to SQL
  • statically typed language
    • about / Using compiled languages in R
  • statistical algorithm
    • executing / Running statistical and machine learning algorithms in a database
  • statistical modeling
    • with gputools / Fast statistical modeling in R with gputools
  • symmetric matrices
    • using / Symmetric matrices
  • system-wide resource utilization measure
    • monitoring / Monitoring memory utilization, CPU utilization, and disk I/O using OS tools
  • system.time()
    • about / Measuring execution time with system.time()
  • system time
    • about / Measuring execution time with system.time()

T

  • task parallel algorithms
    • implementing / Implementing task parallel algorithms
    • same task, executing / Running the same task on workers in a cluster
    • multiple tasks, executing / Running different tasks on workers in a cluster
  • task parallelism
    • versus data parallelism / Data parallelism versus task parallelism
    • about / Data parallelism versus task parallelism
  • tasks
    • executing, in parallel on cluster of computers / Executing tasks in parallel on a cluster of computers
  • time complexity
    • about / Algorithm design affects time and space complexity
    • example / Algorithm design affects time and space complexity
    • demonstrating / Algorithm design affects time and space complexity
  • transient storage allocation
    • about / Allocating memory for non-R objects
  • Trelliscope package
    • about / Other Hadoop packages for R

U

  • user-controlled memory
    • about / Allocating memory for non-R objects
  • user-controlled memory, functions
    • type* Calloc(size_t n, type) / Allocating memory for non-R objects
    • type* Realloc(any *p, size_t n, type) / Allocating memory for non-R objects
    • void Free(any *p) / Allocating memory for non-R objects
  • user time
    • about / Measuring execution time with system.time()

V

  • vectorization
    • about / Vectorization

X

  • Xcode Command Line Tools
    • installing / Prerequisites
    • URL, for downloading / Prerequisites